概述
未来的SQL优化实践汇总都总结在本文。
未经特殊说明,搜索引擎都是INNODB。
合理使用主表
每个sql都会至少有一个主表,如果有多个主表,考虑使用非基础表和数据量少的表当主表。
优化结果:慢sql执行最慢16秒,平均6秒。优化后0.03秒。
原因分析:慢sql前后差别大可能和基础表锁表锁记录有关。逻辑可行的话,需要谨慎考虑使用基础表当主表。
场景
涉及到的表
pop_window_result:200W数据
user: 10W数据
user_app:10W数据
待优化sql
SELECT u.phone, ua.subscribe, u.id as userId, ua.appId, ua.openId, u.heartbeatTime, t.lastGreetTime
FROM user u
LEFT JOIN user_app ua ON u.id = ua.userId AND ua.appId = 'wxa7cg1br52pb0951u'
LEFT JOIN (
SELECT pwr.targetUserId, max(pwr.updateTime) as lastGreetTime
FROM pop_window_result pwr
WHERE pwr.hadGreet = 1
GROUP BY pwr.targetUserId
) t ON t.targetUserId = u.id
WHERE u.id in (101705,100723,101715,82949,83045,83948,81270,81629,81789,71264,78176,78498,85712,86383,84318,84466,3787,4313,10130,6234,2634,22293,22764,24738,2450,2627,2728,3115,3621,3637,4026,10180,10277,4150,4947,5560,7458,8480,5734,6548,9106,9160,9289,9316,24417,24466,24471,24866,25167,20047,23005,17613,17881,18286,5752,81447,86350,84155,19380,81339,81545,83308,11008,88226,85059,85715,8396,24526,70108,82559,82684,74077,81188,81348,81755,81996,76852,72229,85074,88090,84265,85884,77393,84896,77746,78580,79559,81121,81129,81138,81148,81325,81431,83275,85060,85644,83386,88229,88359,78804,79335,84485,84736,87808,15985,33609,24469,77328,81156,81182,81318,81367,81501,81528,81544,12368,12952,13525,14428,19175,20128,32246,33000,30875,24536,24059,24123,22402,22517,69934,73197,73695,74394,70748,71034,85696,85911,86079,86130,86324,82480,82568,82569,86403,86719,86829,82575,82735,82807,82836,82855,83197,83785,87057,87189,87339,87552,88169,80902,81223,81269,81631,81981,77304,78713,82464,79461,81133,81160,81166,81179,81205,81228,81257,81322,31343,11130,19031,19270,19857,83444,83567,83872,83929,85616,12036,12359,14224,21258,31957,32715,32930,25201,33732,22553,22705,35458,34805,77077,69293)
执行计划
优化后sql
优化的SQL1
SELECT u.phone, ua.subscribe, u.id as userId, ua.appId, ua.openId, u.heartbeatTime, t.lastGreetTime
FROM user u
LEFT JOIN user_app ua ON u.id = ua.userId AND ua.appId = 'wxa7cg1br52pb0951u'
-- 把这个改成右连接
RIGHT JOIN (
SELECT pwr.targetUserId, max(pwr.updateTime) as lastGreetTime
FROM pop_window_result pwr
WHERE pwr.hadGreet = 1 AND pwr.targetUserId in (101705,100723,101715,82949,83045,83948,81270,81629,81789,71264,78176,78498,85712,86383,84318,84466,3787,4313,10130,6234,2634,22293,22764,24738,2450,2627,2728,3115,3621,3637,4026,10180,10277,4150,4947,5560,7458,8480,5734,6548,9106,9160,9289,9316,24417,24466,24471,24866,25167,20047,23005,17613,17881,18286,5752,81447,86350,84155,19380,81339,81545,83308,11008,88226,85059,85715,8396,24526,70108,82559,82684,74077,81188,81348,81755,81996,76852,72229,85074,88090,84265,85884,77393,84896,77746,78580,79559,81121,81129,81138,81148,81325,81431,83275,85060,85644,83386,88229,88359,78804,79335,84485,84736,87808,15985,33609,24469,77328,81156,81182,81318,81367,81501,81528,81544,12368,12952,13525,14428,19175,20128,32246,33000,30875,24536,24059,24123,22402,22517,69934,73197,73695,74394,70748,71034,85696,85911,86079,86130,86324,82480,82568,82569,86403,86719,86829,82575,82735,82807,82836,82855,83197,83785,87057,87189,87339,87552,88169,80902,81223,81269,81631,81981,77304,78713,82464,79461,81133,81160,81166,81179,81205,81228,81257,81322,31343,11130,19031,19270,19857,83444,83567,83872,83929,85616,12036,12359,14224,21258,31957,32715,32930,25201,33732,22553,22705,35458,34805,77077,69293)
GROUP BY pwr.targetUserId
) t ON t.targetUserId = u.id;
执行计划1
可以看到,扫描的行更少了,虽然有all,但是只有1k+的扫描,性能依然很高。
优化的SQL2
SELECT u.phone, ua.subscribe, u.id as userId, ua.appId, ua.openId, u.heartbeatTime, t.lastGreetTime
FROM (
-- 修改主表,user是基础表,可能会有锁表,而pwr是非基础表,且过滤后拥有更少的记录,因此用pwr
SELECT pwr.targetUserId, max(pwr.updateTime) as lastGreetTime
FROM pop_window_result pwr
WHERE pwr.hadGreet = 1 AND pwr.targetUserId in (101705,100723,101715,82949,83045,83948,81270,81629,81789,71264,78176,78498,85712,86383,84318,84466,3787,4313,10130,6234,2634,22293,22764,24738,2450,2627,2728,3115,3621,3637,4026,10180,10277,4150,4947,5560,7458,8480,5734,6548,9106,9160,9289,9316,24417,24466,24471,24866,25167,20047,23005,17613,17881,18286,5752,81447,86350,84155,19380,81339,81545,83308,11008,88226,85059,85715,8396,24526,70108,82559,82684,74077,81188,81348,81755,81996,76852,72229,85074,88090,84265,85884,77393,84896,77746,78580,79559,81121,81129,81138,81148,81325,81431,83275,85060,85644,83386,88229,88359,78804,79335,84485,84736,87808,15985,33609,24469,77328,81156,81182,81318,81367,81501,81528,81544,12368,12952,13525,14428,19175,20128,32246,33000,30875,24536,24059,24123,22402,22517,69934,73197,73695,74394,70748,71034,85696,85911,86079,86130,86324,82480,82568,82569,86403,86719,86829,82575,82735,82807,82836,82855,83197,83785,87057,87189,87339,87552,88169,80902,81223,81269,81631,81981,77304,78713,82464,79461,81133,81160,81166,81179,81205,81228,81257,81322,31343,11130,19031,19270,19857,83444,83567,83872,83929,85616,12036,12359,14224,21258,31957,32715,32930,25201,33732,22553,22705,35458,34805,77077,69293)
GROUP BY pwr.targetUserId
) t LEFT JOIN user u ON t.targetUserId = u.id
LEFT JOIN user_app ua ON u.id = ua.userId AND ua.appId = 'wxa7cg1br52pb0951u'
执行计划2
和执行计划1一样,只是换了个写法
使用int字段备份createTime记录天
背景
需要查询一个星期内有推荐给自己的人和当天被推荐次数超过5次的人。
优化后,耗时从8s降低到0.3s,提高了20+倍。
pop_window_result:200W+的数据量,每天增加30W数据,未来可能每天增加到百万数据,需要分表。
原sql
原
SELECT targetUserId as id
FROM pop_window_result
WHERE userId = 10721
AND hadGreet = 1
AND createTime >= '2022-3-23'
UNION ALL
SELECT targetUserId as id
FROM pop_window_result
WHERE hadGreet = 1
AND createTime >= '2022-3-30'
GROUP BY targetUserId
HAVING count(*) >= 5
执行计划
优化后的sql
给pop_window_result增加一个整型字段simpleCreateTime记录createTime对应的天,比如2022-3-31 1:01:05,simpleCreateTime记录成20220331。
SELECT targetUserId as id
FROM pop_window_result
WHERE userId = 10721
AND hadGreet = 1
AND simpleCreateTime >= 20220323
UNION ALL
SELECT targetUserId as id
FROM pop_window_result
WHERE hadGreet = 1
AND simpleCreateTime = 20220331
GROUP BY targetUserId
HAVING count(*) >= 5
执行计划
使用limit对索引优化
其中priTmplId 模板ID为索引,不唯一,但是关联的openId唯一
explain
update wx_subscribe_user set sendTimes = sendTimes + 1 where openId = "oT10x5Gkfb5R-3HZ1Vb1KAlPoq2M" and priTmplId = "hEVQfvr-S2PlJp7SxqZT7ijc3l_xMnvIAiUozcvUgyq"
执行计划
type=index,遍历了索引所有值,涉及到全表数据。
优化之后
虽然索引列priTmplId 不是唯一的,但是openId是唯一的,因此子查询使用limit 1,找到对应的ID,再更新
explain
update wx_subscribe_user set sendTimes = sendTimes + 1 where id = (
select *
from (
select id
from wx_subscribe_user
where openId = "oT10x5Gkfb5R-3HZ1Vb1KAlPoq2M" and priTmplId = "hEVQfvr-S2PlJp7SxqZT7ijc3l_xMnvIAiUozcvUgyq"
limit 1
)a
)
执行计划
使用了type=ref,大约扫描了2W+行,然后使用到了主键索引
给openId增加索引
openId和priTmplId 都分别建立全值索引,分析原来的sql性能
explain
update wx_subscribe_user set sendTimes = sendTimes + 1 where openId = "oT10x5Gkfb5R-3HZ1Vb1KAlPoq2M" and priTmplId = "hEVQfvr-S2PlJp7SxqZT7ijc3l_xMnvIAiUozcvUgyq"
执行计划
优化后的sql,分析性能
explain
update wx_subscribe_user set sendTimes = sendTimes + 1 where id = (
select *
from (
select id
from wx_subscribe_user
where openId = "oT10x5Gkfb5R-3HZ1Vb1KAlPoq2M" and priTmplId = "hEVQfvr-S2PlJp7SxqZT7ijc3l_xMnvIAiUozcvUgyq"
limit 1
)a
)
执行计划
优化后,使用了另外一个索引,并且预估扫描行为1,未使用主键索引
其它优化点
由于openId和priTmplId是长字符串,因此考虑分别建立前缀索引来进一步优化,但是需要考虑前缀索引的长度。
利用主键减轻深度分页带来的性能下降
如果深度分页需要扫描全表,这会导致查询性能低下。利用主键过滤掉绝大多数不需要的数据,得到优化效果
-- 全表扫描
EXPLAIN
select * from user_main LIMIT 20000, 20;
-- 优化一,使用内连接
EXPLAIN
select * from user_main um INNER JOIN (select id FROM user_main LIMIT 20000, 20) um1 ON um.id = um1.id;
-- 优化二
EXPLAIN
select * from user_main WHERE id >= (select id from user_main LIMIT 20000,1) LIMIT 20;
查询的字段类型和数据库定义的类型不一致导致索引失效
在一张500W左右数据量的表中,定义组合索引,都是varchar类型:
KEY `idx_form_to` (`fromUserId`, `toUserId`)
使用整数查询,走全表扫描,7s+返回结果
explain select * from im_message_log where fromUserId = 174 and toUserId = 6
使用字符串类型查询,走索引,10ms内返回结果
explain select * from im_message_log where fromUserId = '174' and toUserId = '6'
EXISTS和IN的性能对比
前提
假设EXISTS和IN都能命中索引。
结论
存在逻辑
子表数据量大的时候,IN会建立中间索引表,从而优化查询。此时虽然是子外表双层loop,面对大数据量的时候也快。
EXISTS可以直接对子表做哈希连接,速度很快,但是外表一般是全表或者全索引,会降低性能。
此时,需要根据实际SQL的执行计划做选择。
非逻辑
IN无法使用中间索引表做优化,因此双层loop带来的性能下降明显。
EXISTS虽然外表依然是全表或者全索引扫描,但是内表依然是哈希连接,可以快速过滤数据。
此时,EXISTS优先于IN使用
结论
- 如果是非(NOT)逻辑,EXISTS 优先于IN考虑
- 如果是存在逻辑,IN 和EXISTS需要根据实际SQL的执行计划做对比
验证
有两张表,user_main表和circle_user表,其中user_main表有10808条记录,circle_user表有条记录。分别对比“存在逻辑”和“非逻辑”的性能。
非逻辑
-- not 无法建立中间索引表,内外表走全表或者全索引
explain SELECT * FROM user_main um WHERE um.id NOT IN (SELECT cu.userId FROM circle_user cu);
-- 外表会走all,内表使用hash连接
explain SELECT * FROM user_main um WHERE NOT exists (SELECT 1 FROM circle_user cu WHERE um.id = cu.userId);
-- 调换表顺序,内外表走全表或者全索引
explain SELECT * FROM circle_user cu WHERE cu.userId NOT IN (SELECT id FROM user_main);
-- 外表会走all或全索引,内表使用hash连接
explain SELECT * FROM circle_user cu WHERE NOT EXISTS (SELECT 1 FROM user_main um WHERE um.id = cu.userId);
执行计划依次截图
存在逻辑
存在逻辑中,IN的外表可能走主键索引,而EXISTS还是走ALL。
-- 建立中间索引表,虽然是双层loop,但是由于中间索引表的关系,可能比EXISTS快,且外表可能用到主键索引
explain SELECT * FROM user_main um WHERE um.id IN (SELECT cu.userId FROM circle_user cu);
-- 外表会走all或全索引,数据量大了,可能in更快。
explain SELECT * FROM user_main um WHERE EXISTS(SELECT 1 FROM circle_user cu WHERE um.id = cu.userId);
-- 因为子查询数据量太少,所以没有建立中间索引表,mysql认为,不建立中间索引表也很快
explain SELECT * FROM circle_user cu WHERE cu.userId IN (SELECT id FROM user_main);
--
explain SELECT * FROM circle_user cu WHERE EXISTS (SELECT 1 FROM user_main um WHERE um.id = cu.userId);
执行计划依次截图
在长字符串上建立索引带来的性能下降
背景
有两张用户表,因为都是在微信平台下面的,如果是同一用户,unionId相同。现在需要找有多少用户,同时存在两张表中。
unionId在两张表中都存在默认值为空串的记录。
其中user_main_box 表有四万多记录,user_main 有四十多万记录。
通过unionId关联
这种方式得到结果非常慢,需要几十分钟。
SELECT
count(*)
FROM
user_main_box umb left join user_main um ON um.unionId = umb.unionId AND um.unionId != ''
WHERE
umb.sex = 0
AND umb.birthYear IS NOT NULL
AND umb.workCity != ''
AND umb.workProvince != ''
AND umb.marryStatus IS NOT NULL
AND umb.eduLevel IS NOT NULL
AND umb.incomeLevel IS NOT NULL
AND umb.height IS NOT NULL
AND umb.head != ''
AND umb.head != 'https://pubser-res.zhenai.com/other/temp/202108/11/15113819659832.png'
AND umb.head != 'https://pubser-res.zhenai.com/other/temp/202108/11/1511462086822.png'
AND umb.delFlag = 0
AND umb.id != 1;
执行计划
对比通过ID关联的性能
本次花费0.36s得到结果。
select count(*)
from user_main um
left join user_extend ue on um.id = ue.userId and um.unionId != ''
WHERE
um.sex = 0
AND um.birthYear IS NOT NULL
AND um.workCity != ''
AND um.workProvince != ''
AND um.marryStatus IS NOT NULL
AND um.eduLevel IS NOT NULL
AND um.incomeLevel IS NOT NULL
AND um.height IS NOT NULL
AND um.head != ''
AND um.head != 'https://pubser-res.zhenai.com/other/temp/202108/11/15113819659832.png'
AND um.head != 'https://pubser-res.zhenai.com/other/temp/202108/11/1511462086822.png'
AND um.delFlag = 0
AND um.id != 1;
具备相同的执行计划,但是本次预计的影响行更多
结论
mysql对长字符串索引非常脆弱,建议尽量在整型字段上建立索引,尤其是需要涉及大范围关联的字段,一定不能使用长字符串索引。
如果非要使用长字符串做索引关联,应该在业务逻辑上做拆分。比如要查两张表中有多少相同用户,因为仅仅对有unionId值的用户才需要关联,所以以上左表可以先过滤掉unionId为空串的,再关联。
SELECT
count(*)
FROM
(select * from user_main_box ub where unionId != '') umb left join user_main um ON um.unionId = umb.unionId
WHERE
umb.sex = 0
AND umb.birthYear IS NOT NULL
AND umb.workCity != ''
AND umb.workProvince != ''
AND umb.marryStatus IS NOT NULL
AND umb.eduLevel IS NOT NULL
AND umb.incomeLevel IS NOT NULL
AND umb.height IS NOT NULL
AND umb.head != ''
AND umb.head != 'https://pubser-res.zhenai.com/other/temp/202108/11/15113819659832.png'
AND umb.head != 'https://pubser-res.zhenai.com/other/temp/202108/11/1511462086822.png'
AND umb.delFlag = 0
AND umb.id != 1;
执行计划