mysql之sql优化实践汇总

原创已于 2022-03-31 16:17:19 修改 · 793 阅读

1 ·

CC 4.0 BY-SA版权

文章标签：

#sql #mysql #数据库

于 2021-12-28 11:11:05 首次发布

数据库专栏收录该内容

8 篇文章

订阅专栏

本文概述了SQL优化策略，如合理选择主表、使用临时表减少锁表影响、日期字段备份与索引优化、长字符串处理和JOIN操作技巧。优化案例包括慢查询优化、索引应用和数据关联效率提升，使查询速度从16秒降至0.03秒，提升显著。

摘要生成于 C知道，由 DeepSeek-R1 满血版支持，前往体验 >

概述

未来的SQL优化实践汇总都总结在本文。
未经特殊说明，搜索引擎都是INNODB。

合理使用主表

每个sql都会至少有一个主表，如果有多个主表，考虑使用非基础表和数据量少的表当主表。
优化结果：慢sql执行最慢16秒，平均6秒。优化后0.03秒。
原因分析：慢sql前后差别大可能和基础表锁表锁记录有关。逻辑可行的话，需要谨慎考虑使用基础表当主表。

场景

涉及到的表
pop_window_result：200W数据
user： 10W数据
user_app：10W数据

待优化sql

SELECT u.phone, ua.subscribe, u.id as userId, ua.appId, ua.openId, u.heartbeatTime, t.lastGreetTime        
FROM user u            
LEFT JOIN user_app ua ON u.id = ua.userId AND ua.appId = 'wxa7cg1br52pb0951u'            
LEFT JOIN (   

SELECT pwr.targetUserId, max(pwr.updateTime) as lastGreetTime   
FROM pop_window_result pwr  
 WHERE pwr.hadGreet = 1 
 GROUP BY pwr.targetUserId
 
 ) t ON t.targetUserId = u.id  
 
 WHERE u.id in (101705,100723,101715,82949,83045,83948,81270,81629,81789,71264,78176,78498,85712,86383,84318,84466,3787,4313,10130,6234,2634,22293,22764,24738,2450,2627,2728,3115,3621,3637,4026,10180,10277,4150,4947,5560,7458,8480,5734,6548,9106,9160,9289,9316,24417,24466,24471,24866,25167,20047,23005,17613,17881,18286,5752,81447,86350,84155,19380,81339,81545,83308,11008,88226,85059,85715,8396,24526,70108,82559,82684,74077,81188,81348,81755,81996,76852,72229,85074,88090,84265,85884,77393,84896,77746,78580,79559,81121,81129,81138,81148,81325,81431,83275,85060,85644,83386,88229,88359,78804,79335,84485,84736,87808,15985,33609,24469,77328,81156,81182,81318,81367,81501,81528,81544,12368,12952,13525,14428,19175,20128,32246,33000,30875,24536,24059,24123,22402,22517,69934,73197,73695,74394,70748,71034,85696,85911,86079,86130,86324,82480,82568,82569,86403,86719,86829,82575,82735,82807,82836,82855,83197,83785,87057,87189,87339,87552,88169,80902,81223,81269,81631,81981,77304,78713,82464,79461,81133,81160,81166,81179,81205,81228,81257,81322,31343,11130,19031,19270,19857,83444,83567,83872,83929,85616,12036,12359,14224,21258,31957,32715,32930,25201,33732,22553,22705,35458,34805,77077,69293)

执行计划
在这里插入图片描述

优化后sql

优化的SQL1

SELECT u.phone, ua.subscribe, u.id as userId, ua.appId, ua.openId, u.heartbeatTime, t.lastGreetTime        
FROM user u            
LEFT JOIN user_app ua ON u.id = ua.userId AND ua.appId = 'wxa7cg1br52pb0951u'
-- 把这个改成右连接            
RIGHT JOIN (   

SELECT pwr.targetUserId, max(pwr.updateTime) as lastGreetTime   
FROM pop_window_result pwr  
 WHERE pwr.hadGreet = 1  AND pwr.targetUserId in (101705,100723,101715,82949,83045,83948,81270,81629,81789,71264,78176,78498,85712,86383,84318,84466,3787,4313,10130,6234,2634,22293,22764,24738,2450,2627,2728,3115,3621,3637,4026,10180,10277,4150,4947,5560,7458,8480,5734,6548,9106,9160,9289,9316,24417,24466,24471,24866,25167,20047,23005,17613,17881,18286,5752,81447,86350,84155,19380,81339,81545,83308,11008,88226,85059,85715,8396,24526,70108,82559,82684,74077,81188,81348,81755,81996,76852,72229,85074,88090,84265,85884,77393,84896,77746,78580,79559,81121,81129,81138,81148,81325,81431,83275,85060,85644,83386,88229,88359,78804,79335,84485,84736,87808,15985,33609,24469,77328,81156,81182,81318,81367,81501,81528,81544,12368,12952,13525,14428,19175,20128,32246,33000,30875,24536,24059,24123,22402,22517,69934,73197,73695,74394,70748,71034,85696,85911,86079,86130,86324,82480,82568,82569,86403,86719,86829,82575,82735,82807,82836,82855,83197,83785,87057,87189,87339,87552,88169,80902,81223,81269,81631,81981,77304,78713,82464,79461,81133,81160,81166,81179,81205,81228,81257,81322,31343,11130,19031,19270,19857,83444,83567,83872,83929,85616,12036,12359,14224,21258,31957,32715,32930,25201,33732,22553,22705,35458,34805,77077,69293)
 GROUP BY pwr.targetUserId
 
 ) t ON t.targetUserId = u.id;

执行计划1
可以看到，扫描的行更少了，虽然有all，但是只有1k+的扫描，性能依然很高。
在这里插入图片描述

优化的SQL2

 SELECT u.phone, ua.subscribe, u.id as userId, ua.appId, ua.openId, u.heartbeatTime, t.lastGreetTime
 FROM (
-- 修改主表，user是基础表，可能会有锁表，而pwr是非基础表，且过滤后拥有更少的记录，因此用pwr
SELECT pwr.targetUserId, max(pwr.updateTime) as lastGreetTime   
FROM pop_window_result pwr  
 WHERE pwr.hadGreet = 1  AND pwr.targetUserId in (101705,100723,101715,82949,83045,83948,81270,81629,81789,71264,78176,78498,85712,86383,84318,84466,3787,4313,10130,6234,2634,22293,22764,24738,2450,2627,2728,3115,3621,3637,4026,10180,10277,4150,4947,5560,7458,8480,5734,6548,9106,9160,9289,9316,24417,24466,24471,24866,25167,20047,23005,17613,17881,18286,5752,81447,86350,84155,19380,81339,81545,83308,11008,88226,85059,85715,8396,24526,70108,82559,82684,74077,81188,81348,81755,81996,76852,72229,85074,88090,84265,85884,77393,84896,77746,78580,79559,81121,81129,81138,81148,81325,81431,83275,85060,85644,83386,88229,88359,78804,79335,84485,84736,87808,15985,33609,24469,77328,81156,81182,81318,81367,81501,81528,81544,12368,12952,13525,14428,19175,20128,32246,33000,30875,24536,24059,24123,22402,22517,69934,73197,73695,74394,70748,71034,85696,85911,86079,86130,86324,82480,82568,82569,86403,86719,86829,82575,82735,82807,82836,82855,83197,83785,87057,87189,87339,87552,88169,80902,81223,81269,81631,81981,77304,78713,82464,79461,81133,81160,81166,81179,81205,81228,81257,81322,31343,11130,19031,19270,19857,83444,83567,83872,83929,85616,12036,12359,14224,21258,31957,32715,32930,25201,33732,22553,22705,35458,34805,77077,69293)
 GROUP BY pwr.targetUserId

 ) t LEFT JOIN user u ON t.targetUserId = u.id
 LEFT JOIN user_app ua ON u.id = ua.userId AND ua.appId = 'wxa7cg1br52pb0951u'

执行计划2
和执行计划1一样，只是换了个写法
在这里插入图片描述

使用int字段备份createTime记录天

背景

需要查询一个星期内有推荐给自己的人和当天被推荐次数超过5次的人。
优化后，耗时从8s降低到0.3s，提高了20+倍。

pop_window_result：200W+的数据量，每天增加30W数据，未来可能每天增加到百万数据，需要分表。

原sql

原

SELECT targetUserId as id
FROM pop_window_result
WHERE userId = 10721
  AND hadGreet = 1
  AND createTime >= '2022-3-23'

UNION ALL

SELECT targetUserId as id
     FROM pop_window_result
     WHERE hadGreet = 1
       AND createTime >= '2022-3-30'
     GROUP BY targetUserId
     HAVING count(*) >= 5

执行计划
在这里插入图片描述

优化后的sql

给pop_window_result增加一个整型字段simpleCreateTime记录createTime对应的天，比如2022-3-31 1:01:05，simpleCreateTime记录成20220331。

SELECT targetUserId as id
FROM pop_window_result
WHERE userId = 10721
  AND hadGreet = 1
  AND simpleCreateTime >= 20220323

UNION ALL

SELECT targetUserId as id
FROM pop_window_result
WHERE hadGreet = 1
AND simpleCreateTime = 20220331
GROUP BY targetUserId
HAVING count(*) >= 5

执行计划
在这里插入图片描述

使用limit对索引优化

其中priTmplId 模板ID为索引，不唯一，但是关联的openId唯一

explain 
update wx_subscribe_user set sendTimes = sendTimes + 1 where openId = "oT10x5Gkfb5R-3HZ1Vb1KAlPoq2M" and priTmplId = "hEVQfvr-S2PlJp7SxqZT7ijc3l_xMnvIAiUozcvUgyq"

执行计划
type=index,遍历了索引所有值，涉及到全表数据。
在这里插入图片描述
优化之后
虽然索引列priTmplId 不是唯一的，但是openId是唯一的，因此子查询使用limit 1，找到对应的ID，再更新

explain
update wx_subscribe_user set sendTimes = sendTimes + 1 where id = (
    select *
    from (
        select id
        from wx_subscribe_user
        where openId = "oT10x5Gkfb5R-3HZ1Vb1KAlPoq2M" and priTmplId = "hEVQfvr-S2PlJp7SxqZT7ijc3l_xMnvIAiUozcvUgyq"
        limit 1
    )a
)

执行计划
使用了type=ref，大约扫描了2W+行，然后使用到了主键索引
在这里插入图片描述

给openId增加索引

openId和priTmplId 都分别建立全值索引，分析原来的sql性能

explain 
update wx_subscribe_user set sendTimes = sendTimes + 1 where openId = "oT10x5Gkfb5R-3HZ1Vb1KAlPoq2M" and priTmplId = "hEVQfvr-S2PlJp7SxqZT7ijc3l_xMnvIAiUozcvUgyq"

执行计划
在这里插入图片描述
优化后的sql，分析性能

explain 
update wx_subscribe_user set sendTimes = sendTimes + 1 where id = (
    select *
    from (
        select id
        from wx_subscribe_user
        where openId = "oT10x5Gkfb5R-3HZ1Vb1KAlPoq2M" and priTmplId = "hEVQfvr-S2PlJp7SxqZT7ijc3l_xMnvIAiUozcvUgyq"
        limit 1
    )a
)

执行计划
优化后，使用了另外一个索引，并且预估扫描行为1，未使用主键索引
在这里插入图片描述

其它优化点

由于openId和priTmplId是长字符串，因此考虑分别建立前缀索引来进一步优化，但是需要考虑前缀索引的长度。

利用主键减轻深度分页带来的性能下降

如果深度分页需要扫描全表，这会导致查询性能低下。利用主键过滤掉绝大多数不需要的数据，得到优化效果

	-- 全表扫描
	EXPLAIN
	select * from user_main LIMIT 20000, 20;
	-- 优化一，使用内连接
	EXPLAIN
	select * from user_main um INNER JOIN (select id FROM user_main LIMIT 20000, 20) um1 ON um.id = um1.id;
	-- 优化二
	EXPLAIN
	select * from user_main WHERE id >= (select id from user_main LIMIT 20000,1) LIMIT 20;

查询的字段类型和数据库定义的类型不一致导致索引失效

在一张500W左右数据量的表中，定义组合索引,都是varchar类型:

KEY `idx_form_to` (`fromUserId`, `toUserId`)

使用整数查询，走全表扫描，7s+返回结果

explain select * from im_message_log where fromUserId = 174 and toUserId = 6

使用字符串类型查询，走索引，10ms内返回结果

explain select * from im_message_log where fromUserId = '174' and toUserId = '6'

EXISTS和IN的性能对比

前提
假设EXISTS和IN都能命中索引。

结论

存在逻辑
子表数据量大的时候，IN会建立中间索引表，从而优化查询。此时虽然是子外表双层loop，面对大数据量的时候也快。
EXISTS可以直接对子表做哈希连接，速度很快，但是外表一般是全表或者全索引，会降低性能。
此时，需要根据实际SQL的执行计划做选择。
非逻辑
IN无法使用中间索引表做优化，因此双层loop带来的性能下降明显。
EXISTS虽然外表依然是全表或者全索引扫描，但是内表依然是哈希连接，可以快速过滤数据。
此时，EXISTS优先于IN使用
结论

如果是非(NOT)逻辑，EXISTS 优先于IN考虑
如果是存在逻辑，IN 和EXISTS需要根据实际SQL的执行计划做对比

验证
有两张表，user_main表和circle_user表，其中user_main表有10808条记录，circle_user表有条记录。分别对比“存在逻辑”和“非逻辑”的性能。

非逻辑

-- not 无法建立中间索引表，内外表走全表或者全索引
explain SELECT * FROM user_main um WHERE um.id NOT IN (SELECT cu.userId FROM circle_user cu);

-- 外表会走all,内表使用hash连接
explain SELECT * FROM user_main um WHERE  NOT exists (SELECT 1 FROM circle_user cu WHERE um.id = cu.userId);

-- 调换表顺序，内外表走全表或者全索引
explain SELECT * FROM circle_user cu WHERE cu.userId NOT IN (SELECT id FROM user_main);

-- 外表会走all或全索引,内表使用hash连接
explain SELECT * FROM circle_user cu WHERE NOT EXISTS (SELECT 1 FROM user_main um WHERE um.id = cu.userId);

执行计划依次截图
在这里插入图片描述

在这里插入图片描述

在这里插入图片描述
存在逻辑
存在逻辑中，IN的外表可能走主键索引，而EXISTS还是走ALL。

-- 建立中间索引表，虽然是双层loop，但是由于中间索引表的关系，可能比EXISTS快，且外表可能用到主键索引
explain SELECT * FROM user_main um WHERE um.id IN (SELECT cu.userId FROM circle_user cu);

-- 外表会走all或全索引，数据量大了，可能in更快。
explain SELECT * FROM user_main um WHERE  EXISTS(SELECT 1 FROM circle_user cu WHERE um.id = cu.userId);

-- 因为子查询数据量太少，所以没有建立中间索引表，mysql认为，不建立中间索引表也很快
explain SELECT * FROM circle_user cu WHERE cu.userId IN (SELECT id FROM user_main);

-- 
explain SELECT * FROM circle_user cu WHERE EXISTS (SELECT 1 FROM user_main um WHERE um.id = cu.userId);

执行计划依次截图
在这里插入图片描述

在长字符串上建立索引带来的性能下降

背景
有两张用户表，因为都是在微信平台下面的，如果是同一用户，unionId相同。现在需要找有多少用户，同时存在两张表中。

unionId在两张表中都存在默认值为空串的记录。

其中user_main_box 表有四万多记录，user_main 有四十多万记录。
通过unionId关联
这种方式得到结果非常慢，需要几十分钟。

SELECT
    count(*)
FROM
    user_main_box umb left join user_main um ON um.unionId = umb.unionId AND um.unionId != ''
WHERE
    umb.sex = 0
  AND umb.birthYear IS NOT NULL
  AND umb.workCity != ''
  AND umb.workProvince != ''
  AND umb.marryStatus IS NOT NULL
  AND umb.eduLevel IS NOT NULL
  AND umb.incomeLevel IS NOT NULL
  AND umb.height IS NOT NULL
  AND umb.head != ''
  AND umb.head != 'https://pubser-res.zhenai.com/other/temp/202108/11/15113819659832.png'
  AND umb.head != 'https://pubser-res.zhenai.com/other/temp/202108/11/1511462086822.png'
  AND umb.delFlag = 0
  AND umb.id != 1;

执行计划
在这里插入图片描述
对比通过ID关联的性能
本次花费0.36s得到结果。

select count(*) 
from user_main um 
    left join user_extend ue on um.id = ue.userId and um.unionId != ''
WHERE
    um.sex = 0
  AND um.birthYear IS NOT NULL
  AND um.workCity != ''
  AND um.workProvince != ''
  AND um.marryStatus IS NOT NULL
  AND um.eduLevel IS NOT NULL
  AND um.incomeLevel IS NOT NULL
  AND um.height IS NOT NULL
  AND um.head != ''
  AND um.head != 'https://pubser-res.zhenai.com/other/temp/202108/11/15113819659832.png'
  AND um.head != 'https://pubser-res.zhenai.com/other/temp/202108/11/1511462086822.png'
  AND um.delFlag = 0
  AND um.id != 1;

具备相同的执行计划，但是本次预计的影响行更多
在这里插入图片描述
结论
mysql对长字符串索引非常脆弱，建议尽量在整型字段上建立索引，尤其是需要涉及大范围关联的字段，一定不能使用长字符串索引。

如果非要使用长字符串做索引关联，应该在业务逻辑上做拆分。比如要查两张表中有多少相同用户，因为仅仅对有unionId值的用户才需要关联，所以以上左表可以先过滤掉unionId为空串的，再关联。

SELECT
    count(*)
FROM
    (select * from user_main_box ub where unionId != '') umb left join user_main um ON um.unionId = umb.unionId
WHERE
    umb.sex = 0
  AND umb.birthYear IS NOT NULL
  AND umb.workCity != ''
  AND umb.workProvince != ''
  AND umb.marryStatus IS NOT NULL
  AND umb.eduLevel IS NOT NULL
  AND umb.incomeLevel IS NOT NULL
  AND umb.height IS NOT NULL
  AND umb.head != ''
  AND umb.head != 'https://pubser-res.zhenai.com/other/temp/202108/11/15113819659832.png'
  AND umb.head != 'https://pubser-res.zhenai.com/other/temp/202108/11/1511462086822.png'
  AND umb.delFlag = 0
  AND umb.id != 1;