坑爹的标量子查询

最新推荐文章于 2024-03-03 06:30:00 发布

原创最新推荐文章于 2024-03-03 06:30:00 发布 · 564 阅读

1 ·

CC 4.0 BY-SA版权

文章标签：

#postgresql #sql优化

PostgreSQL 专栏收录该内容

282 篇文章

订阅专栏

不知道大家在看pg的执行计划时主要关注什么呢，cost还是执行时间？

最近碰到一个奇葩的SQL，执行计划里cost显示1.9亿，时间执行时间只有1秒多，大致如下：

EXPLAIN ANALYZE select count(*) from (select (
-> SELECT
-> string_agg(lsd4.itemname, ',') bb
-> FROM
-> (
-> SELECT
-> lsd3.itemcode,
-> lsd3.itemname
-> FROM
-> lis_sampledetail lsd3
-> WHERE
-> lsd3.sampleinfo_id = lsi.id
-> AND lsd3.sample_status != '3'
-> GROUP BY
-> lsd3.itemcode,
-> lsd3.itemname
-> ) lsd4
-> ) item_names,
-> lsi.sample_receive_time receivetime
-> FROM
-> lis_sampleinfo lsi WHERE
-> lsi.sample_status != '3'
-> ) aa
-> WHERE
-> aa.item_names IS NOT NULL
-> and aa.receivetime >= '2022-02-14 00:00:00'
-> AND aa.receivetime <= '2022-03-01 23:59:59';

QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=190804877.15..190804877.16 rows=1 width=8) (actual time=1601.530..1601.532 rows=1 loops=1)
	-> Index Scan using idx_lis_sampleinfo_rectime_orgcode on lis_sampleinfo lsi (cost=0.43..190804824.94 rows=20883 width=0) (actual time=7.170..1601.091 rows=3745 loops=1)
		Index Cond: ((sample_receive_time >= '2022-02-14 00:00:00'::timestamp without time zone) AND (sample_receive_time <= '2022-03-01 23:59:59'::timestamp without time zone))
		Filter: (((sample_status)::text <> '3'::text) AND ((SubPlan 1) IS NOT NULL))
		Rows Removed by Filter: 828369
		SubPlan 1
			-> Aggregate (cost=231.95..231.96 rows=1 width=32) (actual time=0.036..0.036 rows=1 loops=7316)
				-> Group (cost=231.91..231.92 rows=2 width=39) (actual time=0.036..0.036 rows=1 loops=7316)
					Group Key: lsd3.itemcode, lsd3.itemname
					-> Sort (cost=231.91..231.91 rows=2 width=39) (actual time=0.035..0.035 rows=1 loops=7316)
						Sort Key: lsd3.itemcode, lsd3.itemname
						Sort Method: quicksort Memory: 25kB
						-> Index Scan using idx_lis_sampledetail_ix01 on lis_sampledetail lsd3 (cost=0.56..231.90 rows=2 width=39) (actual time=0.033..0.033 rows=1 loops=7316)
							Index Cond: ((sampleinfo_id)::text = (lsi.id)::text)
							Filter: ((sample_status)::text <> '3'::text)
							Rows Removed by Filter: 1
Planning Time: 1.577 ms
Execution Time: 1601.832 ms
(18 rows)

Time: 1606.280 ms (00:01.606)

可以看到“Index Scan using idx_lis_sampleinfo_rectime_orgcode on lis_sampleinfo lsi”这一步的cost竟然高达190804824.94，这张表也就几十万条数据，实际执行时间也是很快，那么这高达1.9亿的cost是怎么计算出来的呢？

不难发现，上面这个SQL用到了标量子查询。而我们都知道标量子查询类似于一个天然的嵌套循环，而且驱动表固定为主表。例如下面这个SQL：

bill=# explain analyze select id,info,(select info from t2 where t2.id=t1.id) as name
bill-# from t1 ;
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Seq Scan on t1  (cost=0.00..2190194.00 rows=10000 width=69) (actual time=0.798..5411.748 rows=10000 loops=1)
   SubPlan 1
     ->  Seq Scan on t2  (cost=0.00..219.00 rows=1 width=33) (actual time=0.270..0.540 rows=1 loops=10000)
           Filter: (id = t1.id)
           Rows Removed by Filter: 9999
 Planning Time: 0.144 ms
 Execution Time: 5412.437 ms
(7 rows)

t1和t2表均只有10000条数据，Seq Scan on t1的cost也高达2190194.00，仔细看看也不难发现，这是因为包含了标量子查询中所有循环的总cost，即219*10000，再加上t1表自身全表扫描一次的cost。

但是前面那个SQL不同的是，Index Scan using idx_lis_sampledetail_ix01 on lis_sampledetail lsd3 (cost=0.56…231.90 rows=2 width=39) (actual time=0.033…0.033 rows=1 loops=7316) 中也仅仅循环了7316次，每次的cost是231.90，这么算的话也和显示的1.9亿相差甚远。

这又是为什么呢？这就是这个标量子查询的SQL坑爹的地方：将标量子查询作为where条件。

所以问题就在于：
Filter: (((sample_status)::text <> ‘3’::text) AND ((SubPlan 1) IS NOT NULL))
Rows Removed by Filter: 828369

这一步中由于使用子查询去过滤了828369条数据，但是pg执行计划中在计算cost时还是会将这部分算上，828369乘以每次执行的cost那么就可以的出1.9亿这个夸张的cost了。

但实际上，子查询并没有实际执行这么多次，只是执行了显示的7316次，所以时间上也就没有那么夸张了。

因此我们可以得出结论：当我们使用标量子查询作为查询条件时，pg执行计划中得出的cost并不是实际的cost，它不会去过滤掉被标量子查询过滤的部分，而实际执行却是会先进行过滤。