MySQL子查询(subqueries)优化——派生条件下推
作者:黄华亮,现任Oracle MySQL大中华区MySQL解决方案工程师,专注MySQL技术十余年,对MySQL和开源数据库有丰富的实践经验。
1.序言
最近遇到了不少MySQL性能优化的案例,都和子查询有关,今天就这个话题做一定的分析。
首先,很多数据库的从业者都会有一种感觉,MySQL的优化器较弱,这可能是固化在MySQL早期版本的认识,现如今,MySQL在8.0已经在优化器方面做了非常多的优化。而子查询的优化通常也会令DBA感受一些压力,通常DBA会建议研发不要写复杂的子查询SQL,但现实却经常打脸,一些框架封装生成的SQL或一些外采系统,改写SQL变得不太实际,因此对SQL上优化在关键时候也非常有效。今天我就子查询派生条件下推做一些讨论。
2.派生条件下推
2.1 何为派生条件下推,翻译官方的描述如下:
MySQL 8.0.22及更高版本支持符合条件的子查询的派生条件下推。对于如SELECT*FROM(SELECT i,j FROM t1)as dt WHERE i>常量之类的查询,在许多情况下,可以将外部WHERE条件下推到派生表,在这种情况下,会导致SELECT*FROM(SELECT i、j FROM t1 WHERE i>常量)as dt。当派生表无法合并到外部查询中时(例如:如果派生表使用聚合),将外部WHERE条件下推到派生表应该会减少需要处理的行数,从而加快查询的执行。 https://dev.mysql.com/doc/refman/8.0/en/derived-condition-pushdown-optimization.html
派生条件下推一句话理解即为:外查询与派生表相关的条件会被推入到派生表中作为条件,以减少处理的数据行数,加速查询速度。
派生表 (Derived Table):派生表是在查询FROM子句的范围内生成表的表达式。例如,SELECT语句FROM子句中的子查询是派生表:
代码语言:javascript复制SELECT ... FROM (subquery) [AS] tb_alias_name ...
JSON_TABLE()函数生成一个表,并提供另一种创建派生表的方法:
代码语言:javascript复制SELECT * FROM JSON_TABLE(arg_list) [AS] tb_alias_name ...
[AS]tb_alias_name子句是必需的,因为FROM子句中的每个表都必须有一个名称。派生表中的任何列都必须具有唯一的名称。或者tb_alias_name后面可以跟一个带括号的派生表列名称列表:
代码语言:javascript复制SELECT ... FROM (subquery) [AS] tb_alias_name (col_list) ...
col_list的列的数目必须与subquery列的数目相同。
*派生表可以返回标量、列、行或表。
2.2 在以下情况下,可以将外部WHERE条件下推到派生的物化表中:
- 当派生表不使用聚合或窗口函数时,可以直接向下推送外部WHERE条件。这包括WHERE条件,该条件具有用AND、OR或两者连接的多个谓词。
例如:SELECT*FROM(SELECT f1,f2 FROM t1)AS dt WHERE f1<3 AND f2>11;
被重写为: SELECT f1,f2 FROM(SELECT f1,f2 FROM t1 WHERE f1<3 AND f2>11)AS dt。
- 当派生表具有GROUP BY并且不使用窗口函数时,引用一个或多个不属于GROUP BY的列的外部WHERE条件可以作为HAVING条件下推到派生表。
例如:SELECT * FROM (SELECT i, j, SUM(k) AS sum FROM t1 GROUP BY i, j) AS dt WHERE sum > 100;
通过物化条件下推重写为:SELECT * FROM (SELECT i, j, SUM(k) AS sum FROM t1 GROUP BY i, j HAVING sum > 100) AS dt.
- 当派生表使用GROUP BY并且外部WHERE条件中的列是GROUP BY列时,引用这些列的WHERE条件可以直接下推到派生表。
例如:SELECT * FROM (SELECT i,j, SUM(k) AS sum FROM t1 GROUP BY i,j) AS dt WHERE i > 10;
被重写为: SELECT * FROM (SELECT i,j, SUM(k) AS sum FROM t1 WHERE i > 10 GROUP BY i,j) AS dt.
2.3 如何启用派生条件下推?
1.优化器开关系统变量derived_condition_pushdown(8.0.22新增此变量)必须设置为on,这也是默认设置。如果优化器开关禁用了此优化,则可以使用DERIVED_CONDITION_PUSHDOWN优化器提示为特定查询启用此优化。要禁用给定查询的优化,请使用NO_DERIVED_CONDITION_PUSHDOWN优化器提示。
2.hints
对派生条件下推,MySQL提供了以下Hints:
- /* NO_DERIVED_CONDITION_PUSHDOWN(dervied table name) */
- /* DERIVED_CONDITION_PUSHDOWN(dervied table name) */
因hints是定义在SQL级别,优先级高于优化器变量的设置。
代码语言:javascript复制mysql> explain select /* NO_DERIVED_CONDITION_PUSHDOWN(a) */ a.c, a.pad, a.k from
-> (select k, c, pad from sbtest1
-> union all
-> select k, c, pad from sbtest2) a
-> where k=10739884;
---- ------------- ------------ ------------ ------ --------------- ------------- --------- ------- ---------- ---------- -------
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---- ------------- ------------ ------------ ------ --------------- ------------- --------- ------- ---------- ---------- -------
| 1 | PRIMARY | <derived2> | NULL | ref | <auto_key0> | <auto_key0> | 4 | const | 10 | 100.00 | NULL |
| 2 | DERIVED | sbtest1 | NULL | ALL | NULL | NULL | NULL | NULL | 39402496 | 100.00 | NULL |
| 3 | UNION | sbtest2 | NULL | ALL | NULL | NULL | NULL | NULL | 39456098 | 100.00 | NULL |
---- ------------- ------------ ------------ ------ --------------- ------------- --------- ------- ---------- ---------- -------
3 rows in set, 1 warning (0.00 sec)
mysql> show warnings;
------- ------ -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Level | Code | Message |
------- ------ -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Note | 1003 | /* select#1 */ select /* NO_DERIVED_CONDITION_PUSHDOWN(`a`@`select#1`) */ `a`.`c` AS `c`,`a`.`pad` AS `pad`,`a`.`k` AS `k` from (/* select#2 */ select `demo`.`sbtest1`.`k` AS `k`,`demo`.`sbtest1`.`c` AS `c`,`demo`.`sbtest1`.`pad` AS `pad` from `demo`.`sbtest1` union all /* select#3 */ select `demo`.`sbtest2`.`k` AS `k`,`demo`.`sbtest2`.`c` AS `c`,`demo`.`sbtest2`.`pad` AS `pad` from `demo`.`sbtest2`) `a` where (`a`.`k` = 10739884) |
------- ------ -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 row in set (0.00 sec)
2.4 派生条件下推演示
以下演示基于如下MySQL版本及数据测试:
- MySQL version: 8.0.34
- Data Size: 2张表,单表都约9.1GB,约 39,402,496 行。
- 表元数据如下:
mysql> show create table sbtest1G
*************************** 1. row ***************************
Table: sbtest1
Create Table: CREATE TABLE `sbtest1` (
`id` int NOT NULL AUTO_INCREMENT,
`k` int NOT NULL DEFAULT '0',
`c` char(120) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '',
`pad` char(60) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '',
PRIMARY KEY (`id`),
KEY `idx_k` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=39907018 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
1 row in set (0.00 sec)
mysql> show create table sbtest2G
*************************** 1. row ***************************
Table: sbtest2
Create Table: CREATE TABLE `sbtest2` (
`id` int NOT NULL AUTO_INCREMENT,
`k` int NOT NULL DEFAULT '0',
`c` char(120) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '',
`pad` char(60) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '',
PRIMARY KEY (`id`),
KEY `idx_k` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=39989509 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
1 row in set (0.00 sec)
优化器派生条件下推变量derived_condition_pushdown默认开启,如下:
代码语言:javascript复制mysql> SELECT @@optimizer_switchG
*************************** 1. row ***************************
@@optimizer_switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,engine_condition_pushdown=on,index_condition_pushdown=on,mrr=on,mrr_cost_based=on,block_nested_loop=on,batched_key_access=off,materialization=on,semijoin=on,loosescan=on,firstmatch=on,duplicateweedout=on,subquery_materialization_cost_based=on,use_index_extensions=on,condition_fanout_filter=on,derived_merge=on,use_invisible_indexes=off,skip_scan=on,hash_join=on,subquery_to_derived=off,prefer_ordering_index=on,hypergraph_optimizer=off,derived_condition_pushdown=on
1 row in set (0.00 sec)
1)本文2.2中的第一种情况
代码语言:javascript复制mysql> explain SELECT * FROM (SELECT k, c, pad FROM sbtest1) AS dt WHERE k=10739884;
---- ------------- --------- ------------ ------ --------------- ------- --------- ------- ------ ---------- -------
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---- ------------- --------- ------------ ------ --------------- ------- --------- ------- ------ ---------- -------
| 1 | SIMPLE | sbtest1 | NULL | ref | idx_k | idx_k | 4 | const | 1 | 100.00 | NULL |
---- ------------- --------- ------------ ------ --------------- ------- --------- ------- ------ ---------- -------
1 row in set, 1 warning (0.01 sec)
#-- 查看到原sql被rewrite。
mysql> show warnings;
------- ------ -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Level | Code | Message |
------- ------ -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Note | 1003 | /* select#1 */ select `demo`.`sbtest1`.`k` AS `k`,`demo`.`sbtest1`.`c` AS `c`,`demo`.`sbtest1`.`pad` AS `pad` from `demo`.`sbtest1` where (`demo`.`sbtest1`.`k` = 10739884) |
------- ------ -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 row in set (0.01 sec)
mysql> SELECT * FROM (SELECT k, c, pad FROM sbtest1) AS dt WHERE k=10739884;
---------- ------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------
| k | c | pad |
---------- ------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------
| 10739884 | 03816723415-61774461890-10431645322-98253769003-42610075148-02216493313-28973989001-60586458076-35045965632-83190664828 | 81783953567-06379628259-87543933402-92690557354-81683285060 |
---------- ------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------
1 row in set (0.00 sec)
2)本文2.2中的第二种情况
代码语言:javascript复制mysql> desc SELECT * FROM (SELECT k, SUM(k) AS sum FROM sbtest1 GROUP BY k) AS a WHERE sum < 10;
---- ------------- ------------ ------------ ------- --------------- ------- --------- ------ ---------- ---------- -------------
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---- ------------- ------------ ------------ ------- --------------- ------- --------- ------ ---------- ---------- -------------
| 1 | PRIMARY | <derived2> | NULL | ALL | NULL | NULL | NULL | NULL | 39402496 | 100.00 | NULL |
| 2 | DERIVED | sbtest1 | NULL | index | idx_k | idx_k | 4 | NULL | 39402496 | 100.00 | Using index |
---- ------------- ------------ ------------ ------- --------------- ------- --------- ------ ---------- ---------- -------------
2 rows in set, 1 warning (0.01 sec)
mysql> show warnings;
------- ------ -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Level | Code | Message |
------- ------ -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Note | 1003 | /* select#1 */ select `a`.`k` AS `k`,`a`.`sum` AS `sum` from (/* select#2 */ select `demo`.`sbtest1`.`k` AS `k`,sum(`demo`.`sbtest1`.`k`) AS `sum` from `demo`.`sbtest1` group by `demo`.`sbtest1`.`k` having (sum(`demo`.`sbtest1`.`k`) < 10)) `a` |
------- ------ -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 row in set (0.00 sec)
3)本文2.2中的第三种情况
代码语言:javascript复制mysql> explain SELECT * FROM (SELECT k, SUM(id) AS sum FROM sbtest1 GROUP BY k, c) AS a WHERE k =10739884;
---- ------------- ------------ ------------ ------ --------------- ------- --------- ------- ------ ---------- -----------------
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---- ------------- ------------ ------------ ------ --------------- ------- --------- ------- ------ ---------- -----------------
| 1 | PRIMARY | <derived2> | NULL | ALL | NULL | NULL | NULL | NULL | 2 | 100.00 | NULL |
| 2 | DERIVED | sbtest1 | NULL | ref | idx_k | idx_k | 4 | const | 1 | 100.00 | Using temporary |
---- ------------- ------------ ------------ ------ --------------- ------- --------- ------- ------ ---------- -----------------
2 rows in set, 1 warning (0.00 sec)
mysql> show warnings;
------- ------ ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Level | Code | Message |
------- ------ ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Note | 1003 | /* select#1 */ select `a`.`k` AS `k`,`a`.`sum` AS `sum` from (/* select#2 */ select `demo`.`sbtest1`.`k` AS `k`,sum(`demo`.`sbtest1`.`id`) AS `sum` from `demo`.`sbtest1` where (`demo`.`sbtest1`.`k` = 10739884) group by `demo`.`sbtest1`.`k`,`demo`.`sbtest1`.`c`) `a` |
------- ------ ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 row in set (0.00 sec)
mysql> SELECT * FROM (SELECT k, SUM(id) AS sum FROM sbtest1 GROUP BY k, c) AS a WHERE k =10739884;
---------- --------
| k | sum |
---------- --------
| 10739884 | 528507 |
---------- --------
1 row in set (0.00 sec)
4)带union的派生表
下面我们对带有union的派生表测试如下:
SQL statement:
代码语言:javascript复制select a.c, a.pad, a.k from
(select k, c, pad from sbtest1
union all
select k, c, pad from sbtest2) a
where k=10739884;
执行计划及执行结果如下:
代码语言:javascript复制mysql> explain select sql_no_cache a.c, a.pad, a.k from
-> (select k, c, pad from sbtest1
-> union all
-> select k, c, pad from sbtest2) a
-> where k=10739884;
---- ------------- ------------ ------------ ------ --------------- ------- --------- ------- ------ ---------- -------
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---- ------------- ------------ ------------ ------ --------------- ------- --------- ------- ------ ---------- -------
| 1 | PRIMARY | <derived2> | NULL | ALL | NULL | NULL | NULL | NULL | 4 | 100.00 | NULL |
| 2 | DERIVED | sbtest1 | NULL | ref | idx_k | idx_k | 4 | const | 1 | 100.00 | NULL |
| 3 | UNION | sbtest2 | NULL | ref | idx_k | idx_k | 4 | const | 1 | 100.00 | NULL |
---- ------------- ------------ ------------ ------ --------------- ------- --------- ------- ------ ---------- -------
3 rows in set, 2 warnings (0.00 sec)
mysql> show warnings;
--------- ------ -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Level | Code | Message |
--------- ------ -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Warning | 1681 | 'SQL_NO_CACHE' is deprecated and will be removed in a future release. |
| Note | 1003 | /* select#1 */ select `a`.`c` AS `c`,`a`.`pad` AS `pad`,`a`.`k` AS `k` from (/* select#2 */ select `demo`.`sbtest1`.`k` AS `k`,`demo`.`sbtest1`.`c` AS `c`,`demo`.`sbtest1`.`pad` AS `pad` from `demo`.`sbtest1` where (`demo`.`sbtest1`.`k` = 10739884) union all /* select#3 */ select `demo`.`sbtest2`.`k` AS `k`,`demo`.`sbtest2`.`c` AS `c`,`demo`.`sbtest2`.`pad` AS `pad` from `demo`.`sbtest2` where (`demo`.`sbtest2`.`k` = 10739884)) `a` |
--------- ------ -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2 rows in set (0.00 sec)
mysql> select a.c, a.pad, a.k from
-> (select k, c, pad from sbtest1
-> union all
-> select k, c, pad from sbtest2) a
-> where k=10739884;
------------------------------------------------------------------------------------------------------------------------- ------------------------------------------------------------- ----------
| c | pad | k |
------------------------------------------------------------------------------------------------------------------------- ------------------------------------------------------------- ----------
| 03816723415-61774461890-10431645322-98253769003-42610075148-02216493313-28973989001-60586458076-35045965632-83190664828 | 81783953567-06379628259-87543933402-92690557354-81683285060 | 10739884 |
------------------------------------------------------------------------------------------------------------------------- ------------------------------------------------------------- ----------
1 row in set (0.00 sec)
可以看到在对于union的派生表,通过下推派生条件做了优化。
由此可得到结论:
8.0.22 开始支持派生条件pushdown,但是到了8.0.29 才开始支持union 派生条件pushdown(除了下面2.5第一条中提到的2点不支持的union)。因8.0.29 不提供下载,原则上需要8.0.30 及以上版本。
2.5 派生条件下推的限制
- 如果派生表包含UNION,则无法使用优化。但是MySQL 8.0.29中取消了此限制(2.4中的演示有验证这一点)。
以下2种情况带有union的派生表不能派生条件下推。
- 如果作为UNION一部分的任何物化派生表是递归通用表表达式,则条件下推不能与UNION查询一起使用(请参见递归通用表表达:https://dev.mysql.com/doc/refman/8.0/en/with.html#common-table-expressions-recursive)。
- 不能将包含不确定性表达式的条件下推到派生表中。
- 派生表不能使用limit子句。
mysql> explain select * from (select k, c, pad from sbtest1 limit 10) as a where k=10739884;
---- ------------- ------------ ------------ ------ --------------- ------------- --------- ------- ---------- ---------- -------
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---- ------------- ------------ ------------ ------ --------------- ------------- --------- ------- ---------- ---------- -------
| 1 | PRIMARY | <derived2> | NULL | ref | <auto_key0> | <auto_key0> | 4 | const | 1 | 100.00 | NULL |
| 2 | DERIVED | sbtest1 | NULL | ALL | NULL | NULL | NULL | NULL | 39402496 | 100.00 | NULL |
---- ------------- ------------ ------------ ------ --------------- ------------- --------- ------- ---------- ---------- -------
2 rows in set, 1 warning (0.00 sec)
mysql> show warnings;
------- ------ -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Level | Code | Message |
------- ------ -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Note | 1003 | /* select#1 */ select `a`.`k` AS `k`,`a`.`c` AS `c`,`a`.`pad` AS `pad` from (/* select#2 */ select `demo`.`sbtest1`.`k` AS `k`,`demo`.`sbtest1`.`c` AS `c`,`demo`.`sbtest1`.`pad` AS `pad` from `demo`.`sbtest1` limit 10) `a` where (`a`.`k` = 10739884) |
------- ------ -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 row in set (0.01 sec)
- 条件包含子查询,不能推入到派生表。
- 如果派生表是外部联接的内部表,则无法使用优化。
- 如果物化派生表是通用表表达式,则如果多次引用该表,则不会向下推送条件。
- 如果条件的形式为derived_column>?,则可以向下推送使用参数的条件?。如果外部WHERE条件中的派生列是具有?在基础派生表中,不能向下推此条件。
- 如果派生表是一个视图,视图创建时使用了ALGORITHM=TEMPTABLE,则条件不会推入进视图。
- 从MySQL 8.0.28开始,如果派生表的SELECT包含对用户变量的任何赋值,则不能向下推条件到派生表。(Bug#104918)
关于MySQL对于SQL子查询派生条件下推到派生表的优化,这次就暂时到这里。后续会介绍更多关于MySQL中对于SQL子查询的其它优化方式,敬待后续文章。