背景
近期有用户在咨询查询的问题,发现一个比较典型的案例,SQL 语句无法选择正确的索引,导致查询效率偏低,正好借这个案例来学习一下查询优化器的知识。所使用的的 SQL 语句、数据以及表结构均已脱敏,
问题描述
使用 MariaDB 10.3,在一张业务表中,包含了 col1 和 col2 的唯一索引,使用如下语句查询时,只用了唯一索引的第一列,而没有用到第二列。
select col2 from t1 where col1 = 1 and col2 >= 4000000 order by col2 limit 50;
从常理来看,这个查询用上 col1 和 col2 的联合索引应该是最优解。而且衍生的相关查询均无法利用联合索引的两个列。
代码语言:txt复制select col2 from t1 where col1 = 1 and col2 >= 4000000 order by col2;
select col2 from t1 where col1 = 1 and col2 >= 4000000 limit 50;
select col2 from t1 where col1 = 1 and col2 >= 4000000;
模拟验证
部署 MariaDB 10.3.34 官方版本,构造测试环境用的语句如下:
代码语言:txt复制CREATE TABLE `t1` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`col1` int(11) NOT NULL,
`col2` bigint(20) NOT NULL,
`col3` varchar(20) NOT NULL DEFAULT '',
`col4` varchar(20) DEFAULT NULL,
`col5` varchar(100) DEFAULT NULL,
`col6` varchar(20) DEFAULT NULL,
`col7` varchar(100) DEFAULT NULL,
`col8` varchar(100) DEFAULT NULL,
`col9` varchar(100) DEFAULT NULL,
`col10` tinyint(1) NOT NULL DEFAULT 0,
`col11` timestamp NOT NULL DEFAULT current_timestamp(),
`col12` timestamp NOT NULL DEFAULT current_timestamp() ON UPDATE current_timestamp(),
PRIMARY KEY (`id`,`col1`),
UNIQUE KEY `unq_col1_col2` (`col1`,`col2`),
KEY `idx_1` (`col2`),
KEY `idx_2` (`col5`),
KEY `idx_3` (`col1`,`col4`),
KEY `idx_4` (`col1`,`col11`),
KEY `idx_5` (`col11`),
KEY `idx_6` (`col12`)
) ENGINE=InnoDB;
insert into t1(col1,col2) values(1,1);
insert into t1 (col2, col1) select col2 1, col1 from t1 where col1 = 1;
insert into t1 (col2, col1) select col2 2, col1 from t1 where col1 = 1;
insert into t1 (col2, col1) select col2 4, col1 from t1 where col1 = 1;
insert into t1 (col2, col1) select col2 8, col1 from t1 where col1 = 1;
insert into t1 (col2, col1) select col2 16, col1 from t1 where col1 = 1;
insert into t1 (col2, col1) select col2 32, col1 from t1 where col1 = 1;
insert into t1 (col2, col1) select col2 64, col1 from t1 where col1 = 1;
insert into t1 (col2, col1) select col2 128, col1 from t1 where col1 = 1;
insert into t1 (col2, col1) select col2 256, col1 from t1 where col1 = 1;
insert into t1 (col2, col1) select col2 512, col1 from t1 where col1 = 1;
insert into t1 (col2, col1) select col2 1024, col1 from t1 where col1 = 1;
insert into t1 (col2, col1) select col2 2048, col1 from t1 where col1 = 1;
insert into t1 (col2, col1) select col2 4096, col1 from t1 where col1 = 1;
insert into t1 (col2, col1) select col2 8192, col1 from t1 where col1 = 1;
insert into t1 (col2, col1) select col2 16384, col1 from t1 where col1 = 1;
insert into t1 (col2, col1) select col2 32768, col1 from t1 where col1 = 1;
insert into t1 (col2, col1) select col2 65536, col1 from t1 where col1 = 1;
insert into t1 (col2, col1) select col2 131072, col1 from t1 where col1 = 1;
insert into t1 (col2, col1) select col2 262144, col1 from t1 where col1 = 1;
insert into t1 (col2, col1) select col2 524288, col1 from t1 where col1 = 1;
insert into t1 (col2, col1) select col2 1048576, col1 from t1 where col1 = 1;
insert into t1 (col2, col1) select col2 2097152, col1 from t1 where col1 = 1;
insert into t1(col1,col2) values(10,10),(10,11),(10,12);
那么先看看执行计划:
代码语言:txt复制MariaDB [test]> explain select col2 from t1 where col1 = 1 and col2 >= 4000000 order by col2 limit 50;
------ ------------- ------- ------ --------------------------------- --------------- --------- ------- -------- --------------------------
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
------ ------------- ------- ------ --------------------------------- --------------- --------- ------- -------- --------------------------
| 1 | SIMPLE | t1 | ref | unq_col1_col2,idx_1,idx_3,idx_4 | unq_col1_col2 | 4 | const | 388450 | Using where; Using index |
------ ------------- ------- ------ --------------------------------- --------------- --------- ------- -------- --------------------------
1 row in set (0.000 sec)
MariaDB [test]> explain select col2 from t1 where col1 = 1 and col2 >= 4000000 order by col2;
------ ------------- ------- ------ --------------------------------- --------------- --------- ------- -------- --------------------------
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
------ ------------- ------- ------ --------------------------------- --------------- --------- ------- -------- --------------------------
| 1 | SIMPLE | t1 | ref | unq_col1_col2,idx_1,idx_3,idx_4 | unq_col1_col2 | 4 | const | 388450 | Using where; Using index |
------ ------------- ------- ------ --------------------------------- --------------- --------- ------- -------- --------------------------
1 row in set (0.000 sec)
MariaDB [test]> explain select col2 from t1 where col1 = 1 and col2 >= 4000000 limit 50;
------ ------------- ------- ------ --------------------------------- --------------- --------- ------- -------- --------------------------
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
------ ------------- ------- ------ --------------------------------- --------------- --------- ------- -------- --------------------------
| 1 | SIMPLE | t1 | ref | unq_col1_col2,idx_1,idx_3,idx_4 | unq_col1_col2 | 4 | const | 388450 | Using where; Using index |
------ ------------- ------- ------ --------------------------------- --------------- --------- ------- -------- --------------------------
1 row in set (0.000 sec)
MariaDB [test]> explain select col2 from t1 where col1 = 1 and col2 >= 4000000;
------ ------------- ------- ------ --------------------------------- --------------- --------- ------- -------- --------------------------
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
------ ------------- ------- ------ --------------------------------- --------------- --------- ------- -------- --------------------------
| 1 | SIMPLE | t1 | ref | unq_col1_col2,idx_1,idx_3,idx_4 | unq_col1_col2 | 4 | const | 388450 | Using where; Using index |
------ ------------- ------- ------ --------------------------------- --------------- --------- ------- -------- --------------------------
1 row in set (0.000 sec)
MariaDB [test]> explain select col2 from t1 where col1 = 1 and col2 >= 1 order by col2 limit 50;
------ ------------- ------- ------ --------------------------------- --------------- --------- ------- --------- --------------------------
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
------ ------------- ------- ------ --------------------------------- --------------- --------- ------- --------- --------------------------
| 1 | SIMPLE | t1 | ref | unq_col1_col2,idx_1,idx_3,idx_4 | unq_col1_col2 | 4 | const | 2090440 | Using where; Using index |
------ ------------- ------- ------ --------------------------------- --------------- --------- ------- --------- --------------------------
1 row in set (0.001 sec)
当变更查询条件,直到实际上结果集为空的时候,查询计划才会符合预期:
代码语言:txt复制MariaDB [test]> explain select col2 from t1 where col1 = 1 and col2 >= 5000000 order by col2 limit 50;
------ ------------- ------- ------- --------------------------------- --------------- --------- ------ ------ --------------------------
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
------ ------------- ------- ------- --------------------------------- --------------- --------- ------ ------ --------------------------
| 1 | SIMPLE | t1 | range | unq_col1_col2,idx_1,idx_3,idx_4 | unq_col1_col2 | 12 | NULL | 1 | Using where; Using index |
------ ------------- ------- ------- --------------------------------- --------------- --------- ------ ------ --------------------------
1 row in set (0.000 sec)
PS:随机生成的数据中,col2 的最大值为 4194304。
原因简析
由于 MariaDB 10.3 并没有 optimizer_trace,因此很难去准确判断查询优化器因为什么原因没有选择联合索引,那么采用通常的人为干预手段,去试试看联合索引的效果,看看是否会有较好的查询效率:
代码语言:txt复制MariaDB [test]> explain select col2 from t1 force index(unq_col1_col2) where col1 = 1 and col2 >= 4000000 order by col2 limit 50;
------ ------------- ------- ------- --------------- --------------- --------- ------ -------- --------------------------
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
------ ------------- ------- ------- --------------- --------------- --------- ------ -------- --------------------------
| 1 | SIMPLE | t1 | range | unq_col1_col2 | unq_col1_col2 | 12 | NULL | 388450 | Using where; Using index |
------ ------------- ------- ------- --------------- --------------- --------- ------ -------- --------------------------
1 row in set (0.000 sec)
MariaDB [test]> select col2 from t1 force index(unq_col1_col2) where col1 = 1 and col2 >= 4000000 order by col2 limit 50;
---------
| col2 |
---------
| 4000000 |
| 4000001 |
......
......
| 4000049 |
---------
50 rows in set (0.000 sec)
MariaDB [test]> select col2 from t1 where col1 = 1 and col2 >= 4000000 order by col2 limit 50;
---------
| col2 |
---------
| 4000000 |
| 4000001 |
......
......
| 4000047 |
| 4000048 |
| 4000049 |
---------
50 rows in set (0.796 sec)
可以发现查询时间上有非常明显的差距,确实是查询优化器选错了索引。
换到 MySQL 8.0 之后(官方版本和腾讯云数据库),查询计划选择了正确的索引,可以faxian 执行计划完全没有问题,且随着查询条件的变化,选择的索引都是合理且效率很高的。
代码语言:txt复制mysql> explain select col2 from t1 where col1 = 1 and col2 >= 4000000 order by col2 limit 50;
---- ------------- ------- ------------ ------- --------------------------------- ------- --------- ------ -------- ---------- --------------------------
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---- ------------- ------- ------------ ------- --------------------------------- ------- --------- ------ -------- ---------- --------------------------
| 1 | SIMPLE | t1 | NULL | range | unq_col1_col2,idx_1,idx_3,idx_4 | idx_1 | 8 | NULL | 388450 | 50.00 | Using where; Using index |
---- ------------- ------- ------------ ------- --------------------------------- ------- --------- ------ -------- ---------- --------------------------
1 row in set, 1 warning (0.00 sec)
mysql> explain select col2 from t1 where col1 = 1 and col2 >= 1000000 order by col2 limit 50;
---- ------------- ------- ------------ ------- --------------------------------- --------------- --------- ------ --------- ---------- --------------------------
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---- ------------- ------- ------------ ------- --------------------------------- --------------- --------- ------ --------- ---------- --------------------------
| 1 | SIMPLE | t1 | NULL | range | unq_col1_col2,idx_1,idx_3,idx_4 | unq_col1_col2 | 12 | NULL | 2090440 | 100.00 | Using where; Using index |
---- ------------- ------- ------------ ------- --------------------------------- --------------- --------- ------ --------- ---------- --------------------------
1 row in set, 1 warning (0.00 sec)
如上示例中可以看到随着条件的变化,执行计划是发生了变化的,恰好 8.0 的版本有 optimizer_trace,那也正好借此机会看一下查询计划的变更与索引,查询条件的关系。
截取 trace 的部分信息如下当查询条件为 4000000 < mid 的时候:
代码语言:txt复制......
{
"index": "unq_col1_col2",
"ranges": [
"col1 = 1 AND 4000000 <= col2"
],
"index_dives_for_eq_ranges": true,
"rowid_ordered": false,
"using_mrr": false,
"index_only": true,
"in_memory": 0.999848,
"rows": 388450,
"cost": 39129.3, // 判断查询计划好坏的最终指标
"chosen": true
},
{
"index": "idx_1",
"ranges": [
"4000000 <= col2"
],
"index_dives_for_eq_ranges": true,
"rowid_ordered": false,
"using_mrr": false,
"index_only": true,
"in_memory": 1,
"rows": 388450,
"cost": 39082.1, // 判断查询计划好坏的最终指标
"chosen": true
},
......
可以看到 mid 单列索引以微弱的 cost 优势胜过了唯一索引。而 1000000 > mid 的时候,再看一下 trace 的信息:
代码语言:txt复制......
{
"index": "unq_col1_col2",
"ranges": [
"col1 = 1 AND 1000000 <= col2"
],
"index_dives_for_eq_ranges": true,
"rowid_ordered": false,
"using_mrr": false,
"index_only": true,
"in_memory": 0.999848,
"rows": 2090440,
"cost": 210573,
"chosen": true
},
{
"index": "idx_1",
"ranges": [
"1000000 <= col2"
],
"index_dives_for_eq_ranges": true,
"rowid_ordered": false,
"using_mrr": false,
"index_only": true,
"in_memory": 1,
"rows": 2090440,
"cost": 210319,
"chosen": true
},
......
{
"rows_to_scan": 2090440,
"filtering_effect": [
],
"final_filtering_effect": 0.5,
"access_type": "range",
"range_details": {
"used_index": "idx_1"
},
"resulting_rows": 1.04522e 06,
"cost": 419363, //execution 阶段估算出来的 idx_1 最终 cost
"chosen": false
}
......
"reconsidering_access_paths_for_index_ordering":{
"clause":"ORDER BY",
"steps":[
......
"chosen_range_access_summary": {
"range_access_plan": {
"type": "range_scan",
"index": "unq_col1_col2",
"rows": 2.09044e 06,
"ranges": [
"col1 = 1 AND 1000000 <= col2"
]
},
"rows_for_plan": 2.09044e 06,
"cost_for_plan": 210573,
"chosen": true
......
],
"index_order_summary":{
"table":"`t1`",
"index_provides_order":true,
"order_direction":"asc",
"index":"unq_col1_col2",
"plan_changed":true,
"access_type":"range"
}
}
......
可以看到,在执行计划的选择中,其实还是用 mid 单列索引比较好,但是执行优化器在考虑到排序的时候,变更了一次索引,最终选择了唯一索引 unq_col1_col2。
总结
在大多数时候,MySQL 选择索引的唯一依据其实还是 cost,即通过统计信息来估算不同索引的执行代价,因此同一个语句出现执行计划变更的时候,不一定是遇到了 bug 或者是问题,可能只是在当前版本的代价计算模型中,数据库选择了一个“它认为更好的索引”。
诚然算法不是万能的,总会有一些处理不好的 case,相对而言高版本的代价计算总是相对准确的,可以考虑尽量使用大版本较新的数据库来支撑业务。