MySQL 案例:left join 与 BUG

2020-10-08 15:39:24 浏览数 (1)

问题描述

最近遇到用户反馈同样的 SQL 在自建的 MySQL 测试实例和腾讯云的 CDB 实例上查询结果不一致的现象。

解决方案

自建库建议升级版本,至少高于 5.7.21,腾讯云数据库的话,建议临时改写 SQL,绕过这个 bug,后续版本更新之后,可以通过升级修复这个问题

场景还原

构造一下复现的场景,并根据实际的案例,改写一下 SQL:

代码语言:txt复制
CREATE TABLE `stu` (
  `id` int(11) NOT NULL,
  `sname` varchar(16) NOT NULL,
  `cname` varchar(8) DEFAULT NULL,
  `math` int(11) NOT NULL,
  `eng` int(11) DEFAULT NULL,
  `his` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

CREATE TABLE `stu1` LIKE stu;

INSERT INTO `stu` VALUES (100,'Dave','1',100,80,77),(101,'Lily','1',87,91,99),(102,'Wolf','3',82,99,89),(103,'Tom','2',69,74,75),(104,'Adam','2',99,94,95),(105,'Jay','3',77,64,55),(106,'adam','2',95,94,95),(107,'jay','2',95,94,95),(109,'jayy','2',95,94,95),(110,'jayyy','2',95,94,95);

INSERT INTO `stu1` VALUES (100,'Dave','1',100,80,77),(101,'Lily','1',87,91,99),(102,'Wolf','3',82,99,89),(103,'Tom','2',69,74,75),(104,'Adam','2',99,94,95),(105,'Jay','3',77,64,55),(110,'jayyy','2',95,94,95);

有问题的 SQL:

代码语言:txt复制
select * from stu t1 
left join(select id,sname as sname_t1,1 as cname from stu1) t2 on t1.id=t2.id 
where t2.cname is not null;

在腾讯云 CDB 上的执行结果:

代码语言:txt复制
mysql> select * from stu t1 left join(select id,sname as sname_t1,1 as cname from stu1) t2 on t1.id=t2.id where t2.cname is not null;
 ----- ------- ------- ------ ------ ------ ------ ---------- ------- 
| id  | sname | cname | math | eng  | his  | id   | sname_t1 | cname |
 ----- ------- ------- ------ ------ ------ ------ ---------- ------- 
| 100 | Dave  | 1     |  100 |   80 |   77 |  100 | Dave     |     1 |
| 101 | Lily  | 1     |   87 |   91 |   99 |  101 | Lily     |     1 |
| 102 | Wolf  | 3     |   82 |   99 |   89 |  102 | Wolf     |     1 |
| 103 | Tom   | 2     |   69 |   74 |   75 |  103 | Tom      |     1 |
| 104 | Adam  | 2     |   99 |   94 |   95 |  104 | Adam     |     1 |
| 105 | Jay   | 3     |   77 |   64 |   55 |  105 | Jay      |     1 |
| 106 | adam  | 2     |   95 |   94 |   95 | NULL | NULL     |  NULL |
| 107 | jay   | 2     |   95 |   94 |   95 | NULL | NULL     |  NULL |
| 109 | jayy  | 2     |   95 |   94 |   95 | NULL | NULL     |  NULL |
| 110 | jayyy | 2     |   95 |   94 |   95 |  110 | jayyy    |     1 |
 ----- ------- ------- ------ ------ ------ ------ ---------- ------- 
10 rows in set (0.00 sec)

在 Oracle 官方的执行结果:

代码语言:txt复制
mysql> select * from stu t1 left join(select id,sname as sname_t1,1 as cname from stu1) t2 on t1.id=t2.id where t2.cname is not null;
 ----- ------- ------- ------ ------ ------ ------ ---------- ------- 
| id  | sname | cname | math | eng  | his  | id   | sname_t1 | cname |
 ----- ------- ------- ------ ------ ------ ------ ---------- ------- 
| 100 | Dave  | 1     |  100 |   80 |   77 |  100 | Dave     |     1 |
| 101 | Lily  | 1     |   87 |   91 |   99 |  101 | Lily     |     1 |
| 102 | Wolf  | 3     |   82 |   99 |   89 |  102 | Wolf     |     1 |
| 103 | Tom   | 2     |   69 |   74 |   75 |  103 | Tom      |     1 |
| 104 | Adam  | 2     |   99 |   94 |   95 |  104 | Adam     |     1 |
| 105 | Jay   | 3     |   77 |   64 |   55 |  105 | Jay      |     1 |
| 110 | jayyy | 2     |   95 |   94 |   95 |  110 | jayyy    |     1 |
 ----- ------- ------- ------ ------ ------ ------ ---------- ------- 
7 rows in set (0.00 sec)

可以看到,两者的结果存在差异,NULL 的行在 Oracle 官方的版本中并未出现,而且根据 SQL 的语义来看,结果中没有 NULL 才是正确的。

原理简析

实际上,引发这个问题的原因是 MySQL bug,而在确认这个是 bug 之前,无论是 show warnings,还是检查 optimizer_trace,都没发现什么异常,最初分析问题的时候,看起来不像是 MySQL 的执行优化器出现了误判。

明确这个 bug 的情况之后,解决问题的方法自然也明确了:where 条件中不要使用这个被定义为常量的 cname 就行,比如:

代码语言:txt复制
select * from stu t1 
left join(select id,sname as sname_t1,1 as cname from stu1) t2 on t1.id=t2.id 
where t2.sname_t1 is not null;

不过在改写之后,发现了一个有意思的现象,MySQL 执行优化器把这个 left join 改写成了 join:

代码语言:txt复制
mysql> explain select * from stu t1 left join(select id,sname as sname_t1,1 as cname from stu1) t2 on t1.id=t2.id where t2.sname_t1 is not null;
 ---- ------------- ------- ------------ -------- --------------- --------- --------- -------------- ------ ---------- ------------- 
| id | select_type | table | partitions | type   | possible_keys | key     | key_len | ref          | rows | filtered | Extra       |
 ---- ------------- ------- ------------ -------- --------------- --------- --------- -------------- ------ ---------- ------------- 
|  1 | SIMPLE      | stu1  | NULL       | ALL    | PRIMARY       | NULL    | NULL    | NULL         |    7 |    85.71 | Using where |
|  1 | SIMPLE      | t1    | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | test.stu1.id |    1 |   100.00 | NULL        |
 ---- ------------- ------- ------------ -------- --------------- --------- --------- -------------- ------ ---------- ------------- 
2 rows in set, 1 warning (0.00 sec)

mysql> show warnings;
 ------- ------ ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 
| Level | Code | Message                                                                                                                                                                                                   |
 ------- ------ ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 
| Note  | 1003 | /* select#1 */ select `test`.`t1`.`id` AS `id`,...(省略)...,1 AS `cname` from `test`.`stu` `t1` join `test`.`stu1` where ((`test`.`t1`.`id` = `test`.`stu1`.`id`) and (`test`.`stu1`.`sname` is not null)) |
 ------- ------ ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 
1 row in set (0.00 sec)

实际上,这个和 MySQL 的空值拒绝优化策略有一定的关系,简单来说,如果 where 条件中出现了 is not null 或者是效果等同的条件,那么 MySQL 可以把外连接改写成内连接,此处引用官方文档的说明:

Instead, MySQL converts the query to a query with no outer join operation if the WHERE condition is null-rejected. (That is, it converts the outer join to an inner join.) A condition is said to be null-rejected for an outer join operation if it evaluates to FALSE or UNKNOWN for any NULL-complemented row generated for the operation.

后续有机会在详细解析这个优化策略,从结果上来看,替换掉 where 条件的内容之后,查询结果就恢复正常了:

代码语言:txt复制
mysql> select * from stu t1 left join(select id,sname as sname_t1,1 as cname from stu1) t2 on t1.id=t2.id where t2.sname_t1 is not null;
 ----- ------- ------- ------ ------ ------ ------ ---------- ------- 
| id  | sname | cname | math | eng  | his  | id   | sname_t1 | cname |
 ----- ------- ------- ------ ------ ------ ------ ---------- ------- 
| 100 | Dave  | 1     |  100 |   80 |   77 |  100 | Dave     |     1 |
| 101 | Lily  | 1     |   87 |   91 |   99 |  101 | Lily     |     1 |
| 102 | Wolf  | 3     |   82 |   99 |   89 |  102 | Wolf     |     1 |
| 103 | Tom   | 2     |   69 |   74 |   75 |  103 | Tom      |     1 |
| 104 | Adam  | 2     |   99 |   94 |   95 |  104 | Adam     |     1 |
| 105 | Jay   | 3     |   77 |   64 |   55 |  105 | Jay      |     1 |
| 110 | jayyy | 2     |   95 |   94 |   95 |  110 | jayyy    |     1 |
 ----- ------- ------- ------ ------ ------ ------ ---------- ------- 
7 rows in set (0.00 sec)

0 人点赞