我们生产环境的建表语句,之前一直要求研发提交时候不要带上字符集和排序集,这样就可以follow数据库默认的配置。但是最近发现掉坑里了。
至于是什么坑,为什么掉坑。可以看下面的例子
代码语言:javascript复制环境:
CentOS7
GreatSQL8.0.25,可以理解为Percona的加强版,增加了一些挺不错的feature,例如并行查询之类特性。
这个案例只要是MySQL8的任一版本就可以。
[test]> s
--------------
/usr/local/mysql/bin/mysql Ver 8.0.25-15 for Linux on x86_64 (GreatSQL, Release 15, Revision c7feae175e0)
Connection id: 27
Current database: test
Current user: root@localhost
SSL: Not in use
Current pager: less
Using outfile: ''
Using delimiter: ;
Server version: 8.0.25-15 GreatSQL, Release 15, Revision c7feae175e0
Protocol version: 10
Connection: Localhost via UNIX socket
Server characterset: utf8mb4
Db characterset: utf8mb4
Client characterset: utf8mb4
Conn. characterset: utf8mb4
UNIX socket: /data/GreatSQL/mysql.sock
Binary data as: Hexadecimal
Uptime: 15 hours 44 min 8 sec
Threads: 4 Questions: 272 Slow queries: 25 Opens: 431 Flush tables: 3 Open tables: 347 Queries per second avg: 0.004
--------------
-- 字符集和字符排序集
[test]> show global variables like '%charac%';
-------------------------- --------------------------------------------
| Variable_name | Value |
-------------------------- --------------------------------------------
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8mb3 |
| character_sets_dir | /usr/local/GreatSQL-8.0.25/share/charsets/ |
-------------------------- --------------------------------------------
[test]> show global variables like '%collation%';
------------------------------- --------------------
| Variable_name | Value |
------------------------------- --------------------
| collation_connection | utf8mb4_0900_ai_ci |
| collation_database | utf8mb4_0900_ai_ci |
| collation_server | utf8mb4_0900_ai_ci |
| default_collation_for_utf8mb4 | utf8mb4_0900_ai_ci |
------------------------------- --------------------
4 rows in set (0.00 sec)
-- 开始创建库表
-- db1模拟的是mysql5.7升级到8.0之前,业务侧提交的建库建表语句
CREATE DATABASE db1 DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;
use db1;
create table t ( `a` int DEFAULT NULL,
`b` varchar(100) DEFAULT NULL
) ENGINE=InnoDB;
-- db2模拟的是升级到8.0之后,业务侧提交的建库建表语句
CREATE DATABASE db2 DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci;
use db2;
create table t ( `a` int DEFAULT NULL,
`b` varchar(100) DEFAULT NULL
) ENGINE=InnoDB;
-- 看下字符集的情况
[db2]> select
`TABLE_SCHEMA`,
`TABLE_NAME`,
`TABLE_COLLATION`,
`CREATE_TIME`
from information_schema.`TABLES`
where
`TABLE_SCHEMA` IN ('db1','db2') ;
-------------- ------------ -------------------- ---------------------
| TABLE_SCHEMA | TABLE_NAME | TABLE_COLLATION | CREATE_TIME |
-------------- ------------ -------------------- ---------------------
| db1 | t | utf8_general_ci | 2022-04-18 13:14:54 |
| db2 | t | utf8mb4_0900_ai_ci | 2022-04-18 13:14:57 |
-------------- ------------ -------------------- ---------------------
2 rows in set (0.00 sec)
上面这种在生产上这回出现下面这种情况:
刚开始运行的时候,我们用的是mysql5.7,建库默认用字符集utf8 字符排序集utf8_general_ci。这里一点问题也没有。
后来,数据库版本升级到了8.0了,并且改了默认字符集为utf8mb4 字符排序集为utf8mb4_0900_ai_ci,这里也没任何问题。
新创建的数据库,不显式指定的话,也follow数据库层面的字符集和字符排序集,也就是utf8mb4 和utf8mb4_0900_ai_ci 。
但是, 如果我们在老的数据库里面创建新表的时候,如果不显式指定的话,会follow所在数据库的字符集和字符排序集的(也就是 虽然升级到8.x了,但是创建的表还是用的utf8 和utf8_general_ci),这就操蛋了啊。。
代码语言:javascript复制-- 按时间排序,看下最近有哪些表的字符集存在异常
SELECT
`TABLE_SCHEMA`,
`TABLE_NAME`,
`TABLE_COLLATION`,
`CREATE_TIME`
FROM information_schema.`TABLES`
WHERE
`TABLE_SCHEMA` NOT IN ('sys','mysql','information_schema','performance_schema')
AND
TABLE_COLLATION='utf8_general_ci'
ORDER BY CREATE_TIME DESC
LIMIT 10
这一堆的表,咋改进呢,挺费事的,如果你直接执行 alter database xxx DEFAULT CHARACTER SET utf8mb4; 这样风险很高的,基本上停服搞了,如果库下面的表有事务还没提交,这个alter database一直是pending的。
那么, 只能从其它方面下手了:
1、严格控制DDL语句,新的库和表必须显式定义 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
2、对于已经是utf8编码的的表 ,暂时不去动它
3、找个时间窗口,把表全部转为 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci (工作量不小。。)
tips,字符集问题带来的索引不生效的案例:
代码语言:javascript复制[test]> alter table db1.t add index idx_b(b);
[test]> alter table db2.t add index idx_b(b);
[test]> select * from db1.t;
------ ------
| a | b |
------ ------
| 1 | abc |
------ ------
1 row in set (0.00 sec)
[test]> select * from db2.t;
------ ------
| a | b |
------ ------
| 1 | abc |
------ ------
1 row in set (0.00 sec)
-- 示例1, utf8mb3 join utf8mb4
[test]> select aa.*,bb.* from db1.t as aa inner join db2.t bb on aa.b=bb.b ;
------ ------ ------ ------
| a | b | a | b |
------ ------ ------ ------
| 1 | abc | 1 | abc |
------ ------ ------ ------
1 row in set (0.00 sec)
[test]> desc select aa.*,bb.* from db1.t as aa inner join db2.t bb on aa.b=bb.b G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: aa
partitions: NULL
type: ALL
possible_keys: idx_b
key: NULL
key_len: NULL
ref: NULL
rows: 1
filtered: 100.00
Extra: Using where
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: bb
partitions: NULL
type: ref
possible_keys: idx_b
key: idx_b
key_len: 403
ref: db1.aa.b
rows: 1
filtered: 100.00
Extra: Using index condition
2 rows in set, 2 warnings (0.00 sec)
[test]> show warnings G
*************************** 1. row ***************************
Level: Warning
Code: 1739
Message: Cannot use ref access on index 'idx_b' due to type or collation conversion on field 'b'
*************************** 2. row ***************************
Level: Note
Code: 1003
Message: /* select#1 */ select `db1`.`aa`.`a` AS `a`,`db1`.`aa`.`b` AS `b`,`db2`.`bb`.`a` AS `a`,`db2`.`bb`.`b` AS `b` from `db1`.`t` `aa` join `db2`.`t` `bb` where (`db1`.`aa`.`b` = `db2`.`bb`.`b`)
2 rows in set (0.00 sec)
-- 示例2, utf8mb4 join utf8mb3
[test]> select aa.*,bb.* from db2.t as aa inner join db1.t bb on aa.b=bb.b ;
------ ------ ------ ------
| a | b | a | b |
------ ------ ------ ------
| 1 | abc | 1 | abc |
------ ------ ------ ------
1 row in set (0.00 sec)
[test]> desc select aa.*,bb.* from db2.t as aa inner join db1.t bb on aa.b=bb.b G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: aa
partitions: NULL
type: ALL
possible_keys: idx_b
key: NULL
key_len: NULL
ref: NULL
rows: 1
filtered: 100.00
Extra: NULL
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: bb
partitions: NULL
type: ALL
possible_keys: idx_b
key: NULL
key_len: NULL
ref: NULL
rows: 1
filtered: 100.00
Extra: Using where; Using join buffer (hash join)
2 rows in set, 3 warnings (0.00 sec)
[test]> show warnings G
*************************** 1. row ***************************
Level: Warning
Code: 1739
Message: Cannot use ref access on index 'idx_b' due to type or collation conversion on field 'b'
*************************** 2. row ***************************
Level: Warning
Code: 1739
Message: Cannot use range access on index 'idx_b' due to type or collation conversion on field 'b'
*************************** 3. row ***************************
Level: Note
Code: 1003
Message: /* select#1 */ select `db2`.`aa`.`a` AS `a`,`db2`.`aa`.`b` AS `b`,`db1`.`bb`.`a` AS `a`,`db1`.`bb`.`b` AS `b` from `db2`.`t` `aa` join `db1`.`t` `bb` where (`db2`.`aa`.`b` = `db1`.`bb`.`b`)
3 rows in set (0.00 sec)