在MySQL的数据库中,我们有时会发现MySQL数据库明明没有负载,CPU、硬盘、内存和网络等资源都很空闲,但很多SQL都pending在哪儿,MySQL数据库无法处理交易。这是怎么回事呢?
关于号主,姚远:
- Oracle ACE(Oracle和MySQL数据库方向)
- 华为云最有价值专家
- 《MySQL 8.0运维与优化》的作者
- 拥有 Oracle 10g、12c和19c OCM等数十项数据库认证
- 曾任IBM公司数据库部门经理
- 20 年DBA经验,服务2万 客户
- 精通C和Java,发明两项计算机专利
在数据库系统中出现这种情况通常是锁竞争造成的,MySQL数据库更加容易出现这种情况,因为它的存储层和服务层是分开的,我们来看锁竞争在MySQL 5.7和8.0里的表现和解决办法。
5.7版本中查询锁竞争
在MySQL 8.0 之前,必须SET GLOBAL innodb_status_output_locks=ON后才能在SHOW ENGINE INNODB STATUS中查到数据锁,例如下面这个事务:
代码语言:javascript复制mysql> begin;
Query OK, 0 rows affected (0.00 sec)
mysql> select * from t1 where intcol1=0 for update;
...
900 rows in set (0.00 sec)
在SHOW ENGINE INNODB STATUS中查到
代码语言:javascript复制---TRANSACTION 7827, ACTIVE 11 sec
222 lock struct(s), heap size 24784, 5720 row lock(s)
MySQL thread id 9912, OS thread handle 139967683151616, query id 11123 localhost root
TABLE LOCK table `mysqlslap`.`t1` trx id 7827 lock mode IX
RECORD LOCKS space id 25 page no 37 n bits 1056 index in_1 of table `mysqlslap`.`t1` trx id 7827 lock_mode X
...
修改了900行,却锁住了5720行。查询space id为25对应的对象:
代码语言:javascript复制mysql> select * from information_schema.INNODB_SYS_DATAFILES where space=25;
------- --------------------
| SPACE | PATH |
------- --------------------
| 25 | ./mysqlslap/t1.ibd |
------- --------------------
1 row in set (0.00 sec)
在另外一个session里执行
代码语言:javascript复制mysql> update t1 set intcol1=1 where intcol1=0;
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
查询锁的情况
代码语言:javascript复制
mysql> select * from information_schema.innodb_lock_waits;
------------------- ------------------- ----------------- ------------------
| requesting_trx_id | requested_lock_id | blocking_trx_id | blocking_lock_id |
------------------- ------------------- ----------------- ------------------
| 7829 | 7829:25:4:2 | 7827 | 7827:25:4:2 |
------------------- ------------------- ----------------- ------------------
1 row in set, 1 warning (0.00 sec)
mysql> select * from information_schema.innodb_locks;
------------- ------------- ----------- ----------- ------------------ ----------------- ------------ ----------- ---------- ----------------
| lock_id | lock_trx_id | lock_mode | lock_type | lock_table | lock_index | lock_space | lock_page | lock_rec | lock_data |
------------- ------------- ----------- ----------- ------------------ ----------------- ------------ ----------- ---------- ----------------
| 7829:25:4:2 | 7829 | X | RECORD | `mysqlslap`.`t1` | GEN_CLUST_INDEX | 25 | 4 | 2 | 0x000000000200 |
| 7827:25:4:2 | 7827 | X | RECORD | `mysqlslap`.`t1` | GEN_CLUST_INDEX | 25 | 4 | 2 | 0x000000000200 |
------------- ------------- ----------- ----------- ------------------ ----------------- ------------ ----------- ---------- ----------------
2 rows in set, 1 warning (0.00 sec)
查询阻塞的线程:
代码语言:javascript复制SELECT b.trx_mysql_thread_id AS 'blocked_thread_id'
,b.trx_query AS 'blocked_sql_text'
,c.trx_mysql_thread_id AS 'blocker_thread_id'
,c.trx_query AS 'blocker_sql_text'
,( Unix_timestamp() - Unix_timestamp(c.trx_started) )
AS 'blocked_time'
FROM information_schema.innodb_lock_waits a
INNER JOIN information_schema.innodb_trx b
ON a.requesting_trx_id = b.trx_id
INNER JOIN information_schema.innodb_trx c
ON a.blocking_trx_id = c.trx_id
WHERE ( Unix_timestamp() - Unix_timestamp(c.trx_started) ) > 4;
------------------- ----------------------------------------- ------------------- ------------------ --------------
| blocked_thread_id | blocked_sql_text | blocker_thread_id | blocker_sql_text | blocked_time |
------------------- ----------------------------------------- ------------------- ------------------ --------------
| 9921 | update t1 set intcol1=1 where intcol1=0 | 9917 | NULL | 782 |
------------------- ----------------------------------------- ------------------- ------------------ --------------
1 row in set, 1 warning (0.00 sec)
根据线程号查询执行的SQL
代码语言:javascript复制SELECT a.sql_text,
c.id,
d.trx_started
FROM performance_schema.events_statements_current a
join performance_schema.threads b
ON a.thread_id = b.thread_id
join information_schema.processlist c
ON b.processlist_id = c.id
join information_schema.innodb_trx d
ON c.id = d.trx_mysql_thread_id
where c.id=9917
ORDER BY d.trx_startedG
*************************** 1. row ***************************
sql_text: select * from t1 where intcol1=0 for update
id: 9917
trx_started: 2023-05-26 13:24:59
1 row in set (0.00 sec)
注意这里查询出的SQL是阻塞事务的最后一条SQL,并不一定是阻塞的源头。
解决锁竞争
解决方法是针对where中的条件增加索引,使MySQL服务层的过滤能在存储层完成,例如
代码语言:javascript复制mysql> create index in_1 on t1(intcol1);
Query OK, 0 rows affected (0.04 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> show create table t1G
*************************** 1. row ***************************
Table: t1
Create Table: CREATE TABLE `t1` (
`intcol1` int(32) DEFAULT NULL,
`intcol2` int(32) DEFAULT NULL,
`charcol1` varchar(128) DEFAULT NULL,
`charcol2` varchar(128) DEFAULT NULL,
`charcol3` varchar(128) DEFAULT NULL,
KEY `in_1` (`intcol1`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
再锁住同样的行
代码语言:javascript复制mysql> begin;
Query OK, 0 rows affected (0.00 sec)
mysql> select * from t1 where intcol1=0 for update;
...
900 rows in set (0.00 sec)
在SHOW ENGINE INNODB STATUS中查到
代码语言:javascript复制---TRANSACTION 7841, ACTIVE 15 sec
155 lock struct(s), heap size 24784, 1801 row lock(s)
MySQL thread id 9917, OS thread handle 139966608627456, query id 11227 localhost root
TABLE LOCK table `mysqlslap`.`t1` trx id 7841 lock mode IX
RECORD LOCKS space id 25 page no 37 n bits 1056 index in_1 of table `mysqlslap`.`t1` trx id 7841 lock_mode X
...
被锁住的记录从之前的5720条减少到1801条。
有索引后执行计划也不同,加索引之前的执行计划
代码语言:javascript复制mysql> explain select * from t1 where intcol1=0 for update;
---- ------------- ------- ------------ ------ --------------- ------ --------- ------ ------ ---------- -------------
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---- ------------- ------- ------------ ------ --------------- ------ --------- ------ ------ ---------- -------------
| 1 | SIMPLE | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 6143 | 10.00 | Using where |
---- ------------- ------- ------------ ------ --------------- ------ --------- ------ ------ ---------- -------------
1 row in set, 1 warning (0.00 sec)
加了索引后的执行计划是:
代码语言:javascript复制mysql> explain select * from t1 where intcol1=0 for update;
---- ------------- ------- ------------ ------ --------------- ------ --------- ------- ------ ---------- -------
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---- ------------- ------- ------------ ------ --------------- ------ --------- ------- ------ ---------- -------
| 1 | SIMPLE | t1 | NULL | ref | in_1 | in_1 | 5 | const | 900 | 100.00 | NULL |
---- ------------- ------- ------------ ------ --------------- ------ --------- ------- ------ ---------- -------
1 row in set, 1 warning (0.00 sec)
对比一下可以发现后者的Extra字段中没有“Using where”,因为过滤再存储层已经完成了。
生产中可以先 select 出 主键id,再用 主键id 去 update
8.0版本
从 MySQL 8.0 开始,performance_schema.data_locks显示 InnoDB 数据锁。具体参见拙作《MySQL 8.0运维于优化》第18章第3节“优化索引”。