[MYSQL] show engine innodb status中的死锁 分析

2024-08-26 13:56:49 浏览数 (1)

导读

很久以前(也才2年)写过一个解析innodb_status的脚本. 看起来像那么回事, 其实就是做了个翻译和总结.

比如:

代码语言:shell复制
(venv) 10:15:56 [root@ddcw21 innodb_status]#python innodb_status.py -h127.0.0.1 -P3314 -p123456
说明:
采集时间: 2024-08-26 10:15:57
下面涉及到的 每秒平均值 计算时间均为最近 27 秒内

master线程:
系统繁忙程度(越大越繁忙): 0.07 %
日志写入和刷新次数: 0

SEMAPHORES信号量:
rw_s_spin_wait_count 0
rw_s_spin_round_count 0
rw_s_os_wait_count 0
rw_x_spin_wait_count 0
rw_x_spin_round_count 0
rw_x_os_wait_count 0
rw_sx_spin_wait_count 0
rw_sx_spin_round_count 0
rw_sx_os_wait_count 0
每次空转等待的锁: rw_s:0.00  rw_x:0.00 rw_sx:0.00

死锁(最近一条)
产生死锁的时间: 2024-08-26 
事务ID:51399739  锁类型:X  thread_id:9  578 localhost 127.0.0.1 root updating SQL如下:
update db1.sbtest1 set c='session 1' where id = 110011


事务ID:51399740  锁类型:X  thread_id:10  581 localhost 127.0.0.1 root updating SQL如下:
update db1.sbtest1 set c='session 1' where id = 110010


回滚事务: 51399740

事务汇总信息
max_trx_id : 51399742
min_trx_id : 51399742
max_undo_id: 0
purge线程状态: running but idle
undo包含的事务数: 0
事务ID:51399739  事务状态:ACTIVE 1606 sec  锁:3  堆大小:1128  锁行数:2  事务中修改或插入的行数:2  MYSQL_PROCESS_ID:9

文件IO
Pending normal 异步IO READ (对应read thread) :  [0, 0, 0, 0, 0, 0, 0, 0]
Pending normal 异步IO WRITE(对应write thread):  [0, 0, 0, 0]
挂起(pending)的redo log flush: 0
挂起(pending)的tablespace flush: 0
OS总读次数: 2082  速度: 0.00 次/秒.  平均每次读 0 字节
OS总写次数: 445  速度: 0.00 次/秒. 
OS总flush次数: 160  速度: 0.00 次/秒. 

insert/change buffer和自适应hash索引
已合并页的数量: 1 页.   ibuf空闲列表长度: 3078 页.  ibuf大小: 3080 页.  合并插入次数: 2
合并操作次数: insert buffer: 0  delete buffer: 2   purge buffer: 0
无需合并操作的次数: insert buffer: 0  delete buffer: 0   purge buffer: 0
使用hash索引的查询 0.00次/秒  未使用hash使用的查询 0.00次/秒   自适应hash索引使用率0.0%

日志信息(redo)
最新产生的LSN: 112989067081
已刷盘的LSN: 112989067081
最老的LSN: 112989067081
最新检查点LSN: 112989067081
redo已完成的IO次数: 67  速度:0.00次/秒

BUFFER POOL AND MEMORY(不含具体实例的,只含汇总的)
总内存: 0 字节  (0.0 GB)
系统(字典)使用: 2052830 字节
buffer pool: 8192 页
free buffer: 5926 页
LRU        : 2193 页
old LRU    : 829 页
脏页(flush list)            : 0 页
等待读入的页(pending read)  : 0 页
等待的写(pending write) : LRU: 0 页.   flush_list(等待刷新的脏页): 0 页.   单页: 0 页
LRU made young(LRU中移动到前部的页数,就是经常使用的页) 0 页(速度:0.00/s),   non-young 0 页(速度:0.00/s)
从磁盘读取的页: 2050(0.00/s)   在内存中创建的页(无数据): 143(0.00/s)     写入磁盘的页: 270(0.00/s)
缓存命中率:数据库太闲,无此数据.
预读速度: 0.00/s   (因未被访问)驱除速度: 0.00/s   随机预读速度: 0.00/s

行操作ROW OPERATIONS
read view:  0
主进程ID: 3517 (sleeping)
插入行数: 0(0.00/s)   更新行数: 2(0.00/s)   删除行数: 3(0.00/s)   读行数: 17(0.00/s)

Total large memory allocated 为0是官方的BUG,不是脚本的BUG. 好像是从8.0.28开始的, 到8.0.37都还没修复....

其中死锁这一块显示得比较简单, 就列出相关连接和相关SQL(通常就够了). 但随着我们技术的提升, 就想了解得更清楚. 尤其是我们学完ibd和undo之后, 看这死锁信息就更加眼熟了.

死锁分析

对于死锁, 官方的描述是: A deadlock is a situation in which multiple transactions are unable to proceed because each transaction holds a lock that is needed by another one. Because all transactions involved are waiting for the same resource to become available, none of them ever releases the lock it holds.

总结就是: 死锁是多个事务相互等待锁导致无法继续

如果配置了参数innodb_deadlock_detect(默认启用), innodb会自动处理死锁(回滚产生死锁的事务,通常是后者)

我们可以通过show engine innodb status查看死锁, 或者启用参数innodb_print_all_deadlocks, 讲死锁信息打印到error log中.

构造死锁

我们使用2个事务来模拟死锁. 数据我是使用sysbench提前建好的.

代码语言:sql复制
-- session 1 (锁110010)
begin;
delete from db1.sbtest1 where id=110010;

-- session 2 (锁110011)
begin;
delete from db1.sbtest1 where id=110011;

-- session 1 (会等待session 2释放锁)
update db1.sbtest1 set c='session 1' where id = 110011;

-- session 2 (造成了死锁, 会被回滚)
update db1.sbtest1 set c='session 1' where id = 110010;

-- session 1 (update执行成功)

接下来就是枯燥的分析了

死锁分析

然后我们先使用show engine innodb statusG查看死锁信息. 得到如下信息

代码语言:sql复制
------------------------
LATEST DETECTED DEADLOCK
------------------------
2024-08-26 10:34:35 139905796822784
*** (1) TRANSACTION:
TRANSACTION 51399743, ACTIVE 69 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 1
MySQL thread id 9, OS thread handle 139906147886848, query id 598 localhost 127.0.0.1 root updating
update db1.sbtest1 set c='session 1' where id = 110011

*** (1) HOLDS THE LOCK(S):
RECORD LOCKS space id 13450 page no 1731 n bits 144 index PRIMARY of table `db1`.`sbtest1` trx id 51399743 lock_mode X locks rec but not gap
Record lock, heap no 37 PHYSICAL RECORD: n_fields 6; compact format; info bits 32
 0: len 4; hex 8001adba; asc     ;;
 1: len 6; hex 000003104c3f; asc     L?;;
 2: len 7; hex 020000366a0281; asc    6j  ;;
 3: len 4; hex 800f6578; asc   ex;;
 4: len 30; hex 39373838313231323839312d39383931323636313630362d333330373837; asc 97881212891-98912661606-330787; (total 120 bytes);
 5: len 30; hex 31323734303538313139302d36303433373433323738372d343232303830; asc 12740581190-60437432787-422080; (total 60 bytes);


*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 13450 page no 1731 n bits 144 index PRIMARY of table `db1`.`sbtest1` trx id 51399743 lock_mode X locks rec but not gap waiting
Record lock, heap no 38 PHYSICAL RECORD: n_fields 6; compact format; info bits 32
 0: len 4; hex 8001adbb; asc     ;;
 1: len 6; hex 000003104c46; asc     LF;;
 2: len 7; hex 01000040272ff2; asc    @'/ ;;
 3: len 4; hex 800fabf1; asc     ;;
 4: len 30; hex 39393431393830393133322d36393835333232393639322d323138343434; asc 99419809132-69853229692-218444; (total 120 bytes);
 5: len 30; hex 30323637363539323331392d35353835353738333138392d393433373031; asc 02676592319-55855783189-943701; (total 60 bytes);


*** (2) TRANSACTION:
TRANSACTION 51399750, ACTIVE 11 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 1
MySQL thread id 10, OS thread handle 139906146830080, query id 599 localhost 127.0.0.1 root updating
update db1.sbtest1 set c='session 1' where id = 110010

*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 13450 page no 1731 n bits 144 index PRIMARY of table `db1`.`sbtest1` trx id 51399750 lock_mode X locks rec but not gap
Record lock, heap no 38 PHYSICAL RECORD: n_fields 6; compact format; info bits 32
 0: len 4; hex 8001adbb; asc     ;;
 1: len 6; hex 000003104c46; asc     LF;;
 2: len 7; hex 01000040272ff2; asc    @'/ ;;
 3: len 4; hex 800fabf1; asc     ;;
 4: len 30; hex 39393431393830393133322d36393835333232393639322d323138343434; asc 99419809132-69853229692-218444; (total 120 bytes);
 5: len 30; hex 30323637363539323331392d35353835353738333138392d393433373031; asc 02676592319-55855783189-943701; (total 60 bytes);


*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 13450 page no 1731 n bits 144 index PRIMARY of table `db1`.`sbtest1` trx id 51399750 lock_mode X locks rec but not gap waiting
Record lock, heap no 37 PHYSICAL RECORD: n_fields 6; compact format; info bits 32
 0: len 4; hex 8001adba; asc     ;;
 1: len 6; hex 000003104c3f; asc     L?;;
 2: len 7; hex 020000366a0281; asc    6j  ;;
 3: len 4; hex 800f6578; asc   ex;;
 4: len 30; hex 39373838313231323839312d39383931323636313630362d333330373837; asc 97881212891-98912661606-330787; (total 120 bytes);
 5: len 30; hex 31323734303538313139302d36303433373433323738372d343232303830; asc 12740581190-60437432787-422080; (total 60 bytes);

*** WE ROLL BACK TRANSACTION (2)

通过描述看是回滚了 TRANSACTION (2) (thread id 10, trx id 51399750). 我们在数据库里面查询验证下:

代码语言:sql复制
(root@127.0.0.1) [information_schema]> select * from information_schema.processlist where id=10;
 ---- ------ ----------------- ------ --------- ------ ------- ------ 
| ID | USER | HOST            | DB   | COMMAND | TIME | STATE | INFO |
 ---- ------ ----------------- ------ --------- ------ ------- ------ 
| 10 | root | localhost:45864 | db1  | Sleep   | 1347 |       | NULL |
 ---- ------ ----------------- ------ --------- ------ ------- ------ 
1 row in set (0.00 sec)

现在它睡着了(Sleep状态), 我们看下它之前跑了哪些SQL.

代码语言:sql复制
(root@127.0.0.1) [performance_schema]> select * from performance_schema.events_statements_current where THREAD_ID in (select THHREAD_ID from performance_schema.threads where PROCESSLIST_ID=10)G
*************************** 1. row ***************************
              THREAD_ID: 53
               EVENT_ID: 295
           END_EVENT_ID: 295
             EVENT_NAME: statement/sql/update
                 SOURCE: init_net_server_extension.cc:95
            TIMER_START: 5208043199351000
              TIMER_END: 5208091954758000
             TIMER_WAIT: 48755407000
              LOCK_TIME: 47960000000
               SQL_TEXT: update db1.sbtest1 set c='session 1' where id = 110010
                 DIGEST: 4ec5ab547e9e3a9577d75340971b5b98e2feb1de6f37b2bd318b99186387c497
            DIGEST_TEXT: UPDATE `db1` . `sbtest1` SET `c` = ? WHERE `id` = ?
         CURRENT_SCHEMA: db1
            OBJECT_TYPE: NULL
          OBJECT_SCHEMA: NULL
            OBJECT_NAME: NULL
  OBJECT_INSTANCE_BEGIN: NULL
            MYSQL_ERRNO: 1213
      RETURNED_SQLSTATE: 40001
           MESSAGE_TEXT: Deadlock found when trying to get lock; try restarting transaction
                 ERRORS: 1
               WARNINGS: 0
          ROWS_AFFECTED: 0
              ROWS_SENT: 0
          ROWS_EXAMINED: 0
CREATED_TMP_DISK_TABLES: 0
     CREATED_TMP_TABLES: 0
       SELECT_FULL_JOIN: 0
 SELECT_FULL_RANGE_JOIN: 0
           SELECT_RANGE: 0
     SELECT_RANGE_CHECK: 0
            SELECT_SCAN: 0
      SORT_MERGE_PASSES: 0
             SORT_RANGE: 0
              SORT_ROWS: 0
              SORT_SCAN: 0
          NO_INDEX_USED: 0
     NO_GOOD_INDEX_USED: 0
       NESTING_EVENT_ID: 293
     NESTING_EVENT_TYPE: TRANSACTION
    NESTING_EVENT_LEVEL: 0
           STATEMENT_ID: 599
               CPU_TIME: 0
1 row in set (0.00 sec)

上一条sql是update 然后被回滚了, 再看看之前的历史sql信息呢(方便还原事务逻辑).

代码语言:sql复制
(root@127.0.0.1) [performance_schema]> select SQL_TEXT from performance_schema.events_statements_history where THREAD_ID in (seelect THREAD_ID from performance_schema.threads where PROCESSLIST_ID=10);
 ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ 
| SQL_TEXT                                                                                                                                                                                                                                       |
 ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ 
| update db1.sbtest1 set c='session 1' where id = 110010                                                                                                                                                                                         |
| rollback                                                                                                                                                                                                                                       |
| delete from db1.sbtest1 where id=110011                                                                                                                                                                                                        |
| INSERT INTO `db1`.`sbtest1` VALUES (110011, 1027057, '99419809132-69853229692-21844469818-64365421768-18331047661-28311633260-44480639767-40347505668-33492861467-91598865267', '02676592319-55855783189-94370134366-73974511206-02616100121') |
| select * from db1.sbtest1 limit 2                                                                                                                                                                                                              |
| begin                                                                                                                                                                                                                                          |
| delete from db1.sbtest1 where id=110011                                                                                                                                                                                                        |
| update db1.sbtest1 set c='session 1' where id = 110010                                                                                                                                                                                         |
| begin                                                                                                                                                                                                                                          |
| delete from db1.sbtest1 where id=110011                                                                                                                                                                                                        |
 ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ 
10 rows in set (0.00 sec)

可以看到是做的begin和delete, 就是我们上面模拟的sql. 然后就可以找开发理论了

再上面的insert是我模拟的时候忘记begin了 -_-, 然后使用ibd2sql恢复的数据....

进一步分析死锁

上面那些是我们之前就能分析出来的, 现在我们要来分析点'高级'的东西. 看看HOLDS THE LOCK(S)(我锁了啥)和WAITING FOR THIS LOCK TO BE GRANTED(我要啥)部分. 这两部分的内容实际上是相似的(就数据不同而已). 所以我们就只分析其中一部分即可. 就挑第2部分吧(方便看undo)

代码语言:txt复制
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 13450 page no 1731 n bits 144 index PRIMARY of table `db1`.`sbtest1` trx id 51399743 lock_mode X locks rec but not gap waiting
Record lock, heap no 38 PHYSICAL RECORD: n_fields 6; compact format; info bits 32
 0: len 4; hex 8001adbb; asc     ;;
 1: len 6; hex 000003104c46; asc     LF;;
 2: len 7; hex 01000040272ff2; asc    @'/ ;;
 3: len 4; hex 800fabf1; asc     ;;
 4: len 30; hex 39393431393830393133322d36393835333232393639322d323138343434; asc 99419809132-69853229692-218444; (total 120 bytes);
 5: len 30; hex 30323637363539323331392d35353835353738333138392d393433373031; asc 02676592319-55855783189-943701; (total 60 bytes);

我们可以得到表的 SPACE_ID=13450 对应数据的PAGE_NO=1731. 是X锁. 堆号:38 字段数量:6 (含trx_id&rollptr). 行格式: compact

然后是16进制的数据, 我们使用python来解析.比如, 第一个字段主键是int.

代码语言:python代码运行次数:0复制
import struct
hdata = '8001adbb'
bdata = bytes.fromhex(hdata)
is_unsigned = False
_t = struct.unpack('>L',bdata)[0]
n = 4 # 4字节
_s = n*8 - 1
data = (_t&((1<<_s)-1))-2**_s if _t < 2**_s and not is_unsigned else (_t&((1<<_s)-1))
print(data)

无符号类型(trx_id,rollptr)直接读就行. 比如: int.from_bytes(bytes.fromhex('000003104c46'),'big')

各数据类型的解析方式可以查看: https://github.com/ddcw/ibd2sql/blob/main/docs/README_DEV.md

于是我们就得到了

代码语言:txt复制
 0: len 4; hex 8001adbb              主键:110011
 1: len 6; hex 000003104c46          事务ID:51399750 (行事务,即锁的事务, 不一定是当前事务)
 2: len 7; hex 01000040272ff2        回滚指针:281476053020658 
 3: len 4; hex 800fabf1              第一个非主键值: 1027057
 4: len 30; hex 39393431393830393133322d36393835333232393639322d323138343434  第二个非主键值:99419809132-69853229692-218444
 5: len 30; hex 30323637363539323331392d35353835353738333138392d393433373031  第三个非主键值: 02676592319-55855783189-943701

我们先在数据库里面根据space_id查询出对应的Ibd文件.(虽然看表名字就知道那张表了)

代码语言:sql复制
(root@127.0.0.1) [information_schema]> select * from information_schema.INNODB_TABLES where SPACE=13450G
*************************** 1. row ***************************
     TABLE_ID: 17484
         NAME: db1/sbtest1
         FLAG: 33
       N_COLS: 7
        SPACE: 13450
   ROW_FORMAT: Dynamic
ZIP_PAGE_SIZE: 0
   SPACE_TYPE: Single
 INSTANT_COLS: 0
1 row in set (22.40 sec)

然后我们使用ibd2sql解析对应的page

代码语言:shell复制
(venv) 13:33:46 [root@ddcw21 ibd2sql]#python main.py /data/mysql_3314/mysqldata/db1/sbtest1.ibd --page-start=1731 --limit 1 --sql --debug | grep -C 20 '110011'
[2024-08-26 13:34:16] [DEBUG] NO:3 READ RECORD HEADER (5 bytes) _offset:7615 offset:7615 START
[2024-08-26 13:34:16] [DEBUG] 	READ RECORD HEADER (5 bytes) _offset:7610 offset:7615 FINISH
[2024-08-26 13:34:16] [DEBUG] 	PAGE NO     : 1731
[2024-08-26 13:34:16] [DEBUG] 	READ ROW NO : 1   CURRENT_OFFSET:7615
[2024-08-26 13:34:16] [DEBUG] 	REC INSTANT : False
[2024-08-26 13:34:16] [DEBUG] 	REC DELETED : False
[2024-08-26 13:34:16] [DEBUG] 	REC MIN_REC : False
[2024-08-26 13:34:16] [DEBUG] 	REC OWNED   : 0
[2024-08-26 13:34:16] [DEBUG] 	REC HEAP_NO : 304
[2024-08-26 13:34:16] [DEBUG] 	REC TYPE    : 0
[2024-08-26 13:34:16] [DEBUG] 	REC NEXT    : 208
[2024-08-26 13:34:16] [DEBUG] 	INSTANT     FLAG : False
[2024-08-26 13:34:16] [DEBUG] 	ROW VERSION FLAG : False
[2024-08-26 13:34:16] [DEBUG] 	20 bytes ON BOTH SIDES OF RECORD, b'69993-79145826637 <x', b'x00x010x00xd0x80x01xadxbbx00x00x03x10L?x02x00x006j'
[2024-08-26 13:34:16] [DEBUG] READ NULL BITMASK
[2024-08-26 13:34:16] [DEBUG] 	NO NULLABLE FIELD.
[2024-08-26 13:34:16] [DEBUG] 	NULLABLE FILED COUNT: 0  NULLABLE FIELD COUNT(FOR INSTANT):0
[2024-08-26 13:34:16] [DEBUG] 	NULL BITMASK: COUNT:0  ID: 0 []
[2024-08-26 13:34:16] [DEBUG] READ KEY FILED
[2024-08-26 13:34:16] [DEBUG] 	READ KEY COLNO:1 NAME:id
[2024-08-26 13:34:16] [DEBUG] 	7615 ----> 7619 data:110011  bdata:b'x80x01xadxbb'
[2024-08-26 13:34:16] [DEBUG] 	READ KEY NO:1 NAME:id FINISH. VALUES: 110011
[2024-08-26 13:34:16] [DEBUG] READ TRX(6) AND ROLLPTR(7) INFO
[2024-08-26 13:34:16] [DEBUG] 	TRX: 51399743  ROLLPTR: 562950866338476
[2024-08-26 13:34:16] [DEBUG] ROW VERSION      : -1
[2024-08-26 13:34:16] [DEBUG] INSTANT FLAG     : False
[2024-08-26 13:34:16] [DEBUG] ROW VERSION FLAG : False
[2024-08-26 13:34:16] [DEBUG] 	NAME: k  VERSION_ADDED:0  VERSION_DROPED:0 COL_INSTANT:False ROW VERSION:-1
[2024-08-26 13:34:16] [DEBUG] 	7632 ----> 7636 data:1027057  bdata:b'x80x0fxabxf1'
[2024-08-26 13:34:16] [DEBUG] ######## DDCW FLAG 8 ########
[2024-08-26 13:34:16] [DEBUG] 	NAME: c  VERSION_ADDED:0  VERSION_DROPED:0 COL_INSTANT:False ROW VERSION:-1
[2024-08-26 13:34:16] [DEBUG] 	7636 ----> 7756 data:session 1  bdata:b'session 1                                                                                                               '
[2024-08-26 13:34:16] [DEBUG] ######## DDCW FLAG 8 ########
[2024-08-26 13:34:16] [DEBUG] 	NAME: pad  VERSION_ADDED:0  VERSION_DROPED:0 COL_INSTANT:False ROW VERSION:-1
[2024-08-26 13:34:16] [DEBUG] 	7756 ----> 7816 data:02676592319-55855783189-94370134366-73974511206-02616100121  bdata:b'02676592319-55855783189-94370134366-73974511206-02616100121 '

我们得到

事务ID:51399743 (session 1)

ROLLPTR: 562950866338476

数据部分也是吻合的(update之后的), 然后我们根据回滚指针查看undo日志. 脚本见: https://github.com/ddcw/ddcw/tree/master/python/undo_reader

代码语言:shell复制
(venv) 13:35:53 [root@ddcw21 ei]#python undo_reader.py /data/mysql_3314/mysqldata/undo_002 -r 562950866338476
PAGENO:13930  OFFSET:684 --> 836  rseg_id:2  is_insert:False
DATA: b'\x00x01xc0DLx00x00x03x10LCxe0x81x00x00x17x8cx01x10x04x80x01xadxbbx01x04x99419809132-69853229692-21844469818-64365421768-18331047661-28311633260-44480639767-40347505668-33492861467-91598865267 '

得到了update之前的数据. 但我们发现只有主键修改前的字段的值(并没有完整的字段)

代码语言:txt复制
x80x01xadxbb 110011

知识都串起来了.

总结

  1. mysql会自动处理死锁
  2. 死锁信息只包含最近的sql, 若要看历史sql(完整事务sql), 得查看performance_schema.events_statements_history
  3. update之前的信息只记录修改部分在undo里面, 不会完整记录整行数据.

参考:

https://dev.mysql.com/doc/refman/8.0/en/innodb-deadlocks.html

0 人点赞