导读
很久以前(也才2年)写过一个解析innodb_status的脚本. 看起来像那么回事, 其实就是做了个翻译和总结.
比如:
代码语言:shell复制(venv) 10:15:56 [root@ddcw21 innodb_status]#python innodb_status.py -h127.0.0.1 -P3314 -p123456
说明:
采集时间: 2024-08-26 10:15:57
下面涉及到的 每秒平均值 计算时间均为最近 27 秒内
master线程:
系统繁忙程度(越大越繁忙): 0.07 %
日志写入和刷新次数: 0
SEMAPHORES信号量:
rw_s_spin_wait_count 0
rw_s_spin_round_count 0
rw_s_os_wait_count 0
rw_x_spin_wait_count 0
rw_x_spin_round_count 0
rw_x_os_wait_count 0
rw_sx_spin_wait_count 0
rw_sx_spin_round_count 0
rw_sx_os_wait_count 0
每次空转等待的锁: rw_s:0.00 rw_x:0.00 rw_sx:0.00
死锁(最近一条)
产生死锁的时间: 2024-08-26
事务ID:51399739 锁类型:X thread_id:9 578 localhost 127.0.0.1 root updating SQL如下:
update db1.sbtest1 set c='session 1' where id = 110011
事务ID:51399740 锁类型:X thread_id:10 581 localhost 127.0.0.1 root updating SQL如下:
update db1.sbtest1 set c='session 1' where id = 110010
回滚事务: 51399740
事务汇总信息
max_trx_id : 51399742
min_trx_id : 51399742
max_undo_id: 0
purge线程状态: running but idle
undo包含的事务数: 0
事务ID:51399739 事务状态:ACTIVE 1606 sec 锁:3 堆大小:1128 锁行数:2 事务中修改或插入的行数:2 MYSQL_PROCESS_ID:9
文件IO
Pending normal 异步IO READ (对应read thread) : [0, 0, 0, 0, 0, 0, 0, 0]
Pending normal 异步IO WRITE(对应write thread): [0, 0, 0, 0]
挂起(pending)的redo log flush: 0
挂起(pending)的tablespace flush: 0
OS总读次数: 2082 速度: 0.00 次/秒. 平均每次读 0 字节
OS总写次数: 445 速度: 0.00 次/秒.
OS总flush次数: 160 速度: 0.00 次/秒.
insert/change buffer和自适应hash索引
已合并页的数量: 1 页. ibuf空闲列表长度: 3078 页. ibuf大小: 3080 页. 合并插入次数: 2
合并操作次数: insert buffer: 0 delete buffer: 2 purge buffer: 0
无需合并操作的次数: insert buffer: 0 delete buffer: 0 purge buffer: 0
使用hash索引的查询 0.00次/秒 未使用hash使用的查询 0.00次/秒 自适应hash索引使用率0.0%
日志信息(redo)
最新产生的LSN: 112989067081
已刷盘的LSN: 112989067081
最老的LSN: 112989067081
最新检查点LSN: 112989067081
redo已完成的IO次数: 67 速度:0.00次/秒
BUFFER POOL AND MEMORY(不含具体实例的,只含汇总的)
总内存: 0 字节 (0.0 GB)
系统(字典)使用: 2052830 字节
buffer pool: 8192 页
free buffer: 5926 页
LRU : 2193 页
old LRU : 829 页
脏页(flush list) : 0 页
等待读入的页(pending read) : 0 页
等待的写(pending write) : LRU: 0 页. flush_list(等待刷新的脏页): 0 页. 单页: 0 页
LRU made young(LRU中移动到前部的页数,就是经常使用的页) 0 页(速度:0.00/s), non-young 0 页(速度:0.00/s)
从磁盘读取的页: 2050(0.00/s) 在内存中创建的页(无数据): 143(0.00/s) 写入磁盘的页: 270(0.00/s)
缓存命中率:数据库太闲,无此数据.
预读速度: 0.00/s (因未被访问)驱除速度: 0.00/s 随机预读速度: 0.00/s
行操作ROW OPERATIONS
read view: 0
主进程ID: 3517 (sleeping)
插入行数: 0(0.00/s) 更新行数: 2(0.00/s) 删除行数: 3(0.00/s) 读行数: 17(0.00/s)
Total large memory allocated 为0是官方的BUG,不是脚本的BUG. 好像是从8.0.28开始的, 到8.0.37都还没修复....
其中死锁这一块显示得比较简单, 就列出相关连接和相关SQL(通常就够了). 但随着我们技术的提升, 就想了解得更清楚. 尤其是我们学完ibd和undo之后, 看这死锁信息就更加眼熟了.
死锁分析
对于死锁, 官方的描述是: A deadlock is a situation in which multiple transactions are unable to proceed because each transaction holds a lock that is needed by another one. Because all transactions involved are waiting for the same resource to become available, none of them ever releases the lock it holds.
总结就是: 死锁是多个事务相互等待锁导致无法继续
如果配置了参数innodb_deadlock_detect(默认启用), innodb会自动处理死锁(回滚产生死锁的事务,通常是后者)
我们可以通过show engine innodb status
查看死锁, 或者启用参数innodb_print_all_deadlocks, 讲死锁信息打印到error log
中.
构造死锁
我们使用2个事务来模拟死锁. 数据我是使用sysbench提前建好的.
代码语言:sql复制-- session 1 (锁110010)
begin;
delete from db1.sbtest1 where id=110010;
-- session 2 (锁110011)
begin;
delete from db1.sbtest1 where id=110011;
-- session 1 (会等待session 2释放锁)
update db1.sbtest1 set c='session 1' where id = 110011;
-- session 2 (造成了死锁, 会被回滚)
update db1.sbtest1 set c='session 1' where id = 110010;
-- session 1 (update执行成功)
接下来就是枯燥的分析了
死锁分析
然后我们先使用show engine innodb statusG
查看死锁信息. 得到如下信息
------------------------
LATEST DETECTED DEADLOCK
------------------------
2024-08-26 10:34:35 139905796822784
*** (1) TRANSACTION:
TRANSACTION 51399743, ACTIVE 69 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 1
MySQL thread id 9, OS thread handle 139906147886848, query id 598 localhost 127.0.0.1 root updating
update db1.sbtest1 set c='session 1' where id = 110011
*** (1) HOLDS THE LOCK(S):
RECORD LOCKS space id 13450 page no 1731 n bits 144 index PRIMARY of table `db1`.`sbtest1` trx id 51399743 lock_mode X locks rec but not gap
Record lock, heap no 37 PHYSICAL RECORD: n_fields 6; compact format; info bits 32
0: len 4; hex 8001adba; asc ;;
1: len 6; hex 000003104c3f; asc L?;;
2: len 7; hex 020000366a0281; asc 6j ;;
3: len 4; hex 800f6578; asc ex;;
4: len 30; hex 39373838313231323839312d39383931323636313630362d333330373837; asc 97881212891-98912661606-330787; (total 120 bytes);
5: len 30; hex 31323734303538313139302d36303433373433323738372d343232303830; asc 12740581190-60437432787-422080; (total 60 bytes);
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 13450 page no 1731 n bits 144 index PRIMARY of table `db1`.`sbtest1` trx id 51399743 lock_mode X locks rec but not gap waiting
Record lock, heap no 38 PHYSICAL RECORD: n_fields 6; compact format; info bits 32
0: len 4; hex 8001adbb; asc ;;
1: len 6; hex 000003104c46; asc LF;;
2: len 7; hex 01000040272ff2; asc @'/ ;;
3: len 4; hex 800fabf1; asc ;;
4: len 30; hex 39393431393830393133322d36393835333232393639322d323138343434; asc 99419809132-69853229692-218444; (total 120 bytes);
5: len 30; hex 30323637363539323331392d35353835353738333138392d393433373031; asc 02676592319-55855783189-943701; (total 60 bytes);
*** (2) TRANSACTION:
TRANSACTION 51399750, ACTIVE 11 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 1
MySQL thread id 10, OS thread handle 139906146830080, query id 599 localhost 127.0.0.1 root updating
update db1.sbtest1 set c='session 1' where id = 110010
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 13450 page no 1731 n bits 144 index PRIMARY of table `db1`.`sbtest1` trx id 51399750 lock_mode X locks rec but not gap
Record lock, heap no 38 PHYSICAL RECORD: n_fields 6; compact format; info bits 32
0: len 4; hex 8001adbb; asc ;;
1: len 6; hex 000003104c46; asc LF;;
2: len 7; hex 01000040272ff2; asc @'/ ;;
3: len 4; hex 800fabf1; asc ;;
4: len 30; hex 39393431393830393133322d36393835333232393639322d323138343434; asc 99419809132-69853229692-218444; (total 120 bytes);
5: len 30; hex 30323637363539323331392d35353835353738333138392d393433373031; asc 02676592319-55855783189-943701; (total 60 bytes);
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 13450 page no 1731 n bits 144 index PRIMARY of table `db1`.`sbtest1` trx id 51399750 lock_mode X locks rec but not gap waiting
Record lock, heap no 37 PHYSICAL RECORD: n_fields 6; compact format; info bits 32
0: len 4; hex 8001adba; asc ;;
1: len 6; hex 000003104c3f; asc L?;;
2: len 7; hex 020000366a0281; asc 6j ;;
3: len 4; hex 800f6578; asc ex;;
4: len 30; hex 39373838313231323839312d39383931323636313630362d333330373837; asc 97881212891-98912661606-330787; (total 120 bytes);
5: len 30; hex 31323734303538313139302d36303433373433323738372d343232303830; asc 12740581190-60437432787-422080; (total 60 bytes);
*** WE ROLL BACK TRANSACTION (2)
通过描述看是回滚了 TRANSACTION (2) (thread id 10, trx id 51399750). 我们在数据库里面查询验证下:
代码语言:sql复制(root@127.0.0.1) [information_schema]> select * from information_schema.processlist where id=10;
---- ------ ----------------- ------ --------- ------ ------- ------
| ID | USER | HOST | DB | COMMAND | TIME | STATE | INFO |
---- ------ ----------------- ------ --------- ------ ------- ------
| 10 | root | localhost:45864 | db1 | Sleep | 1347 | | NULL |
---- ------ ----------------- ------ --------- ------ ------- ------
1 row in set (0.00 sec)
现在它睡着了(Sleep状态), 我们看下它之前跑了哪些SQL.
代码语言:sql复制(root@127.0.0.1) [performance_schema]> select * from performance_schema.events_statements_current where THREAD_ID in (select THHREAD_ID from performance_schema.threads where PROCESSLIST_ID=10)G
*************************** 1. row ***************************
THREAD_ID: 53
EVENT_ID: 295
END_EVENT_ID: 295
EVENT_NAME: statement/sql/update
SOURCE: init_net_server_extension.cc:95
TIMER_START: 5208043199351000
TIMER_END: 5208091954758000
TIMER_WAIT: 48755407000
LOCK_TIME: 47960000000
SQL_TEXT: update db1.sbtest1 set c='session 1' where id = 110010
DIGEST: 4ec5ab547e9e3a9577d75340971b5b98e2feb1de6f37b2bd318b99186387c497
DIGEST_TEXT: UPDATE `db1` . `sbtest1` SET `c` = ? WHERE `id` = ?
CURRENT_SCHEMA: db1
OBJECT_TYPE: NULL
OBJECT_SCHEMA: NULL
OBJECT_NAME: NULL
OBJECT_INSTANCE_BEGIN: NULL
MYSQL_ERRNO: 1213
RETURNED_SQLSTATE: 40001
MESSAGE_TEXT: Deadlock found when trying to get lock; try restarting transaction
ERRORS: 1
WARNINGS: 0
ROWS_AFFECTED: 0
ROWS_SENT: 0
ROWS_EXAMINED: 0
CREATED_TMP_DISK_TABLES: 0
CREATED_TMP_TABLES: 0
SELECT_FULL_JOIN: 0
SELECT_FULL_RANGE_JOIN: 0
SELECT_RANGE: 0
SELECT_RANGE_CHECK: 0
SELECT_SCAN: 0
SORT_MERGE_PASSES: 0
SORT_RANGE: 0
SORT_ROWS: 0
SORT_SCAN: 0
NO_INDEX_USED: 0
NO_GOOD_INDEX_USED: 0
NESTING_EVENT_ID: 293
NESTING_EVENT_TYPE: TRANSACTION
NESTING_EVENT_LEVEL: 0
STATEMENT_ID: 599
CPU_TIME: 0
1 row in set (0.00 sec)
上一条sql是update 然后被回滚了, 再看看之前的历史sql信息呢(方便还原事务逻辑).
代码语言:sql复制(root@127.0.0.1) [performance_schema]> select SQL_TEXT from performance_schema.events_statements_history where THREAD_ID in (seelect THREAD_ID from performance_schema.threads where PROCESSLIST_ID=10);
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| SQL_TEXT |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| update db1.sbtest1 set c='session 1' where id = 110010 |
| rollback |
| delete from db1.sbtest1 where id=110011 |
| INSERT INTO `db1`.`sbtest1` VALUES (110011, 1027057, '99419809132-69853229692-21844469818-64365421768-18331047661-28311633260-44480639767-40347505668-33492861467-91598865267', '02676592319-55855783189-94370134366-73974511206-02616100121') |
| select * from db1.sbtest1 limit 2 |
| begin |
| delete from db1.sbtest1 where id=110011 |
| update db1.sbtest1 set c='session 1' where id = 110010 |
| begin |
| delete from db1.sbtest1 where id=110011 |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
10 rows in set (0.00 sec)
可以看到是做的begin和delete, 就是我们上面模拟的sql. 然后就可以找开发理论了
再上面的insert是我模拟的时候忘记begin了 -_-, 然后使用ibd2sql恢复的数据....
进一步分析死锁
上面那些是我们之前就能分析出来的, 现在我们要来分析点'高级'的东西. 看看HOLDS THE LOCK(S)
(我锁了啥)和WAITING FOR THIS LOCK TO BE GRANTED
(我要啥)部分. 这两部分的内容实际上是相似的(就数据不同而已). 所以我们就只分析其中一部分即可. 就挑第2部分吧(方便看undo)
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 13450 page no 1731 n bits 144 index PRIMARY of table `db1`.`sbtest1` trx id 51399743 lock_mode X locks rec but not gap waiting
Record lock, heap no 38 PHYSICAL RECORD: n_fields 6; compact format; info bits 32
0: len 4; hex 8001adbb; asc ;;
1: len 6; hex 000003104c46; asc LF;;
2: len 7; hex 01000040272ff2; asc @'/ ;;
3: len 4; hex 800fabf1; asc ;;
4: len 30; hex 39393431393830393133322d36393835333232393639322d323138343434; asc 99419809132-69853229692-218444; (total 120 bytes);
5: len 30; hex 30323637363539323331392d35353835353738333138392d393433373031; asc 02676592319-55855783189-943701; (total 60 bytes);
我们可以得到表的 SPACE_ID=13450 对应数据的PAGE_NO=1731. 是X锁. 堆号:38 字段数量:6 (含trx_id&rollptr). 行格式: compact
然后是16进制的数据, 我们使用python来解析.比如, 第一个字段主键是int.
代码语言:python代码运行次数:0复制import struct
hdata = '8001adbb'
bdata = bytes.fromhex(hdata)
is_unsigned = False
_t = struct.unpack('>L',bdata)[0]
n = 4 # 4字节
_s = n*8 - 1
data = (_t&((1<<_s)-1))-2**_s if _t < 2**_s and not is_unsigned else (_t&((1<<_s)-1))
print(data)
无符号类型(trx_id,rollptr)直接读就行. 比如: int.from_bytes(bytes.fromhex('000003104c46'),'big')
各数据类型的解析方式可以查看: https://github.com/ddcw/ibd2sql/blob/main/docs/README_DEV.md
于是我们就得到了
代码语言:txt复制 0: len 4; hex 8001adbb 主键:110011
1: len 6; hex 000003104c46 事务ID:51399750 (行事务,即锁的事务, 不一定是当前事务)
2: len 7; hex 01000040272ff2 回滚指针:281476053020658
3: len 4; hex 800fabf1 第一个非主键值: 1027057
4: len 30; hex 39393431393830393133322d36393835333232393639322d323138343434 第二个非主键值:99419809132-69853229692-218444
5: len 30; hex 30323637363539323331392d35353835353738333138392d393433373031 第三个非主键值: 02676592319-55855783189-943701
我们先在数据库里面根据space_id查询出对应的Ibd文件.(虽然看表名字就知道那张表了)
代码语言:sql复制(root@127.0.0.1) [information_schema]> select * from information_schema.INNODB_TABLES where SPACE=13450G
*************************** 1. row ***************************
TABLE_ID: 17484
NAME: db1/sbtest1
FLAG: 33
N_COLS: 7
SPACE: 13450
ROW_FORMAT: Dynamic
ZIP_PAGE_SIZE: 0
SPACE_TYPE: Single
INSTANT_COLS: 0
1 row in set (22.40 sec)
然后我们使用ibd2sql解析对应的page
代码语言:shell复制(venv) 13:33:46 [root@ddcw21 ibd2sql]#python main.py /data/mysql_3314/mysqldata/db1/sbtest1.ibd --page-start=1731 --limit 1 --sql --debug | grep -C 20 '110011'
[2024-08-26 13:34:16] [DEBUG] NO:3 READ RECORD HEADER (5 bytes) _offset:7615 offset:7615 START
[2024-08-26 13:34:16] [DEBUG] READ RECORD HEADER (5 bytes) _offset:7610 offset:7615 FINISH
[2024-08-26 13:34:16] [DEBUG] PAGE NO : 1731
[2024-08-26 13:34:16] [DEBUG] READ ROW NO : 1 CURRENT_OFFSET:7615
[2024-08-26 13:34:16] [DEBUG] REC INSTANT : False
[2024-08-26 13:34:16] [DEBUG] REC DELETED : False
[2024-08-26 13:34:16] [DEBUG] REC MIN_REC : False
[2024-08-26 13:34:16] [DEBUG] REC OWNED : 0
[2024-08-26 13:34:16] [DEBUG] REC HEAP_NO : 304
[2024-08-26 13:34:16] [DEBUG] REC TYPE : 0
[2024-08-26 13:34:16] [DEBUG] REC NEXT : 208
[2024-08-26 13:34:16] [DEBUG] INSTANT FLAG : False
[2024-08-26 13:34:16] [DEBUG] ROW VERSION FLAG : False
[2024-08-26 13:34:16] [DEBUG] 20 bytes ON BOTH SIDES OF RECORD, b'69993-79145826637 <x', b'x00x010x00xd0x80x01xadxbbx00x00x03x10L?x02x00x006j'
[2024-08-26 13:34:16] [DEBUG] READ NULL BITMASK
[2024-08-26 13:34:16] [DEBUG] NO NULLABLE FIELD.
[2024-08-26 13:34:16] [DEBUG] NULLABLE FILED COUNT: 0 NULLABLE FIELD COUNT(FOR INSTANT):0
[2024-08-26 13:34:16] [DEBUG] NULL BITMASK: COUNT:0 ID: 0 []
[2024-08-26 13:34:16] [DEBUG] READ KEY FILED
[2024-08-26 13:34:16] [DEBUG] READ KEY COLNO:1 NAME:id
[2024-08-26 13:34:16] [DEBUG] 7615 ----> 7619 data:110011 bdata:b'x80x01xadxbb'
[2024-08-26 13:34:16] [DEBUG] READ KEY NO:1 NAME:id FINISH. VALUES: 110011
[2024-08-26 13:34:16] [DEBUG] READ TRX(6) AND ROLLPTR(7) INFO
[2024-08-26 13:34:16] [DEBUG] TRX: 51399743 ROLLPTR: 562950866338476
[2024-08-26 13:34:16] [DEBUG] ROW VERSION : -1
[2024-08-26 13:34:16] [DEBUG] INSTANT FLAG : False
[2024-08-26 13:34:16] [DEBUG] ROW VERSION FLAG : False
[2024-08-26 13:34:16] [DEBUG] NAME: k VERSION_ADDED:0 VERSION_DROPED:0 COL_INSTANT:False ROW VERSION:-1
[2024-08-26 13:34:16] [DEBUG] 7632 ----> 7636 data:1027057 bdata:b'x80x0fxabxf1'
[2024-08-26 13:34:16] [DEBUG] ######## DDCW FLAG 8 ########
[2024-08-26 13:34:16] [DEBUG] NAME: c VERSION_ADDED:0 VERSION_DROPED:0 COL_INSTANT:False ROW VERSION:-1
[2024-08-26 13:34:16] [DEBUG] 7636 ----> 7756 data:session 1 bdata:b'session 1 '
[2024-08-26 13:34:16] [DEBUG] ######## DDCW FLAG 8 ########
[2024-08-26 13:34:16] [DEBUG] NAME: pad VERSION_ADDED:0 VERSION_DROPED:0 COL_INSTANT:False ROW VERSION:-1
[2024-08-26 13:34:16] [DEBUG] 7756 ----> 7816 data:02676592319-55855783189-94370134366-73974511206-02616100121 bdata:b'02676592319-55855783189-94370134366-73974511206-02616100121 '
我们得到
事务ID:51399743 (session 1)
ROLLPTR: 562950866338476
数据部分也是吻合的(update之后的), 然后我们根据回滚指针查看undo日志. 脚本见: https://github.com/ddcw/ddcw/tree/master/python/undo_reader
代码语言:shell复制(venv) 13:35:53 [root@ddcw21 ei]#python undo_reader.py /data/mysql_3314/mysqldata/undo_002 -r 562950866338476
PAGENO:13930 OFFSET:684 --> 836 rseg_id:2 is_insert:False
DATA: b'\x00x01xc0DLx00x00x03x10LCxe0x81x00x00x17x8cx01x10x04x80x01xadxbbx01x04x99419809132-69853229692-21844469818-64365421768-18331047661-28311633260-44480639767-40347505668-33492861467-91598865267 '
得到了update之前的数据. 但我们发现只有主键和修改前的字段的值(并没有完整的字段)
代码语言:txt复制x80x01xadxbb 110011
知识都串起来了.
总结
- mysql会自动处理死锁
- 死锁信息只包含最近的sql, 若要看历史sql(完整事务sql), 得查看
performance_schema.events_statements_history
- update之前的信息只记录修改部分在undo里面, 不会完整记录整行数据.
参考:
https://dev.mysql.com/doc/refman/8.0/en/innodb-deadlocks.html