关于死锁,相信计算机专业的同学不止在一门课里面接触过,操作系统老师讲过死锁吧,数据库系统概念的老师也讲过死锁吧。至于死锁原理,这里不赘述了,有兴趣的同学可以参考下这两篇博文(http://hedengcheng.com/?p=844,http://www.cnblogs.com/LBSer/p/5183300.html)写的非常棒,本文纯粹讲述如何使用pt-deadlock-logger这个工具来记录死锁。
有同学可能有疑问了,pt-deadlock-logger仅仅只有一个死锁记录功能,这有啥用啊?又不能避免死锁,又不能打开死锁。这么讲吧,当线上系统出现死锁,那是相当严重的事情,业务积压,各种领导都蹦出来了,数据库咋回事,各种运维围到你身边,这是比较严重事情哦,但是死锁场景又很难抓出来,让开发修改。但是pt-deadlock-logger就帮我们解决了这个问题。
pt-deadlock-logger工具介绍
pt-deadlock-logger其实就是一个脚本连接到数据库服务器,用于探测死锁,一旦有死锁发现,就会从数据库中采集死锁相关信息,输出到你指定的地方。只要有死锁,我们就可以看到死锁SQL,死锁的线程ID等等相关信息。是不是超棒的功能呢。
模拟死锁场景
代码语言:javascript复制root@localhost [3306][wjq]>CREATE TABLE `wjqtab1` (
-> `id` int(11) NOT NULL AUTO_INCREMENT,
-> `a` int(11) DEFAULT NULL,
-> `b` int(11) DEFAULT NULL,
-> PRIMARY KEY (`id`),
-> KEY `idxa` (`a`)
-> ) ENGINE=InnoDB;
Query OK, 0 rows affected (0.02 sec)
root@localhost [3306][wjq]>insert into wjqtab1(a,b) values(2,7),(5,10),(6,45);
Query OK, 3 rows affected (0.00 sec)
Records: 3 Duplicates: 0 Warnings: 0
session1:
代码语言:javascript复制[session1] root@localhost [wjq]begin;
Query OK, 0 rows affected (0.00 sec)
[session1] root@localhost [wjq]
[session1] root@localhost [wjq]delete from wjqtab1 where a=5;
Query OK, 1 row affected (0.00 sec)
session2:
代码语言:javascript复制[session2] root@localhost [wjq] begin;
Query OK, 0 rows affected (0.00 sec)
[session2] root@localhost [wjq] delete from wjqtab1 where a=5;
等待。。。
session1:
代码语言:javascript复制[session1] root@localhost [wjq]insert into wjqtab1(a,b) values(2,30);
Query OK, 1 row affected (0.00 sec)
session2:
代码语言:javascript复制[session2] root@localhost [wjq] delete from wjqtab1 where a=5;
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
产生了死锁,通常情况下我们通过show engine innodb status命令可以查看到死锁的信息,如下所示:
代码语言:javascript复制------------------------
LATEST DETECTED DEADLOCK
------------------------
2019-03-11 18:05:50 0x7f213c188700
*** (1) TRANSACTION:
TRANSACTION 350213, ACTIVE 6 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 1136, 1 row lock(s)
MySQL thread id 103064, OS thread handle 139780852061952, query id 9628 localhost root updating
delete from wjqtab1 where a=5
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 1093 page no 4 n bits 72 index idxa of table `wjq`.`wjqtab1` trx id 350213 lock_mode X waiting
Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
0: len 4; hex 80000005; asc ;;
1: len 4; hex 80000002; asc ;;
*** (2) TRANSACTION:
TRANSACTION 350212, ACTIVE 29 sec inserting
mysql tables in use 1, locked 1
5 lock struct(s), heap size 1136, 4 row lock(s), undo log entries 2
MySQL thread id 100428, OS thread handle 139780718888704, query id 9629 localhost root update
insert into wjqtab1(a,b) values(2,30)
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 1093 page no 4 n bits 72 index idxa of table `wjq`.`wjqtab1` trx id 350212 lock_mode X
Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
0: len 4; hex 80000005; asc ;;
1: len 4; hex 80000002; asc ;;
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 1093 page no 4 n bits 72 index idxa of table `wjq`.`wjqtab1` trx id 350212 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
0: len 4; hex 80000005; asc ;;
1: len 4; hex 80000002; asc ;;
*** WE ROLL BACK TRANSACTION (1)
1、直接使用pt-deadlock-logger检测数据库中产生的死锁
代码语言:javascript复制[root@VM_54_118_centos ~]# pt-deadlock-logger h=localhost,u=root,p=xxxxxx,S=/tmp/mysql3306.sock
server ts thread txn_id txn_time user hostname ip db tbl idx lock_type lock_mode wait_hold victim query
localhost 2019-03-11T18:05:50 100428 0 29 root localhost wjq wjqtab1 idxa RECORD X w 0 insert into wjqtab1(a,b) values(2,30)
localhost 2019-03-11T18:05:50 103064 0 6 root localhost wjq wjqtab1 idxa RECORD X w 1 delete from wjqtab1 where a=5
输出参数相关说明: server:数据库服务器地址,即死锁产生的数据库主机 ts:检测到死锁的时间戳 thread:产生死锁的线程id,这个id和show processlist里面的线程id是一致的 txn_id:innodb的事务ID txd_time:死锁检查到前,事务执行时间 user:执行transcation的用户名 hostname:客户端主机名 ip:客户端ip db:发生死锁的DB名 tbl:死锁发生的表名 idx:产生死锁的索引名(在上面这个demo里面, 我们直接走的主键,加的记录锁) lock_type:锁的类型(记录锁,gap锁,next-key锁) lock_mode:锁模式(S,X) wait_hold:是否等着锁释放,一般死锁都是两个wait victim:该会话是否做了牺牲,终止了执行 query:造成死锁的SQL语句
2、将死锁的信息输出到日志文件中
代码语言:javascript复制[root@VM_54_118_centos ~]# cat pt-deadlocks.conf
#login to MySQL server#
host=localhost
port=3306
socket=/tmp/mysql3306.sock
user=root
password=qcloud@2018
charset=utf8mb4
#log to file
tab
log=/tmp/pt_deadlock.log
daemonize
interval=5
代码语言:javascript复制配置文件中参数说明 tab:代表,输出是带上制表符,格式化显示,更加清晰 log:指定日志文件用来存储死锁检测信息输出 daemonize:放到后台运行,守护进程模式 interval:每隔5秒检测一次 还有几个参数没有列出 run-time:pt-deadlock-logger运行时间 iterations:pt-deadlock-logger检测的次数 假如–run-time 2m –iterations 4 –interval 30,运行两分钟,每隔30s检测一次,一共检测四次
[root@VM_54_118_centos ~]# pt-deadlock-logger --config pt-deadlocks.conf
[root@VM_54_118_centos ~]# tail -f /tmp/pt_deadlock.log
server ts thread txn_id txn_time user hostname ip db tbl idx lock_type lock_mode wait_hold victim query
localhost 2019-03-11T18:05:50 100428 0 29 root localhost wjq wjqtab1 idxa RECORD X w 0 insert into wjqtab1(a,b) values(2,30)
localhost 2019-03-11T18:05:50 103064 0 6 root localhost wjq wjqtab1 idxa RECORD X w 1 delete from wjqtab1 where a=5
3、将检查到的死锁信息存储到表中
下面是存储死锁信息的表结构(可以提前创建也可以在使用命令的使用加上–create-dest-table自动创建)
代码语言:javascript复制root@localhost [3307][wjq]>CREATE TABLE deadlocks (
-> server char(20) NOT NULL,
-> ts timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
-> thread int unsigned NOT NULL,
-> txn_id bigint unsigned NOT NULL,
-> txn_time smallint unsigned NOT NULL,
-> user char(16) NOT NULL,
-> hostname char(20) NOT NULL,
-> ip char(15) NOT NULL, -- alternatively, ip int unsigned NOT NULL
-> db char(64) NOT NULL,
-> tbl char(64) NOT NULL,
-> idx char(64) NOT NULL,
-> lock_type char(16) NOT NULL,
-> lock_mode char(1) NOT NULL,
-> wait_hold char(1) NOT NULL,
-> victim tinyint unsigned NOT NULL,
-> query text NOT NULL,
-> PRIMARY KEY (server,ts,thread)
-> ) ENGINE=InnoDB;
Query OK, 0 rows affected (0.05 sec)
下面的命令会自动创建存放死锁信息的表
代码语言:javascript复制[root@VM_54_118_centos ~]# pt-deadlock-logger h=localhost,u=root,p=qcloud@2018,S=/tmp/mysql3306.sock --dest h=localhost,u=root,p=qcloud@2018,S=/tmp/mysql3307.sock,D=wjq,t=deadlocks --create-dest-table
下面的命令不会自动创建存放死锁信息的表,需要提前创建
代码语言:javascript复制[root@VM_54_118_centos ~]# pt-deadlock-logger h=localhost,u=root,p=xxxxxxxx,S=/tmp/mysql3306.sock --dest h=localhost,u=root,p=qcloud@2018,S=/tmp/mysql3307.sock,D=wjq,t=deadlocks
server ts thread txn_id txn_time user hostname ip db tbl idx lock_type lock_mode wait_hold victim query
localhost 2019-03-11T18:05:50 100428 0 29 root localhost wjq wjqtab1 idxa RECORD X w 0 insert into wjqtab1(a,b) values(2,30)
localhost 2019-03-11T18:05:50 103064 0 6 root localhost wjq wjqtab1 idxa RECORD X w 1 delete from wjqtab1 where a=5
查看存放死锁信息的表,查看死锁有关的信息
代码语言:javascript复制root@localhost [3307][wjq]>select * from deadlocksG;
*************************** 1. row ***************************
server: localhost
ts: 2019-03-11 18:05:50
thread: 100428
txn_id: 0
txn_time: 29
user: root
hostname: localhost
ip:
db: wjq
tbl: wjqtab1
idx: idxa
lock_type: RECORD
lock_mode: X
wait_hold: w
victim: 0
query: insert into wjqtab1(a,b) values(2,30)
*************************** 2. row ***************************
server: localhost
ts: 2019-03-11 18:05:50
thread: 103064
txn_id: 0
txn_time: 6
user: root
hostname: localhost
ip:
db: wjq
tbl: wjqtab1
idx: idxa
lock_type: RECORD
lock_mode: X
wait_hold: w
victim: 1
query: delete from wjqtab1 where a=5
2 rows in set (0.00 sec)
关于pt-deadlock-logger的详细使用可参考官方文档:
https://www.percona.com/doc/percona-toolkit/3.0/pt-deadlock-logger.html