【Percona-toolkit系列】Percona Toolkit工具之pt-deadlock-logger

2021-03-03 10:11:44 浏览数 (1)

关于死锁,相信计算机专业的同学不止在一门课里面接触过,操作系统老师讲过死锁吧,数据库系统概念的老师也讲过死锁吧。至于死锁原理,这里不赘述了,有兴趣的同学可以参考下这两篇博文(http://hedengcheng.com/?p=844,http://www.cnblogs.com/LBSer/p/5183300.html)写的非常棒,本文纯粹讲述如何使用pt-deadlock-logger这个工具来记录死锁。

有同学可能有疑问了,pt-deadlock-logger仅仅只有一个死锁记录功能,这有啥用啊?又不能避免死锁,又不能打开死锁。这么讲吧,当线上系统出现死锁,那是相当严重的事情,业务积压,各种领导都蹦出来了,数据库咋回事,各种运维围到你身边,这是比较严重事情哦,但是死锁场景又很难抓出来,让开发修改。但是pt-deadlock-logger就帮我们解决了这个问题。

pt-deadlock-logger工具介绍

pt-deadlock-logger其实就是一个脚本连接到数据库服务器,用于探测死锁,一旦有死锁发现,就会从数据库中采集死锁相关信息,输出到你指定的地方。只要有死锁,我们就可以看到死锁SQL,死锁的线程ID等等相关信息。是不是超棒的功能呢。

模拟死锁场景

代码语言:javascript复制
root@localhost [3306][wjq]>CREATE TABLE `wjqtab1` (
    ->   `id` int(11) NOT NULL AUTO_INCREMENT,
    ->   `a` int(11) DEFAULT NULL,
    ->   `b` int(11) DEFAULT NULL,
    ->   PRIMARY KEY (`id`),
    ->   KEY `idxa` (`a`)
    -> ) ENGINE=InnoDB;
Query OK, 0 rows affected (0.02 sec)
 
root@localhost [3306][wjq]>insert into wjqtab1(a,b) values(2,7),(5,10),(6,45);
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

session1:

代码语言:javascript复制
[session1] root@localhost [wjq]begin;
Query OK, 0 rows affected (0.00 sec)
 
[session1] root@localhost [wjq]
[session1] root@localhost [wjq]delete from wjqtab1 where a=5;
Query OK, 1 row affected (0.00 sec)

session2:

代码语言:javascript复制
[session2] root@localhost [wjq] begin;
Query OK, 0 rows affected (0.00 sec)
 
[session2] root@localhost [wjq] delete from wjqtab1 where a=5;
等待。。。

session1:

代码语言:javascript复制
[session1] root@localhost [wjq]insert into wjqtab1(a,b) values(2,30);
Query OK, 1 row affected (0.00 sec)

session2:

代码语言:javascript复制
[session2] root@localhost [wjq] delete from wjqtab1 where a=5;
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction

产生了死锁,通常情况下我们通过show engine innodb status命令可以查看到死锁的信息,如下所示:

代码语言:javascript复制
------------------------
LATEST DETECTED DEADLOCK
------------------------
2019-03-11 18:05:50 0x7f213c188700
*** (1) TRANSACTION:
TRANSACTION 350213, ACTIVE 6 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 1136, 1 row lock(s)
MySQL thread id 103064, OS thread handle 139780852061952, query id 9628 localhost root updating
delete from wjqtab1 where a=5
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 1093 page no 4 n bits 72 index idxa of table `wjq`.`wjqtab1` trx id 350213 lock_mode X waiting
Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
 0: len 4; hex 80000005; asc     ;;
 1: len 4; hex 80000002; asc     ;;
 
*** (2) TRANSACTION:
TRANSACTION 350212, ACTIVE 29 sec inserting
mysql tables in use 1, locked 1
5 lock struct(s), heap size 1136, 4 row lock(s), undo log entries 2
MySQL thread id 100428, OS thread handle 139780718888704, query id 9629 localhost root update
insert into wjqtab1(a,b) values(2,30)
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 1093 page no 4 n bits 72 index idxa of table `wjq`.`wjqtab1` trx id 350212 lock_mode X
Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
 0: len 4; hex 80000005; asc     ;;
 1: len 4; hex 80000002; asc     ;;
 
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 1093 page no 4 n bits 72 index idxa of table `wjq`.`wjqtab1` trx id 350212 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
 0: len 4; hex 80000005; asc     ;;
 1: len 4; hex 80000002; asc     ;;
 
*** WE ROLL BACK TRANSACTION (1)

1、直接使用pt-deadlock-logger检测数据库中产生的死锁

代码语言:javascript复制
[root@VM_54_118_centos ~]# pt-deadlock-logger h=localhost,u=root,p=xxxxxx,S=/tmp/mysql3306.sock
server ts thread txn_id txn_time user hostname ip db tbl idx lock_type lock_mode wait_hold victim query
localhost 2019-03-11T18:05:50 100428 0 29 root localhost  wjq wjqtab1 idxa RECORD X w 0 insert into wjqtab1(a,b) values(2,30)
localhost 2019-03-11T18:05:50 103064 0 6 root localhost  wjq wjqtab1 idxa RECORD X w 1 delete from wjqtab1 where a=5

输出参数相关说明: server:数据库服务器地址,即死锁产生的数据库主机 ts:检测到死锁的时间戳 thread:产生死锁的线程id,这个id和show processlist里面的线程id是一致的 txn_id:innodb的事务ID txd_time:死锁检查到前,事务执行时间 user:执行transcation的用户名 hostname:客户端主机名 ip:客户端ip db:发生死锁的DB名 tbl:死锁发生的表名 idx:产生死锁的索引名(在上面这个demo里面, 我们直接走的主键,加的记录锁) lock_type:锁的类型(记录锁,gap锁,next-key锁) lock_mode:锁模式(S,X) wait_hold:是否等着锁释放,一般死锁都是两个wait victim:该会话是否做了牺牲,终止了执行 query:造成死锁的SQL语句

2、将死锁的信息输出到日志文件中

代码语言:javascript复制
[root@VM_54_118_centos ~]# cat pt-deadlocks.conf 
#login to MySQL server#
host=localhost
port=3306
socket=/tmp/mysql3306.sock
user=root
password=qcloud@2018
charset=utf8mb4
 
#log to file
tab
log=/tmp/pt_deadlock.log
daemonize
interval=5

配置文件中参数说明 tab:代表,输出是带上制表符,格式化显示,更加清晰 log:指定日志文件用来存储死锁检测信息输出 daemonize:放到后台运行,守护进程模式 interval:每隔5秒检测一次 还有几个参数没有列出 run-time:pt-deadlock-logger运行时间 iterations:pt-deadlock-logger检测的次数 假如–run-time 2m –iterations 4 –interval 30,运行两分钟,每隔30s检测一次,一共检测四次

代码语言:javascript复制
[root@VM_54_118_centos ~]# pt-deadlock-logger --config pt-deadlocks.conf
 
[root@VM_54_118_centos ~]# tail -f /tmp/pt_deadlock.log 
server  ts      thread  txn_id  txn_time        user    hostname        ip      db      tbl     idx     lock_type       lock_mode       wait_hold  victim   query
localhost       2019-03-11T18:05:50     100428  0       29      root    localhost               wjq     wjqtab1 idxa    RECORD  X       w       0  insert into wjqtab1(a,b) values(2,30)
localhost       2019-03-11T18:05:50     103064  0       6       root    localhost               wjq     wjqtab1 idxa    RECORD  X       w       1  delete from wjqtab1 where a=5

3、将检查到的死锁信息存储到表中

下面是存储死锁信息的表结构(可以提前创建也可以在使用命令的使用加上–create-dest-table自动创建)

代码语言:javascript复制
root@localhost [3307][wjq]>CREATE TABLE deadlocks (
    ->   server char(20) NOT NULL,
    ->   ts timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
    ->   thread int unsigned NOT NULL,
    ->   txn_id bigint unsigned NOT NULL,
    ->   txn_time smallint unsigned NOT NULL,
    ->   user char(16) NOT NULL,
    ->   hostname char(20) NOT NULL,
    ->   ip char(15) NOT NULL, -- alternatively, ip int unsigned NOT NULL
    ->   db char(64) NOT NULL,
    ->   tbl char(64) NOT NULL,
    ->   idx char(64) NOT NULL,
    ->   lock_type char(16) NOT NULL,
    ->   lock_mode char(1) NOT NULL,
    ->   wait_hold char(1) NOT NULL,
    ->   victim tinyint unsigned NOT NULL,
    ->   query text NOT NULL,
    ->   PRIMARY KEY  (server,ts,thread)
    -> ) ENGINE=InnoDB;
Query OK, 0 rows affected (0.05 sec)

下面的命令会自动创建存放死锁信息的表

代码语言:javascript复制
[root@VM_54_118_centos ~]# pt-deadlock-logger h=localhost,u=root,p=qcloud@2018,S=/tmp/mysql3306.sock --dest h=localhost,u=root,p=qcloud@2018,S=/tmp/mysql3307.sock,D=wjq,t=deadlocks --create-dest-table

下面的命令不会自动创建存放死锁信息的表,需要提前创建

代码语言:javascript复制
[root@VM_54_118_centos ~]# pt-deadlock-logger h=localhost,u=root,p=xxxxxxxx,S=/tmp/mysql3306.sock --dest h=localhost,u=root,p=qcloud@2018,S=/tmp/mysql3307.sock,D=wjq,t=deadlocks
server ts thread txn_id txn_time user hostname ip db tbl idx lock_type lock_mode wait_hold victim query
localhost 2019-03-11T18:05:50 100428 0 29 root localhost  wjq wjqtab1 idxa RECORD X w 0 insert into wjqtab1(a,b) values(2,30)
localhost 2019-03-11T18:05:50 103064 0 6 root localhost  wjq wjqtab1 idxa RECORD X w 1 delete from wjqtab1 where a=5

查看存放死锁信息的表,查看死锁有关的信息

代码语言:javascript复制
root@localhost [3307][wjq]>select * from deadlocksG;
*************************** 1. row ***************************
   server: localhost
       ts: 2019-03-11 18:05:50
   thread: 100428
   txn_id: 0
 txn_time: 29
     user: root
 hostname: localhost
       ip: 
       db: wjq
      tbl: wjqtab1
      idx: idxa
lock_type: RECORD
lock_mode: X
wait_hold: w
   victim: 0
    query: insert into wjqtab1(a,b) values(2,30)
*************************** 2. row ***************************
   server: localhost
       ts: 2019-03-11 18:05:50
   thread: 103064
   txn_id: 0
 txn_time: 6
     user: root
 hostname: localhost
       ip: 
       db: wjq
      tbl: wjqtab1
      idx: idxa
lock_type: RECORD
lock_mode: X
wait_hold: w
   victim: 1
    query: delete from wjqtab1 where a=5
2 rows in set (0.00 sec)

关于pt-deadlock-logger的详细使用可参考官方文档:

https://www.percona.com/doc/percona-toolkit/3.0/pt-deadlock-logger.html

0 人点赞