GreatSQL死锁案例分析及扩展解读

2024-05-18 09:05:36 浏览数 (2)

1.背景概述 客户业务发生死锁的报错,根据业务程序日志及业务流程,发现造成死锁的原因是:事务1 delete insert ,事务2 delete insert,这两个事务交替执行导致的死锁;由于GAP锁阻塞了插入意向锁,并且当delete的数据存在时死锁不会发生,当delete的数据不存在时,会发生死锁。 2.问题复现 本次测试基于 GreatSQL-8.0.32-24,隔离级别为 RR。 2.1 创建测试表 greatsql> create database test; greatsql> create table test(c1 int unique key, c2 int, c3 int, c4 int); greatsql> insert into test values (1,1,1,1),(3,3,3,3),(5,5,5,5),(9,9,9,9); greatsql> select * from test; ------ ------ ------ ------ | c1 | c2 | c3 | c4 | ------ ------ ------ ------ | 1 | 1 | 1 | 1 | | 3 | 3 | 3 | 3 | | 5 | 5 | 5 | 5 | | 9 | 9 | 9 | 9 | ------ ------ ------ ------ 4 rows in set (0.01 sec) 2.2 事务执行顺序 按以下事务执行顺序,如果要删除的数据存在,则不会发生死锁;如果要删除的数据不存,并且要删除的数据在同一个GAP锁的区间内则会发生死锁; 时间事务1事务2T1BEGIN;BEGIN;T2delete from test where c1=?; T3 delete from test where c1=?;T4insert into test value(?,?,?,?); T5 insert into test value(?,?,?,?); 2.3 当delete的数据存在时 事务1:delete greatsql> begin; Query OK, 0 rows affected (0.00 sec) greatsql> delete from test where c1=3; Query OK, 1 row affected (0.00 sec) 查看锁信息: greatsql> select THREAD_ID,OBJECT_SCHEMA,OBJECT_NAME,INDEX_NAME,LOCK_TYPE,LOCK_MODE,LOCK_STATUS,LOCK_DATA from performance_schema.data_locks; ----------- --------------- ------------- ----------------- ----------- --------------- ------------- ------------------- | THREAD_ID | OBJECT_SCHEMA | OBJECT_NAME | INDEX_NAME | LOCK_TYPE | LOCK_MODE | LOCK_STATUS | LOCK_DATA | ----------- --------------- ------------- ----------------- ----------- --------------- ------------- ------------------- | 59 | test | test | NULL | TABLE | IX | GRANTED | NULL | | 59 | test | test | c1 | RECORD | X,REC_NOT_GAP | GRANTED | 3, 0x000000000201 | | 59 | test | test | GEN_CLUST_INDEX | RECORD | X,REC_NOT_GAP | GRANTED | 0x000000000201 | ----------- --------------- ------------- ----------------- ----------- --------------- ------------- ------------------- 3 rows in set (0.00 sec) 此时事务1给 3, 0x000000000201 这条数据加了 记录锁 X,REC_NOT_GAP 事务2:delete greatsql> begin; Query OK, 0 rows affected (0.00 sec) greatsql> delete from test where c1=5; Query OK, 1 row affected (0.00 sec) 查看锁信息: greatsql> select THREAD_ID,OBJECT_SCHEMA,OBJECT_NAME,INDEX_NAME,LOCK_TYPE,LOCK_MODE,LOCK_STATUS,LOCK_DATA from performance_schema.data_locks; ----------- --------------- ------------- ----------------- ----------- --------------- ------------- ------------------- | THREAD_ID | OBJECT_SCHEMA | OBJECT_NAME | INDEX_NAME | LOCK_TYPE | LOCK_MODE | LOCK_STATUS | LOCK_DATA | ----------- --------------- ------------- ----------------- ----------- --------------- ------------- ------------------- | 57 | test | test | NULL | TABLE | IX | GRANTED | NULL | | 57 | test | test | c1 | RECORD | X,REC_NOT_GAP | GRANTED | 5, 0x000000000202 | | 57 | test | test | GEN_CLUST_INDEX | RECORD | X,REC_NOT_GAP | GRANTED | 0x000000000202 | | 59 | test | test | NULL | TABLE | IX | GRANTED | NULL | | 59 | test | test | c1 | RECORD | X,REC_NOT_GAP | GRANTED | 3, 0x000000000201 | | 59 | test | test | GEN_CLUST_INDEX | RECORD | X,REC_NOT_GAP | GRANTED | 0x000000000201 | ----------- --------------- ------------- ----------------- ----------- --------------- ------------- ------------------- 6 rows in set (0.00 sec) 此时事务2给 5, 0x000000000202 这条数据加了 记录锁 X,REC_NOT_GAP 事务1:insert greatsql> insert into test value(3,3,3,3); Query OK, 1 row affected (0.00 sec) 查看锁信息: greatsql> select THREAD_ID,OBJECT_SCHEMA,OBJECT_NAME,INDEX_NAME,LOCK_TYPE,LOCK_MODE,LOCK_STATUS,LOCK_DATA from performance_schema.data_locks; ----------- --------------- ------------- ----------------- ----------- --------------- ------------- ------------------- | THREAD_ID | OBJECT_SCHEMA | OBJECT_NAME | INDEX_NAME | LOCK_TYPE | LOCK_MODE | LOCK_STATUS | LOCK_DATA | ----------- --------------- ------------- ----------------- ----------- --------------- ------------- ------------------- | 57 | test | test | NULL | TABLE | IX | GRANTED | NULL | | 57 | test | test | c1 | RECORD | X,REC_NOT_GAP | GRANTED | 5, 0x000000000202 | | 57 | test | test | GEN_CLUST_INDEX | RECORD | X,REC_NOT_GAP | GRANTED | 0x000000000202 | | 59 | test | test | NULL | TABLE | IX | GRANTED | NULL | | 59 | test | test | c1 | RECORD | X,REC_NOT_GAP | GRANTED | 3, 0x000000000201 | | 59 | test | test | GEN_CLUST_INDEX | RECORD | X,REC_NOT_GAP | GRANTED | 0x000000000201 | | 59 | test | test | c1 | RECORD | S,GAP | GRANTED | 3, 0x000000000201 | | 59 | test | test | c1 | RECORD | S,GAP | GRANTED | 5, 0x000000000202 | | 59 | test | test | c1 | RECORD | S,GAP | GRANTED | 3, 0x000000000206 | ----------- --------------- ------------- ----------------- ----------- --------------- ------------- ------------------- 9 rows in set (0.00 sec) 此时事务1给 被delete删除的数据 3, 0x000000000201 ,插入的数据 3, 0x000000000202,以及相邻的下一条数据 5, 0x000000000202 加了 间隙锁 S,GAP 事务2:insert greatsql> insert into test value(5,5,5,5); Query OK, 1 row affected (0.01 sec) 查看锁信息: greatsql> select THREAD_ID,OBJECT_SCHEMA,OBJECT_NAME,INDEX_NAME,LOCK_TYPE,LOCK_MODE,LOCK_STATUS,LOCK_DATA from performance_schema.data_locks; ----------- --------------- ------------- ----------------- ----------- --------------- ------------- ------------------- | THREAD_ID | OBJECT_SCHEMA | OBJECT_NAME | INDEX_NAME | LOCK_TYPE | LOCK_MODE | LOCK_STATUS | LOCK_DATA | ----------- --------------- ------------- ----------------- ----------- --------------- ------------- ------------------- | 57 | test | test | NULL | TABLE | IX | GRANTED | NULL | | 57 | test | test | c1 | RECORD | X,REC_NOT_GAP | GRANTED | 5, 0x000000000202 | | 57 | test | test | GEN_CLUST_INDEX | RECORD | X,REC_NOT_GAP | GRANTED | 0x000000000202 | | 57 | test | test | c1 | RECORD | S,GAP | GRANTED | 5, 0x000000000202 | | 57 | test | test | c1 | RECORD | S,GAP | GRANTED | 9, 0x000000000203 | | 57 | test | test | c1 | RECORD | S,GAP | GRANTED | 5, 0x000000000207 | | 59 | test | test | NULL | TABLE | IX | GRANTED | NULL | | 59 | test | test | c1 | RECORD | X,REC_NOT_GAP | GRANTED | 3, 0x000000000201 | | 59 | test | test | GEN_CLUST_INDEX | RECORD | X,REC_NOT_GAP | GRANTED | 0x000000000201 | | 59 | test | test | c1 | RECORD | S,GAP | GRANTED | 3, 0x000000000201 | | 59 | test | test | c1 | RECORD | S,GAP | GRANTED | 5, 0x000000000202 | | 59 | test | test | c1 | RECORD | S,GAP | GRANTED | 3, 0x000000000206 | ----------- --------------- ------------- ----------------- ----------- --------------- ------------- ------------------- 12 rows in set (0.00 sec) 此时事务2给 被delete删除的数据 5, 0x000000000202 ,插入的数据 5, 0x000000000207,以及相邻的下一条数据 9, 0x000000000203 加了 间隙锁 S,GAP 由于GAP锁之间是相互兼容的,所以没有发生锁等待及死锁,此时事务1,事务2都执行完成,可以正常提交。 2.4 当delete的数据不存在时 事务1:delete greatsql> begin; Query OK, 0 rows affected (0.01 sec) greatsql> delete from test where c1=6; Query OK, 0 rows affected (0.00 sec) 查看锁信息: greatsql> select THREAD_ID,OBJECT_SCHEMA,OBJECT_NAME,INDEX_NAME,LOCK_TYPE,LOCK_MODE,LOCK_STATUS,LOCK_DATA from performance_schema.data_locks; ----------- --------------- ------------- ------------ ----------- ----------- ------------- ------------------- | THREAD_ID | OBJECT_SCHEMA | OBJECT_NAME | INDEX_NAME | LOCK_TYPE | LOCK_MODE | LOCK_STATUS | LOCK_DATA | ----------- --------------- ------------- ------------ ----------- ----------- ------------- ------------------- | 61 | test | test | NULL | TABLE | IX | GRANTED | NULL | | 61 | test | test | c1 | RECORD | X,GAP | GRANTED | 9, 0x000000000203 | ----------- --------------- ------------- ------------ ----------- ----------- ------------- ------------------- 2 rows in set (0.00 sec) 此时事务1给 9, 0x000000000203 这条数据加了 间隙锁 X,GAP 事务2:delete greatsql> begin; Query OK, 0 rows affected (0.00 sec) greatsql> delete from test where c1=7; Query OK, 0 rows affected (0.00 sec) 查看锁信息: greatsql> select THREAD_ID,OBJECT_SCHEMA,OBJECT_NAME,INDEX_NAME,LOCK_TYPE,LOCK_MODE,LOCK_STATUS,LOCK_DATA from performance_schema.data_locks; ----------- --------------- ------------- ------------ ----------- ----------- ------------- ------------------- | THREAD_ID | OBJECT_SCHEMA | OBJECT_NAME | INDEX_NAME | LOCK_TYPE | LOCK_MODE | LOCK_STATUS | LOCK_DATA | ----------- --------------- ------------- ------------ ----------- ----------- ------------- ------------------- | 62 | test | test | NULL | TABLE | IX | GRANTED | NULL | | 62 | test | test | c1 | RECORD | X,GAP | GRANTED | 9, 0x000000000203 | | 61 | test | test | NULL | TABLE | IX | GRANTED | NULL | | 61 | test | test | c1 | RECORD | X,GAP | GRANTED | 9, 0x000000000203 | ----------- --------------- ------------- ------------ ----------- ----------- ------------- ------------------- 4 rows in set (0.00 sec) 此时事务2给 9, 0x000000000203 这条数据加了 间隙锁 X,GAP 间隙锁可以相互兼容,因此没有报错 事务1:insert greatsql> insert into test value(6,6,6,6); ---hang住,处于锁等待 查看锁信息: greatsql> select THREAD_ID,OBJECT_SCHEMA,OBJECT_NAME,INDEX_NAME,LOCK_TYPE,LOCK_MODE,LOCK_STATUS,LOCK_DATA from performance_schema.data_locks; ----------- --------------- ------------- ------------ ----------- ------------------------ ------------- ------------------- | THREAD_ID | OBJECT_SCHEMA | OBJECT_NAME | INDEX_NAME | LOCK_TYPE | LOCK_MODE | LOCK_STATUS | LOCK_DATA | ----------- --------------- ------------- ------------ ----------- ------------------------ ------------- ------------------- | 62 | test | test | NULL | TABLE | IX | GRANTED | NULL | | 62 | test | test | c1 | RECORD | X,GAP | GRANTED | 9, 0x000000000203 | | 61 | test | test | NULL | TABLE | IX | GRANTED | NULL | | 61 | test | test | c1 | RECORD | X,GAP | GRANTED | 9, 0x000000000203 | | 61 | test | test | c1 | RECORD | X,GAP,INSERT_INTENTION | WAITING | 9, 0x000000000203 | ----------- --------------- ------------- ------------ ----------- ------------------------ ------------- ------------------- 5 rows in set (0.00 sec) 此时事务1,要申请给 9, 0x000000000203 这条数据加 间隙锁,插入意向锁 X,GAP,INSERT_INTENTION;由于事务2已经加了 间隙锁 X,GAP 间隙锁与插入意向锁并不兼容,因此事务1的insert处于锁等待状态 事务2:insert greatsql> insert into test value(7,7,7,7); ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction 查看锁信息: greatsql> select THREAD_ID,OBJECT_SCHEMA,OBJECT_NAME,INDEX_NAME,LOCK_TYPE,LOCK_MODE,LOCK_STATUS,LOCK_DATA from performance_schema.data_locks; ----------- --------------- ------------- ----------------- ----------- ------------------------ ------------- ------------------------ | THREAD_ID | OBJECT_SCHEMA | OBJECT_NAME | INDEX_NAME | LOCK_TYPE | LOCK_MODE | LOCK_STATUS | LOCK_DATA | ----------- --------------- ------------- ----------------- ----------- ------------------------ ------------- ------------------------ | 61 | test | test | NULL | TABLE | IX | GRANTED | NULL | | 61 | test | test | c1 | RECORD | X,GAP | GRANTED | 9, 0x000000000203 | | 61 | test | test | c1 | RECORD | X,GAP | GRANTED | 6, 0x000000000213 | | 61 | test | test | GEN_CLUST_INDEX | RECORD | X | GRANTED | supremum pseudo-record | | 61 | test | test | GEN_CLUST_INDEX | RECORD | X,GAP | GRANTED | 0x000000000213 | | 61 | test | test | c1 | RECORD | X,GAP,INSERT_INTENTION | GRANTED | 9, 0x000000000203 | ----------- --------------- ------------- ----------------- ----------- ------------------------ ------------- ------------------------ 6 rows in set (0.01 sec) 事务2回滚,只有事务1的加锁信息。 由于此时事务2,要申请给 9, 0x000000000203 这条数据加 间隙锁,插入意向锁由于事务1已经加了 间隙锁 X,GAP 间隙锁与插入意向锁并不兼容,因此事务2的insert处于锁等待状态,2个事务相互等待锁导致死锁,此时事务2回滚。 3.总结 此次死锁的发生主要是GAP 锁 和 插入意向锁的冲突,建议让业务修改一下逻辑,先判断数据是否存在 select 一下,存在的话 delete 后在 insert ; 不存在的话直接 insert 不用delete了。

Enjoy GreatSQL :)


0 人点赞