作者:八怪(高鹏) 中亦科技数据库专家
一、问题抛出
最近遇到一个问题,得到栈如下(5.6.25):
出现这个问题的时候只存在一个读写事务,那就是本事务。对这里的红色部分比较感兴趣,但是这里不是所有的内容都和这个问题相关,主要还是围绕可见性判断和隐式锁判定进行,算是我的思考过程。但是对Innodb认知水平有限,如有误导请谅解。使用的源码版本5.7.29。
二、read view 简述
关于read view说明的文章已经很多了,我这里简单记录一下我学习的地方。一致性读取(consistent read),根据隔离级别的不同,会在不同的时机建立read view,如下:
- RR 事务的第一个select命令发起的时候建立read view,直到事务提交释放
- RC 事务的每一个select都会单独建立read view
有了read view 就能够对每行数据的可见性进行判断了,下面是read view中的关键属性
- m_up_limit_id:如果行的trx id 小于了m_up_limit_id则不可见。
- m_low_limit_id:如果行的trx id 大于了m_low_limit_id则可见。
- m_ids:是用于记录建立read view时刻的读写事务的vector数组,用于对于m_up_limit_id和m_low_limit_id之间的trx需要根据它来进行判定,是否处于活跃状态。
- m_low_limit_no则用于记录建立read view时刻的最小trx no,主要用于purge线程判断清理undo使用。
如何拿到值得具体可以参见附录,而对于可见性的判断我们可以参考如下函数:
代码语言:javascript复制/** Check whether the changes by id are visible.
@param[in] id transaction id to check against the view
@param[in] name table name
@return whether the view sees the modifications of id. */
bool changes_visible(
trx_id_t id,
const table_name_t& name) const
MY_ATTRIBUTE((warn_unused_result))
{
ut_ad(id > 0);
if (id < m_up_limit_id || id == m_creator_trx_id) { //小于 可见
return(true);
}
check_trx_id_sanity(id, name);
if (id >= m_low_limit_id) { //大于不可见
return(false);
} else if (m_ids.empty()) { //如果之间的 active 为空 则可见
return(true);
}
const ids_t::value_type* p = m_ids.data();
return(!std::binary_search(p, p m_ids.size(), id)); //否则比较本trx id 是否在这之中,如果在不可以见,反之可见
}
三、关于可见性判断的几个问题
1、有大量的删除行,且已经提交,但是没有被purge线程清理
这种情况由于大量删除行(或者update)并且已经提交,但是由于有长时间的select语句导致read view记录的状态也比较陈旧,因此根据m_low_limit_no的判断purge线程是不能清理一些比较老旧的undo的,因此这会导致一个问题,如果这些del flag的记录会存在于逻辑记录链表内部,因此其他select扫描的时候回根据next offset扫描到,但是根据可见性判断条件这些del flag的记录trx id小于本select语句的read view 的 m_up_limit_id,因此是可见的debug如下:
代码语言:javascript复制387 return(view->changes_visible(trx_id, index->table->name));
(gdb) p view->changes_visible(trx_id, index->table->name)
$14 = true
但是因为已经标记为del flag因此会做跳过处理如下:
代码语言:javascript复制row_search_mvcc:
if (rec_get_deleted_flag(rec, comp)) {
/* The record is delete-marked: we can skip it */
...
goto next_rec;
也就是实际上在长时间read view的“保护”下,我们的undo不能清理,并且del flag不能清理还保存在block的逻辑链表中,扫描的时候会实际扫描到,只是做了跳过处理。因此会出现如下现象
T1 | T2 | T3 |
---|---|---|
select sleep(1000) from test(模拟长时间查询) | ||
begin;delete from test10;commit; | ||
select * from test10;(时间还是很久) |
这就是上面说的原因,虽然没有数据了,但是查询依旧很慢。
2、大量删除,还未提交
那么select扫描的时候会根据next offset 扫描到,但是由于read view 判断这些数据的trx id 位于 m_up_limit_id和m_low_limit_id之间,需要根据事务是否活跃(read view的m_ids,显然这里是活跃的)通过undo构建其前印象,如下判断:
代码语言:javascript复制lock_clust_rec_cons_read_sees
trx_id_t trx_id = row_get_rec_trx_id(rec, index, offsets);
return(view->changes_visible(trx_id, index->table->name));
3、using index也可能回表
我们知道如果执行计划使用到using index那么不会回表去取主键的数据,使用整个二级索引即可。但是这里有一种特殊情况,这里进行描述。
对于二级索引而言,因为row记录不包含trx id和undo ptr两个伪列,那么其可见性判断和前的印象构建均需要回表获取主键的记录,当然可见性判断可以先根据本二级索引page的max trx id是否小于read view的m_up_limit_id来进行第一次粗略过滤,那么可见性判断的可能性就低很多,如果通过了这个比对,那么剩余精确判断还是需要回表通过主键来比对才行,如下:
- 对于二级索引回表操作来讲,精确的可见性判断放到了回表后的lock_clust_rec_cons_read_sees函数上,关于二级索引的回表,参考附录。
- 对于不回表访问(using index),通过了粗略判断后(lock_sec_rec_cons_read_sees),如果遇到需要精确的可见性判断,那么也是要回表的,原因前面解释了(row记录不包含trx id和undo ptr),参考附录。
对于这个问题我们可以简单的做如下的测试,当然需要打断点才行:
代码语言:javascript复制测试表如下:
mysql> show create table testimp4 G
*************************** 1. row ***************************
Table: testimp4
Create Table: CREATE TABLE `testimp4` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`a` int(11) DEFAULT NULL,
`b` int(11) DEFAULT NULL,
`d` varchar(200) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `b` (`b`),
KEY `d` (`d`)
) ENGINE=InnoDB AUTO_INCREMENT=10000 DEFAULT CHARSET=utf8
1 row in set (0.00 sec)
mysql> begin;
Query OK, 0 rows affected (0.00 sec)
mysql> select * from testimp4;
------ ------ ------ ------------------------------------
| id | a | b | d |
------ ------ ------ ------------------------------------
| 5 | 5 | 300 | NULL |
| 6 | 7000 | 7700 | 1124 |
| 11 | 7000 | 7700 | 1124 |
| 12 | 7000 | 7700 | 1124 |
| 13 | 2900 | 1800 | NULL |
| 14 | 2900 | 1800 | NULL |
| 1000 | 88 | 1499 | NULL |
| 4000 | 6000 | 5904 | iiiafsafasfihhhccccchhhigggofgo111 |
| 4001 | 7000 | 7700 | 1124454555 |
| 9999 | 9999 | 9999 | a |
------ ------ ------ ------------------------------------
10 rows in set (0.00 sec)
对于下列语句的执行话是:
代码语言:javascript复制
mysql> desc select b from testimp4 where b=300;
---- ------------- ---------- ------------ ------ --------------- ------ --------- ------- ------ ---------- -------------
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---- ------------- ---------- ------------ ------ --------------- ------ --------- ------- ------ ---------- -------------
| 1 | SIMPLE | testimp4 | NULL | ref | b | b | 5 | const | 1 | 100.00 | Using index |
---- ------------- ---------- ------------ ------ --------------- ------ --------- ------- ------ ---------- -------------
1 row in set, 1 warning (0.00 sec)
我们做如下语句:
T1 | T2 |
---|---|
begin;delete from testimp4 where id=5;(不提交) | |
select b from testimp4 where b=300;(这里是需要回表的) |
这里显然T2(5 ,5 ,300 ,NULL )的这条记录已经被T1删除了,但是没有提交,T2首先判断二级索引b上这行数据所在的page其max trx id是否小于本select语句的read view的m_up_limit_id,显然这不成立,因为T1还会处于活跃状态,然后就进入了回表判断流程。栈如下:
代码语言:javascript复制#0 lock_clust_rec_cons_read_sees (rec=0x7fff060980a8 "200", index=0x7ffec0499330, offsets=0x7fffe8399a70, view=0x33b1368)
at /home/mysql/soft/percona-server-5.7.29-32/storage/innobase/lock/lock0lock.cc:369
#1 0x0000000001afbca4 in Row_sel_get_clust_rec_for_mysql::operator() (this=0x7fffe839a2d0, prebuilt=0x7ffec80c97a0, sec_index=0x7ffec049a2c0, rec=0x7fff060a008c "200",
thr=0x7ffec80c9f88, out_rec=0x7fffe839a310, offsets=0x7fffe839a2e8, offset_heap=0x7fffe839a2f0, vrow=0x0, mtr=0x7fffe8399d90)
at /home/mysql/soft/percona-server-5.7.29-32/storage/innobase/row/row0sel.cc:3763
#2 0x0000000001b00a94 in row_search_mvcc (buf=0x7ffec80c8a00 <incomplete sequence 375>, mode=PAGE_CUR_GE, prebuilt=0x7ffec80c97a0, match_mode=1, direction=0)
at /home/mysql/soft/percona-server-5.7.29-32/storage/innobase/row/row0sel.cc:6051
4、关于page的max trx id
我们上面多次提到二级索引page的max trx id,这个max trx id实际就是PAGE_MAX_TRX_ID,它位于page的offset 56后的8个字节,实际上这个值只会存在于二级索引上,主键没有这个值,我们可以看到如下:
代码语言:javascript复制#define PAGE_MAX_TRX_ID 18 /* highest id of a trx which may have modified
a record on the page; trx_id_t; defined only
in secondary indexes and in the insert buffer
tree */
[root@mgr2 test]# ./bcview testimp2.ibd 16 56 8
******************************************************************
This Tool Is Uesed For Find The Data In Binary format(Hexadecimal)
Usage:./bcview file blocksize offset cnt-bytes!
file: Is Your File Will To Find Data!
blocksize: Is N kb Block.Eg: 8 Is 8 Kb Blocksize(Oracle)!
Eg: 16 Is 16 Kb Blocksize(Innodb)!
offset:Is Every Block Offset Your Want Start!
cnt-bytes:Is After Offset,How Bytes Your Want Gets!
Edtor QQ:22389860!
Used gcc version 4.1.2 20080704 (Red Hat 4.1.2-46)
******************************************************************
----Current file size is :0.125000 Mb
----Current use set blockszie is 16 Kb
----Current file name is testimp2.ibd
current block:00000000--Offset:00056--cnt bytes:08--data is:0021000000060000
current block:00000001--Offset:00056--cnt bytes:08--data is:0000000000000000
current block:00000002--Offset:00056--cnt bytes:08--data is:0001000000000000
current block:00000003--Offset:00056--cnt bytes:08--data is:0000000000000000(主键没有这个值)
current block:00000004--Offset:00056--cnt bytes:08--data is:0000000000016903(二级索引)
current block:00000005--Offset:00056--cnt bytes:08--data is:0000000000016924(二级索引)
每次每行更新后会更新这个值,如果大于则修改,小于则不变。函数page_update_max_trx_id中有如下片段
代码语言:javascript复制if (page_get_max_trx_id(buf_block_get_frame(block)) < trx_id) { //是否本次事务的trx id大于page的max trx id
page_set_max_trx_id(block, page_zip, trx_id, mtr);
}
四、关于加锁的阶段
我们一般锁需要加锁的都是DML语句和select for update这样的语句,这里将加锁分为数据查找和数据修改两个阶段。
- 对于select for update:
主键访问数据:访问主键判断是否存在隐式锁,然后加显示锁。二级索引访问数据(需要回表的情况):访问二级索引判断是否存在隐式锁,然后加显示锁,接着回表主键判断是否存在隐式锁,然后加显示锁。
- 对于update/delete:
主键访问修改数据:数据查找阶段主键判断是否存在隐式锁,然后加显示锁。数据修改阶段涉及到了其他二级索引,那么维护相应的二级索引加隐含锁。
二级索引访问修改数据:数据查找阶段二级索引判断是否存在隐式锁(可能需要回表判断),二级索引加显示锁,数据修改阶段回表修改主键数据加显示锁,然后维护各个二级索引(修改字段涉及的二级索引或者修改主键则包含全部二级索引)加隐式锁。
- 对于insert而言如果没有堵塞(插入印象锁和gap lock堵塞),那么始终为隐式锁。
注意这里我们看到了隐式锁,隐式锁不会占用row的结构体,因此在show engine innodb status里面是看不到的,除非有其他事务显示将其转换为显示锁。我们来做几个例子如下(REPEATABLE READ隔离级别):
代码语言:javascript复制表结构和数据
mysql> show create table testimp4 G
*************************** 1. row ***************************
Table: testimp4
Create Table: CREATE TABLE `testimp4` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`a` int(11) DEFAULT NULL,
`b` int(11) DEFAULT NULL,
`d` varchar(200) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `b` (`b`),
KEY `d` (`d`)
) ENGINE=InnoDB AUTO_INCREMENT=10000 DEFAULT CHARSET=utf8
1 row in set (0.00 sec)
mysql> select *from testimp4;
------ ------ ------ ------------------------------------
| id | a | b | d |
------ ------ ------ ------------------------------------
| 5 | 5 | 300 | NULL |
| 6 | 7000 | 7700 | 1124 |
| 11 | 7000 | 7700 | 1124 |
| 12 | 7000 | 7700 | 1124 |
| 13 | 2900 | 1800 | NULL |
| 14 | 2900 | 1800 | NULL |
| 1000 | 88 | 1499 | NULL |
| 4000 | 6000 | 5904 | iiiafsafasfihhhccccchhhigggofgo111 |
| 4001 | 7000 | 7700 | 1124454555 |
| 9999 | 9999 | 9999 | a |
------ ------ ------ ------------------------------------
10 rows in set (0.00 sec)
4.1 插入数据
代码语言:javascript复制begin;insert into testimp4 values(10000,10000,10000,'gp');(不提交)
TIME | S1 | S2 | S3 | S4 |
---|---|---|---|---|
T1 | begin;insert into testimp4 values(10000,10000,10000,'gp');(不提交) | |||
T2 | select * from testimp4 where id=10000 for update | |||
T3 | select * from testimp4 where b=10000 for update | |||
T4 | select * from testimp4 where d='a' for update |
# T1时刻S1锁状态:
---TRANSACTION 94487, ACTIVE 5 sec
1 lock struct(s), heap size 1160, 0 row lock(s), undo log entries 1
MySQL thread id 11, OS thread handle 140737089492736, query id 482 localhost root starting
show engine innodb status
TABLE LOCK table `test`.`testimp4` trx id 94487 lock mode IX
# T2时刻S1锁状态:
---TRANSACTION 94487, ACTIVE 271 sec
2 lock struct(s), heap size 1160, 1 row lock(s), undo log entries 1
MySQL thread id 11, OS thread handle 140737089492736, query id 484 localhost root starting
show engine innodb status
TABLE LOCK table `test`.`testimp4` trx id 94487 lock mode IX
RECORD LOCKS space id 501 page no 3 n bits 80 index PRIMARY of table `test`.`testimp4` trx id 94487 lock_mode X locks rec but not gap
Record lock, heap no 2 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
0: len 4; hex 80002710; asc ' ;;
1: len 6; hex 000000017117; asc q ;;
2: len 7; hex d0000002c40110; asc ;;
3: len 4; hex 80002710; asc ' ;;
4: len 4; hex 80002710; asc ' ;;
5: len 2; hex 6770; asc gp;;
# T3时刻S1锁状态:
---TRANSACTION 94487, ACTIVE 337 sec
3 lock struct(s), heap size 1160, 2 row lock(s), undo log entries 1
MySQL thread id 11, OS thread handle 140737089492736, query id 521 localhost root starting
show engine innodb status
TABLE LOCK table `test`.`testimp4` trx id 94487 lock mode IX
RECORD LOCKS space id 501 page no 3 n bits 80 index PRIMARY of table `test`.`testimp4` trx id 94487 lock_mode X locks rec but not gap
Record lock, heap no 2 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
0: len 4; hex 80002710; asc ' ;;
1: len 6; hex 000000017117; asc q ;;
2: len 7; hex d0000002c40110; asc ;;
3: len 4; hex 80002710; asc ' ;;
4: len 4; hex 80002710; asc ' ;;
5: len 2; hex 6770; asc gp;;
RECORD LOCKS space id 501 page no 4 n bits 80 index b of table `test`.`testimp4` trx id 94487 lock_mode X locks rec but not gap
Record lock, heap no 2 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 4; hex 80002710; asc ' ;;
1: len 4; hex 80002710; asc ' ;;
# T4时刻S1锁状态:
---TRANSACTION 94487, ACTIVE 408 sec
4 lock struct(s), heap size 1160, 3 row lock(s), undo log entries 1
MySQL thread id 11, OS thread handle 140737089492736, query id 559 localhost root starting
show engine innodb status
TABLE LOCK table `test`.`testimp4` trx id 94487 lock mode IX
RECORD LOCKS space id 501 page no 3 n bits 80 index PRIMARY of table `test`.`testimp4` trx id 94487 lock_mode X locks rec but not gap
Record lock, heap no 2 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
0: len 4; hex 80002710; asc ' ;;
1: len 6; hex 000000017117; asc q ;;
2: len 7; hex d0000002c40110; asc ;;
3: len 4; hex 80002710; asc ' ;;
4: len 4; hex 80002710; asc ' ;;
5: len 2; hex 6770; asc gp;;
RECORD LOCKS space id 501 page no 4 n bits 80 index b of table `test`.`testimp4` trx id 94487 lock_mode X locks rec but not gap
Record lock, heap no 2 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 4; hex 80002710; asc ' ;;
1: len 4; hex 80002710; asc ' ;;
RECORD LOCKS space id 501 page no 5 n bits 80 index d of table `test`.`testimp4` trx id 94487 lock_mode X locks rec but not gap
Record lock, heap no 2 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 2; hex 6770; asc gp;;
1: len 4; hex 80002710; asc ' ;;
实际上我们看到这里insert语句后主键和各个索引都上了隐含锁只是看不到,通过其他S2,S3,S4我们逐步把这些隐式锁转换为了显示锁。
4.2 delete语句通过主键删除数据
TIME | S1 | S2 | S3 |
---|---|---|---|
T1 | begin;delete from testimp4 where id=9999;(不提交) | ||
T2 | select * from testimp4 where b=9999 for update | ||
T3 | select * from testimp4 where d='a' for update; |
# T1时刻S1锁状态:
---TRANSACTION 94493, ACTIVE 3 sec
2 lock struct(s), heap size 1160, 1 row lock(s), undo log entries 1
MySQL thread id 11, OS thread handle 140737089492736, query id 567 localhost root
TABLE LOCK table `test`.`testimp4` trx id 94493 lock mode IX
RECORD LOCKS space id 501 page no 3 n bits 80 index PRIMARY of table `test`.`testimp4` trx id 94493 lock_mode X locks rec but not gap
Record lock, heap no 12 PHYSICAL RECORD: n_fields 6; compact format; info bits 32
0: len 4; hex 8000270f; asc ' ;;
1: len 6; hex 00000001711d; asc q ;;
2: len 7; hex 550000003b071b; asc U ; ;;
3: len 4; hex 8000270f; asc ' ;;
4: len 4; hex 8000270f; asc ' ;;
5: len 1; hex 61; asc a;;
# T2时刻S1锁状态:
---TRANSACTION 94493, ACTIVE 112 sec
4 lock struct(s), heap size 1160, 3 row lock(s), undo log entries 1
MySQL thread id 11, OS thread handle 140737089492736, query id 567 localhost root
TABLE LOCK table `test`.`testimp4` trx id 94493 lock mode IX
RECORD LOCKS space id 501 page no 3 n bits 80 index PRIMARY of table `test`.`testimp4` trx id 94493 lock_mode X locks rec but not gap
Record lock, heap no 12 PHYSICAL RECORD: n_fields 6; compact format; info bits 32
0: len 4; hex 8000270f; asc ' ;;
1: len 6; hex 00000001711d; asc q ;;
2: len 7; hex 550000003b071b; asc U ; ;;
3: len 4; hex 8000270f; asc ' ;;
4: len 4; hex 8000270f; asc ' ;;
5: len 1; hex 61; asc a;;
RECORD LOCKS space id 501 page no 4 n bits 80 index b of table `test`.`testimp4` trx id 94493 lock_mode X locks rec but not gap
Record lock, heap no 12 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
0: len 4; hex 8000270f; asc ' ;;
1: len 4; hex 8000270f; asc ' ;;
# T3时刻S1锁状态:
---TRANSACTION 94493, ACTIVE 133 sec
4 lock struct(s), heap size 1160, 3 row lock(s), undo log entries 1
MySQL thread id 11, OS thread handle 140737089492736, query id 567 localhost root
TABLE LOCK table `test`.`testimp4` trx id 94493 lock mode IX
RECORD LOCKS space id 501 page no 3 n bits 80 index PRIMARY of table `test`.`testimp4` trx id 94493 lock_mode X locks rec but not gap
Record lock, heap no 12 PHYSICAL RECORD: n_fields 6; compact format; info bits 32
0: len 4; hex 8000270f; asc ' ;;
1: len 6; hex 00000001711d; asc q ;;
2: len 7; hex 550000003b071b; asc U ; ;;
3: len 4; hex 8000270f; asc ' ;;
4: len 4; hex 8000270f; asc ' ;;
5: len 1; hex 61; asc a;;
RECORD LOCKS space id 501 page no 4 n bits 80 index b of table `test`.`testimp4` trx id 94493 lock_mode X locks rec but not gap
Record lock, heap no 12 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
0: len 4; hex 8000270f; asc ' ;;
1: len 4; hex 8000270f; asc ' ;;
RECORD LOCKS space id 501 page no 5 n bits 80 index d of table `test`.`testimp4` trx id 94493 lock_mode X locks rec but not gap
Record lock, heap no 12 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
0: len 1; hex 61; asc a;;
1: len 4; hex 8000270f; asc ' ;;
实际上我们看到这里delete语句后,主键加了显示锁,这是因为数据查找阶段需要加显示锁,但是各个二级索引是由于维护而加的是隐式锁,我们通过S2,S3将其转换为了显示锁。
4.3 delete语句通过二级索引删除数据
TIME | S1 | S2 |
---|---|---|
T1 | begin;delete from testimp4 where b=9999;(不提交) | |
T2 | select * from testimp4 where d='a' for update |
#T1时刻S1锁状态:
---TRANSACTION 94501, ACTIVE 109 sec
3 lock struct(s), heap size 1160, 3 row lock(s), undo log entries 1
MySQL thread id 11, OS thread handle 140737089492736, query id 576 localhost root
TABLE LOCK table `test`.`testimp4` trx id 94501 lock mode IX
RECORD LOCKS space id 501 page no 4 n bits 80 index b of table `test`.`testimp4` trx id 94501 lock_mode X
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
0: len 8; hex 73757072656d756d; asc supremum;;
Record lock, heap no 12 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
0: len 4; hex 8000270f; asc ' ;;
1: len 4; hex 8000270f; asc ' ;;
RECORD LOCKS space id 501 page no 3 n bits 80 index PRIMARY of table `test`.`testimp4` trx id 94501 lock_mode X locks rec but not gap
Record lock, heap no 12 PHYSICAL RECORD: n_fields 6; compact format; info bits 32
0: len 4; hex 8000270f; asc ' ;;
1: len 6; hex 000000017125; asc q%;;
2: len 7; hex 5a0000002518ea; asc Z % ;;
3: len 4; hex 8000270f; asc ' ;;
4: len 4; hex 8000270f; asc ' ;;
5: len 1; hex 61; asc a;;
# T2时刻S1锁状态:
---TRANSACTION 94501, ACTIVE 119 sec
4 lock struct(s), heap size 1160, 4 row lock(s), undo log entries 1
MySQL thread id 11, OS thread handle 140737089492736, query id 576 localhost root
TABLE LOCK table `test`.`testimp4` trx id 94501 lock mode IX
RECORD LOCKS space id 501 page no 4 n bits 80 index b of table `test`.`testimp4` trx id 94501 lock_mode X
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
0: len 8; hex 73757072656d756d; asc supremum;;
Record lock, heap no 12 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
0: len 4; hex 8000270f; asc ' ;;
1: len 4; hex 8000270f; asc ' ;;
RECORD LOCKS space id 501 page no 3 n bits 80 index PRIMARY of table `test`.`testimp4` trx id 94501 lock_mode X locks rec but not gap
Record lock, heap no 12 PHYSICAL RECORD: n_fields 6; compact format; info bits 32
0: len 4; hex 8000270f; asc ' ;;
1: len 6; hex 000000017125; asc q%;;
2: len 7; hex 5a0000002518ea; asc Z % ;;
3: len 4; hex 8000270f; asc ' ;;
4: len 4; hex 8000270f; asc ' ;;
5: len 1; hex 61; asc a;;
RECORD LOCKS space id 501 page no 5 n bits 80 index d of table `test`.`testimp4` trx id 94501 lock_mode X locks rec but not gap
Record lock, heap no 12 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
0: len 1; hex 61; asc a;;
1: len 4; hex 8000270f; asc ' ;;
实际上我们看到这里delete语句后,显示二级索引加了显示锁,然后主键加了显示锁,这是因为数据查找阶段先查找的二级索引然后回表查的主键,但是对于二级索引d来讲是由于维护而加的是隐式锁,我们通过S2将其转换为了显示锁。
4.4 update语句通过主键修改数据
这里要特别注意一下,对于二级索引的更新通常是进行了删除和插入,因此这里有2行数据都有隐式锁
TIME | S1 | S2 | S3 |
---|---|---|---|
T1 | begin;update testimp4 set b=10000 where id=9999;(不提交) | ||
T2 | select * from testimp4 where b=9999 for update | ||
T3 | select * from testimp4 where b=10000 for update |
# T1时刻S1锁状态
---TRANSACTION 94553, ACTIVE 7 sec
2 lock struct(s), heap size 1160, 1 row lock(s), undo log entries 1
MySQL thread id 11, OS thread handle 140737089492736, query id 730 localhost root
TABLE LOCK table `test`.`testimp4` trx id 94553 lock mode IX
RECORD LOCKS space id 501 page no 3 n bits 80 index PRIMARY of table `test`.`testimp4` trx id 94553 lock_mode X locks rec but not gap
Record lock, heap no 12 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
0: len 4; hex 8000270f; asc ' ;;
1: len 6; hex 000000017159; asc qY;;
2: len 7; hex 770000002a187f; asc w * ;;
3: len 4; hex 8000270f; asc ' ;;
4: len 4; hex 80002710; asc ' ;;
5: len 1; hex 61; asc a;;
# T2时刻S1锁状态
---TRANSACTION 94553, ACTIVE 62 sec
3 lock struct(s), heap size 1160, 2 row lock(s), undo log entries 1
MySQL thread id 11, OS thread handle 140737089492736, query id 730 localhost root
TABLE LOCK table `test`.`testimp4` trx id 94553 lock mode IX
RECORD LOCKS space id 501 page no 3 n bits 80 index PRIMARY of table `test`.`testimp4` trx id 94553 lock_mode X locks rec but not gap
Record lock, heap no 12 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
0: len 4; hex 8000270f; asc ' ;;
1: len 6; hex 000000017159; asc qY;;
2: len 7; hex 770000002a187f; asc w * ;;
3: len 4; hex 8000270f; asc ' ;;
4: len 4; hex 80002710; asc ' ;;
5: len 1; hex 61; asc a;;
RECORD LOCKS space id 501 page no 4 n bits 80 index b of table `test`.`testimp4` trx id 94553 lock_mode X locks rec but not gap
Record lock, heap no 12 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
0: len 4; hex 8000270f; asc ' ;;
1: len 4; hex 8000270f; asc ' ;;
# T3时刻S1锁状态
---TRANSACTION 94553, ACTIVE 128 sec
3 lock struct(s), heap size 1160, 3 row lock(s), undo log entries 1
MySQL thread id 11, OS thread handle 140737089492736, query id 730 localhost root
TABLE LOCK table `test`.`testimp4` trx id 94553 lock mode IX
RECORD LOCKS space id 501 page no 3 n bits 80 index PRIMARY of table `test`.`testimp4` trx id 94553 lock_mode X locks rec but not gap
Record lock, heap no 12 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
0: len 4; hex 8000270f; asc ' ;;
1: len 6; hex 000000017159; asc qY;;
2: len 7; hex 770000002a187f; asc w * ;;
3: len 4; hex 8000270f; asc ' ;;
4: len 4; hex 80002710; asc ' ;;
5: len 1; hex 61; asc a;;
RECORD LOCKS space id 501 page no 4 n bits 80 index b of table `test`.`testimp4` trx id 94553 lock_mode X locks rec but not gap
Record lock, heap no 2 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 4; hex 80002710; asc ' ;;
1: len 4; hex 8000270f; asc ' ;;
Record lock, heap no 12 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
0: len 4; hex 8000270f; asc ' ;;
1: len 4; hex 8000270f; asc ' ;;
这里由于对表的二级索引b通过主键进行了修改,那么二级索引包含了2条数据,一条标记为del flag,另外一条为插入如下:
代码语言:javascript复制(11) normal record offset:266 heapno:12 n_owned 0,delflag:Y minflag:0 rectype:0
(12) normal record offset:126 heapno:2 n_owned 0,delflag:N minflag:0 rectype:0
(13) SUPREMUM record offset:112 heapno:1 n_owned 8,delflag:N minflag:0 rectype:3
因此这两行都上了隐式锁,这是由于二级索引维护而上的,值得注意的是二级索引d不会上隐式锁,因为update语句的修改不会涉及到d列索引,因此不会维护。如果查询d列上的值(for update),会获取d列上的锁成功,然后会堵塞在主键id上如下:
代码语言:javascript复制---TRANSACTION 94565, ACTIVE 4 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1160, 2 row lock(s)
MySQL thread id 16, OS thread handle 140737086228224, query id 748 localhost root Sending data
select * from testimp4 where d='a' for update
------- TRX HAS BEEN WAITING 4 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 501 page no 3 n bits 80 index PRIMARY of table `test`.`testimp4` trx id 94565 lock_mode X locks rec but not gap waiting
Record lock, heap no 12 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
0: len 4; hex 8000270f; asc ' ;;
1: len 6; hex 000000017161; asc qa;;
2: len 7; hex 7c0000002d25eb; asc | -% ;;
3: len 4; hex 8000270f; asc ' ;;
4: len 4; hex 80002710; asc ' ;;
5: len 1; hex 61; asc a;;
------------------
TABLE LOCK table `test`.`testimp4` trx id 94565 lock mode IX
RECORD LOCKS space id 501 page no 5 n bits 80 index d of table `test`.`testimp4` trx id 94565 lock_mode X
Record lock, heap no 12 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 1; hex 61; asc a;;
1: len 4; hex 8000270f; asc ' ;;
RECORD LOCKS space id 501 page no 3 n bits 80 index PRIMARY of table `test`.`testimp4` trx id 94565 lock_mode X locks rec but not gap waiting
Record lock, heap no 12 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
0: len 4; hex 8000270f; asc ' ;;
1: len 6; hex 000000017161; asc qa;;
2: len 7; hex 7c0000002d25eb; asc | -% ;;
3: len 4; hex 8000270f; asc ' ;;
4: len 4; hex 80002710; asc ' ;;
5: len 1; hex 61; asc a;;
情况还有很多不在一一列举,Innodb行锁一直都是一个令人头疼的问题。
五、关于锁的判定
5.1 lock_sec_rec_read_check_and_lock函数
主要用于二级索引数据查找段阶段加显示锁,,对于update/delete而言,首先是需要找到需要修改的数据,加锁前需要判断本记录是否存在隐式锁,由于二级索引行数据不包含trx id,因此先用page的max trx id和当前活跃的最小读写事务进行比对判断,如果大于等于则可能存在显示锁,然后需要回表通过主键进行精细化判断。而精细化回表判断行是否存在隐式锁,那么代价就比较大了,因此这需要一个判断流程如下
代码语言:javascript复制lock_sec_rec_read_check_and_lock:
if ((page_get_max_trx_id(block->frame) >= trx_rw_min_trx_id()
|| recv_recovery_is_on())
&& !page_rec_is_supremum(rec)) {
lock_rec_convert_impl_to_expl(block, rec, index, offsets);//如果符合前面的条件才调入 lock_rec_convert_impl_to_expl
}
如下调入:
代码语言:javascript复制 ->lock_rec_convert_impl_to_expl
->lock_sec_rec_some_has_impl
->row_vers_impl_x_locked 此处会进行聚集索引的回表,同样是通过二级索引进行定位返回btr_cur_search_to_nth_level
->row_vers_impl_x_locked_low 最后会调入 row_vers_impl_x_locked_low函数进行核心判断
栈如下:
代码语言:javascript复制#0 row_vers_impl_x_locked_low (clust_rec=0x7fff39a21226 "200", clust_index=0x7ffeb5092680, rec=0x7fff39a2ac30 "200", index=0x7ffeb5093610, offsets=0x7fffe8461730, mtr=0x7fffe8460e90)
at /home/mysql/soft/percona-server-5.7.29-32/storage/innobase/row/row0vers.cc:101
#1 0x0000000001b2c84e in row_vers_impl_x_locked (rec=0x7fff39a2ac30 "200", index=0x7ffeb5093610, offsets=0x7fffe8461730)
at /home/mysql/soft/percona-server-5.7.29-32/storage/innobase/row/row0vers.cc:390
#2 0x00000000019e8448 in lock_sec_rec_some_has_impl (rec=0x7fff39a2ac30 "200", index=0x7ffeb5093610, offsets=0x7fffe8461730)
at /home/mysql/soft/percona-server-5.7.29-32/storage/innobase/lock/lock0lock.cc:1276
#3 0x00000000019f339a in lock_rec_convert_impl_to_expl (block=0x7fff38d94ca0, rec=0x7fff39a2ac30 "200", index=0x7ffeb5093610, offsets=0x7fffe8461730)
at /home/mysql/soft/percona-server-5.7.29-32/storage/innobase/lock/lock0lock.cc:6124
#4 0x00000000019f3dd2 in lock_sec_rec_read_check_and_lock (flags=0, block=0x7fff38d94ca0, rec=0x7fff39a2ac30 "200", index=0x7ffeb5093610, offsets=0x7fffe8461730, mode=LOCK_X,
gap_mode=1024, thr=0x7ffeb4c89358) at /home/mysql/soft/percona-server-5.7.29-32/storage/innobase/lock/lock0lock.cc:6357
#5 0x0000000001af7271 in sel_set_rec_lock (pcur=0x7ffeb4c887d8, rec=0x7fff39a2ac30 "200", index=0x7ffeb5093610, offsets=0x7fffe8461730, mode=3, type=1024, thr=0x7ffeb4c89358,
mtr=0x7fffe8461a50) at /home/mysql/soft/percona-server-5.7.29-32/storage/innobase/row/row0sel.cc:1278
#6 0x0000000001b00049 in row_search_mvcc (buf=0x7ffeb4977070 "370211 37", mode=PAGE_CUR_GE, prebuilt=0x7ffeb4c885b0, match_mode=1, direction=1)
at /home/mysql/soft/percona-server-5.7.29-32/storage/innobase/row/row0sel.cc:5710
但是需要注意的是,max trx id只会在二级索引上更新,并且每次更新一行都会更新掉,那么引起的一个问题就是如果连续多次删除同一个二级索引上的记录**(delete from testimp4 where b=7700;),除第一次以外都会调入row_vers_impl_x_locked_low这个函数,因为这是查询一行加锁一行修改一行(每行都会修改page的max trx id)的。但是update却不同,update如果修改本二级索引的值一般会进入(如:update testimp4 set b=1500 where b=1800;)Searching rows for update状态**,先建立一个临时文件来先存储需要更改的行记录,然后进行批量更改进入updating状态,那么则不会出现这种问题,因为这是在数据查找阶段进行的判断,而不是数据修改阶段。又比如**(如:update testimp2 set c='a' where b=1800)这样的语句也不会触发,这是因为b索引的行记录一直没有改变,因此不会修改b索引page的max trx id。因此update很好的规避了这个问题不会频繁的进入函数row_vers_impl_x_locked_low**进行判定,但是delete却不行。
关于row_vers_impl_x_locked_low函数对于二级索引是否存在隐式锁的判定,比较复杂分为好多种情况,不再描述。因此最开始我们看到的问题,这个过程已经进入了row_vers_impl_x_locked_low函数,那么可以判断这个delete语句可能更新了多行(但是从代码行数上判断不是这种情况),或者有可能本语句事务做过修改本语句修改记录的其他语句,需要进行精细化判断。
5.2 lock_sec_rec_modify_check_and_lock
主要用于数据修改阶段加隐式锁,二级索引由于行数据的修改(update修改了本二级索引字包含段值或者尾部的主键)而被动维护的加锁。注意如果是select for update where条件是主键则不会加判断二级索引是否包含隐含锁,如果出现冲突会堵塞在主键上。
5.3 lock_clust_rec_read_check_and_lock
数据查找阶段加显示锁,主要用于主键查找数据加显示锁或者二级索引访问后的回表主键加显示锁,加锁前需要判断是否存在隐含锁。由于主键行中包含了trx id伪列,因此可以简单的用本行trx id的事务是否还活跃进行判定了,这个过程代价很小,因此每行加锁总是会有这个过程,也就是每次都会调用lock_rec_convert_impl_to_expl函数进行判断,如下:
代码语言:javascript复制lock_clust_rec_read_check_and_lock
->lock_rec_convert_impl_to_expl
->lock_clust_rec_some_has_impl (主键判断非常简单)
栈如下:
代码语言:javascript复制#0 lock_clust_rec_some_has_impl (rec=0x7fff05ad40db "200", index=0x7ffe8802ce70, offsets=0x7fffe8461660)
at /home/mysql/soft/percona-server-5.7.29-32/storage/innobase/include/lock0priv.ic:69
#1 0x00000000019f3333 in lock_rec_convert_impl_to_expl (block=0x7fff050a0950, rec=0x7fff05ad40db "200", index=0x7ffe8802ce70, offsets=0x7fffe8461660)
at /home/mysql/soft/percona-server-5.7.29-32/storage/innobase/lock/lock0lock.cc:6118
#2 0x00000000019f418d in lock_clust_rec_read_check_and_lock (flags=0, block=0x7fff050a0950, rec=0x7fff05ad40db "200", index=0x7ffe8802ce70, offsets=0x7fffe8461660, mode=LOCK_X,
gap_mode=1024, thr=0x7ffeb49903c8) at /home/mysql/soft/percona-server-5.7.29-32/storage/innobase/lock/lock0lock.cc:6430
#3 0x0000000001af7193 in sel_set_rec_lock (pcur=0x7ffeb498fe38, rec=0x7fff05ad40db "200", index=0x7ffe8802ce70, offsets=0x7fffe8461660, mode=3, type=1024, thr=0x7ffeb49903c8,
mtr=0x7fffe8461980) at /home/mysql/soft/percona-server-5.7.29-32/storage/innobase/row/row0sel.cc:1263
#4 0x0000000001b00049 in row_search_mvcc (buf=0x7ffeb498f380 "371 05", mode=PAGE_CUR_GE, prebuilt=0x7ffeb498fc10, match_mode=1, direction=0)
at /home/mysql/soft/percona-server-5.7.29-32/storage/innobase/row/row0sel.cc:5710
5.4 lock_clust_rec_modify_check_and_lock
主键数据修改阶段加隐式锁,当前发现为在直接update主键值或者delete操作的时候,但是这种情况下实际上主键已经在数据查询阶段加了显示锁。
六、update不完全等同于delete&&insert
直接区分如下:
- 主键更新,接口row_upd_clust_step
row_upd_changes_ord_field_binary 判断是否更新了聚集索引的值
如果更新了
-> row_upd_clust_rec_by_insert 进行主键删除插入(设置del flag)
如果没有更新
->row_upd_clust_rec
->btr_cur_optimistic_update 只考虑乐观update
->row_upd_changes_field_size_or_external 判断新记录是否超过本行现有大小
如果否
->btr_cur_update_in_place 原地更新
如果是
->page_cur_delete_rec 则需要进行主键删除(实际删除非设置del falg)
->btr_cur_insert_if_possible 插入
- 二级索引更新,接口row_upd_sec_step 始终为删除插入(设置del flag)
七、关于History list length 的单位
实际上History list length 就是当一个update undo log (非insert)的计数器,一个事务只有一个undo log 。来源为trx_sys->rseg_history_len,这个值会在事务提交的时候更新,无论事务大小。但是由于很多内部事务的存在,这个值会远大于可观测的事务个数。栈如下:
代码语言:javascript复制#0 trx_purge_add_update_undo_to_history (trx=0x7fffeac7df50, undo_ptr=0x7fffeac7e370, undo_page=0x7fff2837c000 "373252223T", update_rseg_history_len=true, n_added_logs=1,
mtr=0x7fffe8399830) at /home/mysql/soft/percona-server-5.7.29-32/storage/innobase/trx/trx0purge.cc:354
#1 0x0000000001b9c064 in trx_undo_update_cleanup (trx=0x7fffeac7df50, undo_ptr=0x7fffeac7e370, undo_page=0x7fff2837c000 "373252223T", update_rseg_history_len=true, n_added_logs=1,
mtr=0x7fffe8399830) at /home/mysql/soft/percona-server-5.7.29-32/storage/innobase/trx/trx0undo.cc:1970
#2 0x0000000001b8b639 in trx_write_serialisation_history (trx=0x7fffeac7df50, mtr=0x7fffe8399830) at /home/mysql/soft/percona-server-5.7.29-32/storage/innobase/trx/trx0trx.cc:1684
#3 0x0000000001b8c9b0 in trx_commit_low (trx=0x7fffeac7df50, mtr=0x7fffe8399830) at /home/mysql/soft/percona-server-5.7.29-32/storage/innobase/trx/trx0trx.cc:2184
到这里,杂七杂八记录了一大堆,记录于此以备后用。
附录1函数接口
1、read view
- MVCC::view_open:建立read view
- ReadView::prepare:准备read view中的值
- ReadView::complete:写入read view中的值
- MVCC::view_close:释放read view
2、可见性判断
- lock_clust_rec_cons_read_sees:主键可见性判断
- lock_sec_rec_cons_read_sees:二级索引可见性判断
附录 2具体函数
1、read view
代码语言:javascript复制 /** The read should not see any transaction with trx id >= this
value. In other words, this is the "high water mark". */
trx_id_t m_low_limit_id;
/** The read should see all trx ids which are strictly
smaller (<) than this value. In other words, this is the
low water mark". */
trx_id_t m_up_limit_id;
/** trx id of creating transaction, set to TRX_ID_MAX for free
views. */
trx_id_t m_creator_trx_id;
/** Set of RW transactions that was active when this snapshot
was taken */
ids_t m_ids;
/** The view does not need to see the undo logs for transactions
whose transaction number is strictly smaller (<) than this value:
they can be removed in purge if not needed by other views */
trx_id_t m_low_limit_no;
代码语言:javascript复制void
ReadView::prepare(trx_id_t id)
{
ut_ad(!m_cloned);
ut_ad(mutex_own(&trx_sys->mutex));
m_creator_trx_id = id;
m_low_limit_no = m_low_limit_id = trx_sys->max_trx_id;
if (!trx_sys->rw_trx_ids.empty()) {
copy_trx_ids(trx_sys->rw_trx_ids);
} else {
m_ids.clear();
}
if (UT_LIST_GET_LEN(trx_sys->serialisation_list) > 0) {
const trx_t* trx;
trx = UT_LIST_GET_FIRST(trx_sys->serialisation_list);
if (trx->no < m_low_limit_no) {
m_low_limit_no = trx->no;
}
}
}
void
ReadView::complete()
{
ut_ad(!m_cloned);
/* The first active transaction has the smallest id. */
m_up_limit_id = !m_ids.empty() ? m_ids.front() : m_low_limit_id;
ut_ad(m_up_limit_id <= m_low_limit_id);
m_closed = false;
}
2、可见性判断
代码语言:javascript复制二级索引回表判断可见性
Row_sel_get_clust_rec_for_mysql::operator()
->lock_clust_rec_cons_read_sees (回表后根据主键判断其可见性)
->row_sel_build_prev_vers_for_mysql(构建前版本)
->row_vers_build_for_consistent_read
本函数循环构建,直到条件满足,或者前版本为NULL
if (view->changes_visible(trx_id, index->table->name)) {
/* The view already sees this version: we can copy
it to in_heap and return */
buf = static_cast<byte*>(
mem_heap_alloc(
in_heap, rec_offs_size(*offsets)));
*old_vers = rec_copy(buf, prev_version, *offsets);
rec_offs_make_valid(*old_vers, index, *offsets);
if (vrow && *vrow) {
*vrow = dtuple_copy(*vrow, in_heap);
dtuple_dup_v_fld(*vrow, in_heap);
}
break;
最终会将前版本的主键值根据需求取字段返回给MySQL层
关于using index 也需要回表流程
代码语言:javascript复制row_search_mvcc:
if (!srv_read_only_mode
&& !lock_sec_rec_cons_read_sees( // 如果二级索引记录判断为不可见
rec, index, trx->read_view)) {
/* We should look at the clustered index.
However, as this is a non-locking read,
we can skip the clustered index lookup if
the condition does not match the secondary
index entry. */
switch (row_search_idx_cond_check(
buf, prebuilt, rec, offsets)) {
case ICP_NO_MATCH:
goto next_rec;
case ICP_OUT_OF_RANGE:
err = DB_RECORD_NOT_FOUND;
goto idx_cond_failed;
case ICP_MATCH:
goto requires_clust_rec; //走这里就进入了回表判断流程
}
lock_sec_rec_cons_read_sees:
trx_id_t max_trx_id = page_get_max_trx_id(page_align(rec));//获取页的max trx id
ut_ad(max_trx_id > 0);
return(view->sees(max_trx_id));
全文完。