阅读顺序
《Postgresql源码(30)Postgresql索引基础B-linked-tree》
《Postgresql源码(31)Btree索引相关系统表和整体结构》
《Postgresql源码(32)Btree索引分裂前后结构差异对比》
《Postgresql源码(33)Btree索引读——整体流程&_bt_first》
《Postgresql源码(34)Btree索引读——_bt_first搜索部分分析》
《Postgresql源码(36)Btree索引读——_bt_next搜索部分分析》
总结
分析流程在后面,这里总结便于查询
场景一:root分裂为branch的前后对比(level1–>2)
《level 1 到 2 的分裂过程》
场景二:root分裂为leaf的前后对比(level0–>1)
《level 0 到 1 的分裂过程》
- 分裂前后对比,root能保存407个元素,小于这个数字可以不需要leaf,root直接做leaf(level=0)
- 超出后root提升到level=1,两个leaf可以保存最大容量的90%(366个)。第一个leaf存到90%,第二个leaf保存剩下的10%和新值。
这里也解决了一个疑问,为什么新索引root节点经常在第三个页面:因为root永远在最后构造,第一次分裂leaf占据1、2页面,root就在第三位了。
场景三:leaf分裂前后对比(level1–>1)
leaf页面可以保存407个元素,分裂后保存90%的元素数量366,剩下的10%搬移到新leaf中。
表
代码语言:javascript复制create table t0(id int primary key, info text);
insert into t0 select generate_series(1,10000), md5(random()::text);
postgres=# d t4
Table "public.t4"
Column | Type | Collation | Nullable | Default
-------- --------- ----------- ---------- ---------
id | integer | | not null |
info | text | | |
Indexes:
"t4_pkey" PRIMARY KEY, btree (id)
10000条integer索引结构
代码语言:javascript复制postgres=# insert into t4 select generate_series(1,10000), md5(random()::text);
(1)meta
level = 1 只有一层leaf
代码语言:javascript复制postgres=# select * from bt_metap('t4_pkey');
magic | version | root | level | fastroot | fastlevel
-------- --------- ------ ------- ---------- -----------
340322 | 2 | 3 | 1 | 3 | 1
(2)root
btpo_flags = 2 说明是root btpo = 1 说明在第一层,0层是leaf
代码语言:javascript复制postgres=# select * from bt_page_stats('t4_pkey',3);
blkno | type | live_items | dead_items | avg_item_size | page_size | free_size | btpo_prev | btpo_next | btpo | btpo_flags
------- ------ ------------ ------------ --------------- ----------- ----------- ----------- ----------- ------ ------------
3 | r | 28 | 0 | 15 | 8192 | 7596 | 0 | 0 | 1 | 2
root指向28个leaf页面
代码语言:javascript复制postgres=# select * from bt_page_items('t4_pkey', 3);
itemoffset | ctid | itemlen | nulls | vars | data
------------ -------- --------- ------- ------ -------------------------
1 | (1,1) | 8 | f | f |
2 | (2,1) | 16 | f | f | 6f 01 00 00 00 00 00 00
3 | (4,1) | 16 | f | f | dd 02 00 00 00 00 00 00
4 | (5,1) | 16 | f | f | 4b 04 00 00 00 00 00 00
5 | (6,1) | 16 | f | f | b9 05 00 00 00 00 00 00
6 | (7,1) | 16 | f | f | 27 07 00 00 00 00 00 00
7 | (8,1) | 16 | f | f | 95 08 00 00 00 00 00 00
8 | (9,1) | 16 | f | f | 03 0a 00 00 00 00 00 00
9 | (10,1) | 16 | f | f | 71 0b 00 00 00 00 00 00
10 | (11,1) | 16 | f | f | df 0c 00 00 00 00 00 00
11 | (12,1) | 16 | f | f | 4d 0e 00 00 00 00 00 00
12 | (13,1) | 16 | f | f | bb 0f 00 00 00 00 00 00
13 | (14,1) | 16 | f | f | 29 11 00 00 00 00 00 00
14 | (15,1) | 16 | f | f | 97 12 00 00 00 00 00 00
15 | (16,1) | 16 | f | f | 05 14 00 00 00 00 00 00
16 | (17,1) | 16 | f | f | 73 15 00 00 00 00 00 00
17 | (18,1) | 16 | f | f | e1 16 00 00 00 00 00 00
18 | (19,1) | 16 | f | f | 4f 18 00 00 00 00 00 00
19 | (20,1) | 16 | f | f | bd 19 00 00 00 00 00 00
20 | (21,1) | 16 | f | f | 2b 1b 00 00 00 00 00 00
21 | (22,1) | 16 | f | f | 99 1c 00 00 00 00 00 00
22 | (23,1) | 16 | f | f | 07 1e 00 00 00 00 00 00
23 | (24,1) | 16 | f | f | 75 1f 00 00 00 00 00 00
24 | (25,1) | 16 | f | f | e3 20 00 00 00 00 00 00
25 | (26,1) | 16 | f | f | 51 22 00 00 00 00 00 00
26 | (27,1) | 16 | f | f | bf 23 00 00 00 00 00 00
27 | (28,1) | 16 | f | f | 2d 25 00 00 00 00 00 00
28 | (29,1) | 16 | f | f | 9b 26 00 00 00 00 00 00
(3)branch
无
(4)leaf
一个leaf页面存367-1=366条数据
代码语言:javascript复制postgres=# select * from bt_page_items('t4_pkey', 1);
itemoffset | ctid | itemlen | nulls | vars | data
------------ --------- --------- ------- ------ -------------------------
1 | (3,7) | 16 | f | f | 6f 01 00 00 00 00 00 00
2 | (0,1) | 16 | f | f | 01 00 00 00 00 00 00 00
3 | (0,2) | 16 | f | f | 02 00 00 00 00 00 00 00
4 | (0,3) | 16 | f | f | 03 00 00 00 00 00 00 00
5 | (0,4) | 16 | f | f | 04 00 00 00 00 00 00 00
6 | (0,5) | 16 | f | f | 05 00 00 00 00 00 00 00
7 | (0,6) | 16 | f | f | 06 00 00 00 00 00 00 00
8 | (0,7) | 16 | f | f | 07 00 00 00 00 00 00 00
9 | (0,8) | 16 | f | f | 08 00 00 00 00 00 00 00
...
360 | (2,119) | 16 | f | f | 67 01 00 00 00 00 00 00
361 | (2,120) | 16 | f | f | 68 01 00 00 00 00 00 00
362 | (3,1) | 16 | f | f | 69 01 00 00 00 00 00 00
363 | (3,2) | 16 | f | f | 6a 01 00 00 00 00 00 00
364 | (3,3) | 16 | f | f | 6b 01 00 00 00 00 00 00
365 | (3,4) | 16 | f | f | 6c 01 00 00 00 00 00 00
366 | (3,5) | 16 | f | f | 6d 01 00 00 00 00 00 00
367 | (3,6) | 16 | f | f | 6e 01 00 00 00 00 00 00
从root页面的数据来看,有28个leaf页面,前27个页面每个页面保存366条,第28个页面保存118行数据,正好10000条。
代码语言:javascript复制27 x 366 118 = 10000
20000条integer索引结构
10000到20000页面结构的变化
代码语言:javascript复制-- 10000条
root(3)
|
leaf(1) | leaf(2) | leaf(4) | leaf(5) | leaf(6) | ... | leaf(29)
-- 20000条
root(412) (btpo_flags=2)
|
branch(3) | beanch(411) (btpo_flags=0)
|
leaf(287) | leaf(1) | leaf(2) | ... | leaf(286) || leaf(287) | leaf(288) | ... | leaf(550) (btpo_flags=1)
-- root分裂的两个branch元素数量:285个、262个
场景一:root分裂为branch的前后对比(level1–>2)
149370条记录时level从1到2,root分裂成branch。看下分裂前后树的情况对比。
代码语言:javascript复制create table t7(id int primary key, info text);
truncate t7;
insert into t7 select generate_series(1,149369), md5(random()::text);
select * from bt_metap('t7_pkey');
magic | version | root | level | fastroot | fastlevel
-------- --------- ------ ------- ---------- -----------
340322 | 2 | 3 | 1 | 3 | 1
create table t8(id int primary key, info text);
truncate t8;
insert into t8 select generate_series(1,149370), md5(random()::text);
select * from bt_metap('t8_pkey');
magic | version | root | level | fastroot | fastlevel
-------- --------- ------ ------- ---------- -----------
340322 | 2 | 412 | 2 | 412 | 2
--------
postgres=# select * from bt_page_stats('t7_pkey',3);
blkno | type | live_items | dead_items | avg_item_size | page_size | free_size | btpo_prev | btpo_next | btpo | btpo_flags
------- ------ ------------ ------------ --------------- ----------- ----------- ----------- ----------- ------ ------------
3 | r | 408 | 0 | 15 | 8192 | 0 | 0 | 0 | 1 | 2
(1 row)
postgres=# select * from bt_page_stats('t8_pkey',412);
blkno | type | live_items | dead_items | avg_item_size | page_size | free_size | btpo_prev | btpo_next | btpo | btpo_flags
------- ------ ------------ ------------ --------------- ----------- ----------- ----------- ----------- ------ ------------
412 | r | 2 | 0 | 12 | 8192 | 8116 | 0 | 0 | 2 | 2
postgres=# select * from bt_page_items('t7_pkey', 3);
itemoffset | ctid | itemlen | nulls | vars | data
------------ --------- --------- ------- ------ -------------------------
1 | (1,1) | 8 | f | f |
2 | (2,1) | 16 | f | f | 6f 01 00 00 00 00 00 00
3 | (4,1) | 16 | f | f | dd 02 00 00 00 00 00 00
4 | (5,1) | 16 | f | f | 4b 04 00 00 00 00 00 00
5 | (6,1) | 16 | f | f | b9 05 00 00 00 00 00 00
6 | (7,1) | 16 | f | f | 27 07 00 00 00 00 00 00
7 | (8,1) | 16 | f | f | 95 08 00 00 00 00 00 00
8 | (9,1) | 16 | f | f | 03 0a 00 00 00 00 00 00
9 | (10,1) | 16 | f | f | 71 0b 00 00 00 00 00 00
10 | (11,1) | 16 | f | f | df 0c 00 00 00 00 00 00
...
403 | (404,1) | 16 | f | f | bd 3e 02 00 00 00 00 00
404 | (405,1) | 16 | f | f | 2b 40 02 00 00 00 00 00
405 | (406,1) | 16 | f | f | 99 41 02 00 00 00 00 00
406 | (407,1) | 16 | f | f | 07 43 02 00 00 00 00 00
407 | (408,1) | 16 | f | f | 75 44 02 00 00 00 00 00
408 | (409,1) | 16 | f | f | e3 45 02 00 00 00 00 00
postgres=# select * from bt_page_items('t8_pkey', 412);
itemoffset | ctid | itemlen | nulls | vars | data
------------ --------- --------- ------- ------ -------------------------
1 | (3,1) | 8 | f | f |
2 | (411,1) | 16 | f | f | 77 97 01 00 00 00 00 00
结构差异
场景二:root分裂为leaf的前后对比(level0–>1)
代码语言:javascript复制create table tx1(id int primary key, info text);
truncate tx1;
insert into tx1 select generate_series(1,407), md5(random()::text);
select * from bt_metap('tx1_pkey');
magic | version | root | level | fastroot | fastlevel
-------- --------- ------ ------- ---------- -----------
340322 | 2 | 1 | 0 | 1 | 0
select * from bt_page_items('tx1_pkey', 1);
create table tx2(id int primary key, info text);
truncate tx2;
insert into tx2 select generate_series(1,408), md5(random()::text);
select * from bt_metap('tx2_pkey');
magic | version | root | level | fastroot | fastlevel
-------- --------- ------ ------- ---------- -----------
340322 | 2 | 3 | 1 | 3 | 1
select * from bt_page_items('tx2_pkey', 3);
分裂前后对比,root能保存407个int元素,小于这个数字可以不需要leaf,root的就是level=0。
超出后root提升到level=1,两个leaf保存366个int元素。
这里也解决了一个疑问,为什么root节点经常在第三个页面。
场景三:leaf分裂前后对比(level1–>1)
代码语言:javascript复制create table td1(id int primary key, info text);
truncate td1;
insert into td1 select generate_series(1,773), md5(random()::text);
select * from bt_page_items('td1_pkey', 3);
create table td2(id int primary key, info text);
truncate td2;
insert into td2 select generate_series(1,774), md5(random()::text);
select * from bt_page_items('td2_pkey', 3);