Postgresql源码(31)Btree索引相关系统表和整体结构

2022-07-14 13:44:54 浏览数 (1)

0 总结

总结

  • leaf page
    • 0级页面(最底层页面)有右兄弟的话,第一条是右兄弟的第一条(应该是本页的最后一条,注意这条不属于本页,是下一页最小值得复制)注意:0级页面的最左页面也有这个copy
    • 0级页面(最底层页面)没有右兄弟的话,第一条就是起始数据
  • root / branch
    • root/branch页面的data存的是指向页面的最小值(正常是第二条,第一条一般是指向页面的右兄弟的第二条“当前页面最小值”的copy)。特殊的是root/branch指向最左页面的data不记录信息。
  • 怎么找到中间节点(branch/root)的最左和最右?
    • 最左页面的data是空的
    • 最右节点的指向的leaf的第一条就是当前页面的最小值(其他页面第一条保存右sibling最小值的copy)
代码语言:javascript复制
--------
| meta |
--------
| root |
--------
| left branch | branch | branch | ..... | branch |
--------------------------------------------------------
| leaf | leaf | leaf | leaf | leaf | right leaf   |
--------------------------------------------------------

! left branch的data为空
! branch的data是指向页面的最小值
! right leaf的第一条是当前页面的最小值
! leaf的第一条是右sibling的最小值得copy

1 相关系统表

支持哪些索引类型?

代码语言:javascript复制
postgres=# d pg_am
               Table "pg_catalog.pg_am"
  Column   |  Type   | Collation | Nullable | Default 
----------- --------- ----------- ---------- ---------
 amname    | name    |           | not null | 
 amhandler | regproc |           | not null | 
 amtype    | "char"  |           | not null |
 
postgres=# select * from pg_am;
 amname |  amhandler  | amtype 
-------- ------------- --------
 btree  | bthandler   | i
 hash   | hashhandler | i
 gist   | gisthandler | i
 gin    | ginhandler  | i
 spgist | spghandler  | i
 brin   | brinhandler | i

ps. 9.6前这张系统表有很多method的属性信息,新版本这些属性保存在C代码中了。

amhandler的相关函数在pg_proc中,都是C函数。

代码语言:javascript复制
select proname, prosrc from pg_proc where oid in (330,331,332,333,334,335);
   proname   |   prosrc    
------------- -------------
 bthandler   | bthandler
 hashhandler | hashhandler
 gisthandler | gisthandler
 ginhandler  | ginhandler
 spghandler  | spghandler
 brinhandler | brinhandler

例子:

代码语言:javascript复制
postgres=# d t2
                 Table "public.t2"
 Column |  Type   | Collation | Nullable | Default 
-------- --------- ----------- ---------- ---------
 id     | integer |           | not null | 
 info   | integer |           |          | 
Indexes:
    "pk_t2_a" PRIMARY KEY, btree (id)

系统表含义:

代码语言:javascript复制
postgres=# select * from pg_index where indexrelid='pk_t2_a'::regclass;
-[ RECORD 1 ]-- -------
indexrelid     | 126817
indrelid       | 126814
indnatts       | 1          -- 列数
indisunique    | t          -- 唯一索引
indisprimary   | t          -- 主键索引
indisexclusion | f          -- 支持exclusion constraint
indimmediate   | t          -- 强制每次insert时检查唯一性
indisclustered | f          -- 根据这个索引做了聚簇
indisvalid     | t          -- 是否用于查询
indcheckxmin   | f          -- 查询时需要检查索引元组xmin<事务的xmin才能用
indisready     | t          -- 索引是否可以进行插入、更新
indislive      | t          -- f表示索引正在被drop,其他操作应该忽略这个索引
indisreplident | f          -- ALTER TABLE ... REPLICA IDENTITY USING INDEX
indkey         | 1          -- 用哪几个列建的
indcollation   | 0
indclass       | 1978
indoption      | 0
indexprs       | 
indpred        | 

-- indisreplident订阅了update和delete,需要指定REPLICA IDENTITY才能执行更新删除

postgres=# select oid,* from pg_class order by oid desc limit 1;
-[ RECORD 1 ]------- --------
oid                 | 126817
relname             | pk_t2_a
relnamespace        | 2200
reltype             | 0
reloftype           | 0
relowner            | 10
relam               | 403
relfilenode         | 126817
reltablespace       | 0
relpages            | 1
reltuples           | 0
relallvisible       | 0
reltoastrelid       | 0
relhasindex         | f
relisshared         | f
relpersistence      | p
relkind             | i
relnatts            | 1
relchecks           | 0
relhasoids          | f
relhaspkey          | f
relhasrules         | f
relhastriggers      | f
relhassubclass      | f
relrowsecurity      | f
relforcerowsecurity | f
relispopulated      | t
relreplident        | n
relispartition      | f
relfrozenxid        | 0
relminmxid          | 0
relacl              | 
reloptions          | 
relpartbound        |

2 数据结构

代码语言:javascript复制
|-----------------------------------------------------|
| linp0(page header) | linp1 | linp2 |                |
|-----------------------------------------------------|
|                                                     |
|                                                     |
|-----------------------------------------------------|
|      | itup2 | itup1 | special space(BTMetaPageData)|
|-----------------------------------------------------|

BTPageOpaqueData在special space中,维护B-linked-tree的结构

代码语言:javascript复制
typedef struct BTPageOpaqueData
{
	BlockNumber btpo_prev;		/* 左兄弟的块号,便于反向扫描 */
	BlockNumber btpo_next;		/* 右兄弟的块号,正向扫描*/
	union
	{
		uint32		level;		/* 在索引树中的层级level*/
		TransactionId xact;		/* 删除无用的ID */
	}			btpo;
	uint16		btpo_flags;		/* 页面类型 */
	BTCycleId	btpo_cycleid;	/* vacuum cycle ID of latest split */
} BTPageOpaqueData;


/* Bits defined in btpo_flags */
#define BTP_LEAF		(1 << 0)	/* 有该标志表明有叶子页面 */
#define BTP_ROOT		(1 << 1)	/* 根页面*/
#define BTP_DELETED		(1 << 2)	/* 从索引树中删除的页面 */
#define BTP_META		(1 << 3)	/* 元页面*/
#define BTP_HALF_DEAD	(1 << 4)	/* 空页面,但仍保留*/
#define BTP_SPLIT_END	(1 << 5)	/* rightmost page of split group */
#define BTP_HAS_GARBAGE (1 << 6)	/* page has LP_DEAD tuples,LP——DEAD元组 */
#define BTP_INCOMPLETE_SPLIT (1 << 7)	/* right sibling's downlink is missing */

// BTMetaPageData
typedef struct BTMetaPageData
{
	uint32		btm_magic;		/* should contain BTREE_MAGIC */
	uint32		btm_version;	/* should contain BTREE_VERSION */
	BlockNumber btm_root;		/* current root location */
	uint32		btm_level;		/* tree level of the root page */
	BlockNumber btm_fastroot;	/* current "fast" root location */
	uint32		btm_fastlevel;	/* tree level of the "fast" root page */
} BTMetaPageData;

3 观察存储结构

总结

  • 0级页面(最底层页面)有右兄弟的话,第一条是右兄弟的第一条(应该是本页的最后一条,注意这条不属于本页,是下一页最小值得复制)注意:0级页面的最左页面也有这个copy
  • 0级页面(最底层页面)没有右兄弟的话,第一条就是起始数据
  • root页面的data存的是指向页面的最小值,(正常是第二条,第一条存右slibing最小值copy),这个第一条一般是指向页面的右兄弟的第二条(当前页面最小值)的copy。特殊的是指向最左页面的data不记录信息。

观察存储结构:

代码语言:javascript复制
----------------
|  meta page   |
----------------
| root page    |
----------------

创建测试表

代码语言:javascript复制
create table t3(id int primary key, info text);

meta page

测试表的第0个页面是meta page,指向root page

代码语言:javascript复制
postgres=# d t3
                 Table "public.t3"
 Column |  Type   | Collation | Nullable | Default 
-------- --------- ----------- ---------- ---------
 id     | integer |           | not null | 
 info   | text    |           |          | 
Indexes:
    "t3_pkey" PRIMARY KEY, btree (id)


postgres=#  select * from bt_page_stats('t3_pkey',0);
ERROR:  block 0 is a meta page

postgres=# select * from bt_page_items('t3_pkey',0);
ERROR:  block 0 is a meta page

postgres=# select * from bt_metap('t3_pkey');
 magic  | version | root | level | fastroot | fastlevel 
-------- --------- ------ ------- ---------- -----------
 340322 |       2 |    1 |     0 |        1 |         0

场景一:100条|有meta page|有root|无branch|无leaf

代码语言:javascript复制
create table t3(id int primary key, info text);
insert into t3 select generate_series(1,100), md5(random()::text);

level = 0

代码语言:javascript复制
select * from bt_metap('t3_pkey');

btpo = 等级 btpo_flags = 类型

代码语言:javascript复制
postgres=#  select * from bt_page_stats('t3_pkey',1);
 blkno | type | live_items | dead_items | avg_item_size | page_size | free_size | btpo_prev | btpo_next | btpo | btpo_flags 
------- ------ ------------ ------------ --------------- ----------- ----------- ----------- ----------- ------ ------------
     1 | l    |        100 |          0 |            16 |      8192 |      6148 |         0 |         0 |    0 |          3
(1 row)

bt_page_stats数据都是从SpecialSpage的BTPageOpaque中取的,含义可以参考BTPageOpaque数据结构

代码语言:javascript复制
bt_page_stats
  GetBTPageStatistics
    BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);
    ...
    stat->btpo_prev = opaque->btpo_prev;
	stat->btpo_next = opaque->btpo_next;
	stat->btpo.level = opaque->btpo.level;
	stat->btpo_flags = opaque->btpo_flags;
	stat->btpo_cycleid = opaque->btpo_cycleid;
	...

https://developer.aliyun.com/article/53701 层次可以从bt_page_stats的btpo得到,代表当前index page所处的层级。 注意层级并不是唯一的,例如btpo=3的层级,可能有分几个档。 btpo=0级表示最底层,处于这个层级的index pages存储的items(ctid)是指向heap page的。 类别和层级不挂钩,类别里面又可以有多个层级,但是只有层级=0的index page存储的ctid内容才是指向heap page的 其他层级index page存储的ctid内容都是指向同层级其他index page(双向链表),或者指下级的index page。 0层结构,只有meta和root页。 root页最多可以存储的item数,取决于索引字段数据的长度、以及索引页的大小。

当前 btpo = 等级 = 0 btpo_flags = 类型 = 3 = BTP_LEAF | BTP_ROOT

代码语言:javascript复制
#define BTP_LEAF		(1 << 0)	/* leaf page, i.e. not internal page */
#define BTP_ROOT		(1 << 1)	/* root page (has no parent) */
#define BTP_DELETED		(1 << 2)	/* page has been deleted from tree */
#define BTP_META		(1 << 3)	/* meta-page */
#define BTP_HALF_DEAD	(1 << 4)	/* empty, but still in tree */
#define BTP_SPLIT_END	(1 << 5)	/* rightmost page of split group */
#define BTP_HAS_GARBAGE (1 << 6)	/* page has LP_DEAD tuples */
#define BTP_INCOMPLETE_SPLIT (1 << 7)	/* right sibling's downlink is missing */

btpo知道这是一个0级节点,0级节点指向的是heap tuple

代码语言:javascript复制
postgres=# select * from bt_page_items('t3_pkey',1) limit 10;
 itemoffset |  ctid  | itemlen | nulls | vars |          data           
------------ -------- --------- ------- ------ -------------------------
          1 | (0,1)  |      16 | f     | f    | 01 00 00 00 00 00 00 00
          2 | (0,2)  |      16 | f     | f    | 02 00 00 00 00 00 00 00
          3 | (0,3)  |      16 | f     | f    | 03 00 00 00 00 00 00 00
          4 | (0,4)  |      16 | f     | f    | 04 00 00 00 00 00 00 00
          5 | (0,5)  |      16 | f     | f    | 05 00 00 00 00 00 00 00
          6 | (0,6)  |      16 | f     | f    | 06 00 00 00 00 00 00 00
          7 | (0,7)  |      16 | f     | f    | 07 00 00 00 00 00 00 00
          8 | (0,8)  |      16 | f     | f    | 08 00 00 00 00 00 00 00
          9 | (0,9)  |      16 | f     | f    | 09 00 00 00 00 00 00 00
         10 | (0,10) |      16 | f     | f    | 0a 00 00 00 00 00 00 00
         
postgres=# select * from t3 where ctid='(0,10)';
 id |               info               
---- ----------------------------------
 10 | 986dd76dcec6d1b705ea163fe6e11f37

场景二:1000条|有meta|有root|无branch|有leaf

代码语言:javascript复制
create table t4(id int primary key, info text);
insert into t4 select generate_series(1,1000), md5(random()::text);

level = 1(只有leaf、meta和root不计入)

代码语言:javascript复制
select * from bt_metap('t4_pkey');

1、root页面ID=3

代码语言:javascript复制
postgres=# select * from bt_metap('t4_pkey');
 magic  | version | root | level | fastroot | fastlevel 
-------- --------- ------ ------- ---------- -----------
 340322 |       2 |    3 |     1 |        3 |         1

2、页面stats

代码语言:javascript复制
-- btpo=0:已经是最底层了
-- btpo_flags=1:BTP_LEAF
postgres=# select * from bt_page_stats('t4_pkey',1);
 blkno | type | live_items | dead_items | avg_item_size | page_size | free_size | btpo_prev | btpo_next | btpo | btpo_flags 
------- ------ ------------ ------------ --------------- ----------- ----------- ----------- ----------- ------ ------------
     1 | l    |        367 |          0 |            16 |      8192 |       808 |         0 |         2 |    0 |          1

-- btpo=0:已经是最底层了
-- btpo_flags=1:BTP_LEAF
postgres=# select * from bt_page_stats('t4_pkey',2);
 blkno | type | live_items | dead_items | avg_item_size | page_size | free_size | btpo_prev | btpo_next | btpo | btpo_flags 
------- ------ ------------ ------------ --------------- ----------- ----------- ----------- ----------- ------ ------------
     2 | l    |        367 |          0 |            16 |      8192 |       808 |         1 |         4 |    0 |          1

-- btpo=1:不是最底层
-- btpo_flags=2:BTP_ROOT
postgres=# select * from bt_page_stats('t4_pkey',3);
 blkno | type | live_items | dead_items | avg_item_size | page_size | free_size | btpo_prev | btpo_next | btpo | btpo_flags 
------- ------ ------------ ------------ --------------- ----------- ----------- ----------- ----------- ------ ------------
     3 | r    |          3 |          0 |            13 |      8192 |      8096 |         0 |         0 |    1 |          2

-- btpo=0:已经是最底层了
-- btpo_flags=1:BTP_LEAF
postgres=# select * from bt_page_stats('t4_pkey',4);
 blkno | type | live_items | dead_items | avg_item_size | page_size | free_size | btpo_prev | btpo_next | btpo | btpo_flags 
------- ------ ------------ ------------ --------------- ----------- ----------- ----------- ----------- ------ ------------
     4 | l    |        268 |          0 |            16 |      8192 |      2788 |         2 |         0 |    0 |          1

postgres=# select * from bt_page_stats('t4_pkey',5);
ERROR:  block number out of range

3、root页面items

指向3个页面,data存储的是这个leaf page存储的最小值

代码语言:javascript复制
postgres=# select * from bt_page_items('t4_pkey',3);
 itemoffset | ctid  | itemlen | nulls | vars |          data           
------------ ------- --------- ------- ------ -------------------------
          1 | (1,1) |       8 | f     | f    | 
          2 | (2,1) |      16 | f     | f    | 6f 01 00 00 00 00 00 00
          3 | (4,1) |      16 | f     | f    | dd 02 00 00 00 00 00 00

第一个0级页面:

代码语言:javascript复制
postgres=# select * from bt_page_items('t4_pkey',1);
 itemoffset |  ctid   | itemlen | nulls | vars |          data           
------------ --------- --------- ------- ------ -------------------------
          1 | (3,7)   |      16 | f     | f    | 6f 01 00 00 00 00 00 00
          2 | (0,1)   |      16 | f     | f    | 01 00 00 00 00 00 00 00
          3 | (0,2)   |      16 | f     | f    | 02 00 00 00 00 00 00 00
          4 | (0,3)   |      16 | f     | f    | 03 00 00 00 00 00 00 00
...
        365 | (3,4)   |      16 | f     | f    | 6c 01 00 00 00 00 00 00
        366 | (3,5)   |      16 | f     | f    | 6d 01 00 00 00 00 00 00
        367 | (3,6)   |      16 | f     | f    | 6e 01 00 00 00 00 00 00

第二个0级页面:

代码语言:javascript复制
postgres=# select * from bt_page_items('t4_pkey',2);
 itemoffset |  ctid   | itemlen | nulls | vars |          data           
------------ --------- --------- ------- ------ -------------------------
          1 | (6,13)  |      16 | f     | f    | dd 02 00 00 00 00 00 00
          2 | (3,7)   |      16 | f     | f    | 6f 01 00 00 00 00 00 00
          3 | (3,8)   |      16 | f     | f    | 70 01 00 00 00 00 00 00
          4 | (3,9)   |      16 | f     | f    | 71 01 00 00 00 00 00 00
...
        365 | (6,10)  |      16 | f     | f    | da 02 00 00 00 00 00 00
        366 | (6,11)  |      16 | f     | f    | db 02 00 00 00 00 00 00
        367 | (6,12)  |      16 | f     | f    | dc 02 00 00 00 00 00 00

第四个0级页面:

代码语言:javascript复制
postgres=# select * from bt_page_items('t4_pkey',4);
 itemoffset |  ctid   | itemlen | nulls | vars |          data           
------------ --------- --------- ------- ------ -------------------------
          1 | (6,13)  |      16 | f     | f    | dd 02 00 00 00 00 00 00
          2 | (6,14)  |      16 | f     | f    | de 02 00 00 00 00 00 00
          3 | (6,15)  |      16 | f     | f    | df 02 00 00 00 00 00 00
          4 | (6,16)  |      16 | f     | f    | e0 02 00 00 00 00 00 00
...
        266 | (8,38)  |      16 | f     | f    | e6 03 00 00 00 00 00 00
        267 | (8,39)  |      16 | f     | f    | e7 03 00 00 00 00 00 00
        268 | (8,40)  |      16 | f     | f    | e8 03 00 00 00 00 00 00

场景三:10000000条|有meta|有root|有branch|有leaf

总结:

  • ctid的第一个数字表示页面ID,第二个数字是偏移
  • btpo_flags = 0是branch page、btpo_flags=2是root page
代码语言:javascript复制
create table t5(id int primary key, info text);  
insert into t5 select trunc(random()*10000000), md5(random()::text) from generate_series(1,1000000) on conflict on constraint t5_pkey do nothing;

level = 2(有branch和leaf)root是412

代码语言:javascript复制
select * from bt_metap('t5_pkey');
 magic  | version | root | level | fastroot | fastlevel 
-------- --------- ------ ------- ---------- -----------
 340322 |       2 |  412 |     2 |      412 |         2

查看root(data存的是左兄弟的最大值) btpo = 2(在第二层) btpo_flags = 2 = BTP_ROOT

代码语言:javascript复制
postgres=# select * from bt_page_stats('t5_pkey', 412);
 blkno | type | live_items | dead_items | avg_item_size | page_size | free_size | btpo_prev | btpo_next | btpo | btpo_flags 
------- ------ ------------ ------------ --------------- ----------- ----------- ----------- ----------- ------ ------------
   412 | r    |         11 |          0 |            15 |      8192 |      7936 |         0 |         0 |    2 |          2
   
postgres=# select * from bt_page_items('t5_pkey', 412);
 itemoffset |   ctid   | itemlen | nulls | vars |          data           
------------ ---------- --------- ------- ------ -------------------------
          1 | (3,1)    |       8 | f     | f    | 
          2 | (2714,1) |      16 | f     | f    | 66 07 0a 00 00 00 00 00
          3 | (1233,1) |      16 | f     | f    | a0 8c 16 00 00 00 00 00
          4 | (2381,1) |      16 | f     | f    | 0c 62 23 00 00 00 00 00
          5 | (583,1)  |      16 | f     | f    | 4d 71 30 00 00 00 00 00
          6 | (2286,1) |      16 | f     | f    | 6b 80 3d 00 00 00 00 00
          7 | (1062,1) |      16 | f     | f    | 59 fb 4b 00 00 00 00 00
          8 | (2046,1) |      16 | f     | f    | de b2 5a 00 00 00 00 00
          9 | (411,1)  |      16 | f     | f    | d6 df 6a 00 00 00 00 00
         10 | (2006,1) |      16 | f     | f    | 06 6d 79 00 00 00 00 00
         11 | (1380,1) |      16 | f     | f    | 8c 34 8a 00 00 00 00 00

这样看起来几个branch page是3、2714、1233、2381等11个页面。

代码语言:javascript复制
postgres=# select * from bt_page_stats('t5_pkey', 3);
 blkno | type | live_items | dead_items | avg_item_size | page_size | free_size | btpo_prev | btpo_next | btpo | btpo_flags 
------- ------ ------------ ------------ --------------- ----------- ----------- ----------- ----------- ------ ------------
     3 | i    |        214 |          0 |            15 |      8192 |      3876 |         0 |      2714 |    1 |          0

postgres=# select * from bt_page_stats('t5_pkey', 2714);
 blkno | type | live_items | dead_items | avg_item_size | page_size | free_size | btpo_prev | btpo_next | btpo | btpo_flags 
------- ------ ------------ ------------ --------------- ----------- ----------- ----------- ----------- ------ ------------
  2714 | i    |        273 |          0 |            15 |      8192 |      2696 |         3 |      1233 |    1 |          0

下面看一下branch的内容,例如3页面:

代码语言:javascript复制
postgres=# select * from bt_page_items('t5_pkey', 3);
 itemoffset |   ctid   | itemlen | nulls | vars |          data           
------------ ---------- --------- ------- ------ -------------------------
          1 | (1852,1) |      16 | f     | f    | 66 07 0a 00 00 00 00 00
          2 | (1,1)    |       8 | f     | f    | 
          3 | (3176,1) |      16 | f     | f    | 5c 08 00 00 00 00 00 00
          4 | (1612,1) |      16 | f     | f    | ef 11 00 00 00 00 00 00
          5 | (3045,1) |      16 | f     | f    | 2d 1a 00 00 00 00 00 00
          6 | (724,1)  |      16 | f     | f    | e2 24 00 00 00 00 00 00
          7 | (2822,1) |      16 | f     | f    | ae 2d 00 00 00 00 00 00
          8 | (1369,1) |      16 | f     | f    | 96 38 00 00 00 00 00 00
          9 | (2612,1) |      16 | f     | f    | ac 42 00 00 00 00 00 00
         10 | (315,1)  |      16 | f     | f    | 0d 4e 00 00 00 00 00 00
         11 | (2458,1) |      16 | f     | f    | d6 58 00 00 00 00 00 00
...
...

branch data保存的是什么?

查看leaf页面1852的内容,第一条是右兄弟最小值的复制;第二条是当前最小值。 所以data是保存指向页面的最小数据。

代码语言:javascript复制
postgres=# select * from bt_page_items('t5_pkey', 1852);
 itemoffset |    ctid    | itemlen | nulls | vars |          data           
------------ ------------ --------- ------- ------ -------------------------
          1 | (151,77)   |      16 | f     | f    | 2b 18 0a 00 00 00 00 00
          2 | (1231,33)  |      16 | f     | f    | 66 07 0a 00 00 00 00 00     <-----------
          3 | (4098,24)  |      16 | f     | f    | 67 07 0a 00 00 00 00 00
          4 | (2373,32)  |      16 | f     | f    | 73 07 0a 00 00 00 00 00
          5 | (7888,95)  |      16 | f     | f    | 9e 07 0a 00 00 00 00 00

怎么找到最小值?

查看root发现只有(3,1)没有data,data本应保存右兄弟的最小值。没有说明(3,1)是第一个页面。

代码语言:javascript复制
postgres=# select * from bt_page_items('t5_pkey', 412);
 itemoffset |   ctid   | itemlen | nulls | vars |          data           
------------ ---------- --------- ------- ------ -------------------------
          1 | (3,1)    |       8 | f     | f    | 
          2 | (2714,1) |      16 | f     | f    | 66 07 0a 00 00 00 00 00
          3 | (1233,1) |      16 | f     | f    | a0 8c 16 00 00 00 00 00
          4 | (2381,1) |      16 | f     | f    | 0c 62 23 00 00 00 00 00

查看branch

代码语言:javascript复制
postgres=# select * from bt_page_items('t5_pkey', 3);
 itemoffset |   ctid   | itemlen | nulls | vars |          data           
------------ ---------- --------- ------- ------ -------------------------
          1 | (1852,1) |      16 | f     | f    | 66 07 0a 00 00 00 00 00
          2 | (1,1)    |       8 | f     | f    | 
          3 | (3176,1) |      16 | f     | f    | 5c 08 00 00 00 00 00 00
          4 | (1612,1) |      16 | f     | f    | ef 11 00 00 00 00 00 00
          5 | (3045,1) |      16 | f     | f    | 2d 1a 00 00 00 00 00 00
          6 | (724,1)  |      16 | f     | f    | e2 24 00 00 00 00 00 00
          7 | (2822,1) |      16 | f     | f    | ae 2d 00 00 00 00 00 00
          8 | (1369,1) |      16 | f     | f    | 96 38 00 00 00 00 00 00
          9 | (2612,1) |      16 | f     | f    | ac 42 00 00 00 00 00 00

查看leaf,第一条是右sibling的最小值,第二条是当前页面的最小值。

代码语言:javascript复制
postgres=# select * from bt_page_items('t5_pkey', 1);
 itemoffset |    ctid    | itemlen | nulls | vars |          data           
------------ ------------ --------- ------- ------ -------------------------
          1 | (212,57)   |      16 | f     | f    | 5c 08 00 00 00 00 00 00
          2 | (7238,62)  |      16 | f     | f    | 2a 00 00 00 00 00 00 00
          3 | (430,12)   |      16 | f     | f    | 2b 00 00 00 00 00 00 00
          4 | (5535,81)  |      16 | f     | f    | 3e 00 00 00 00 00 00 00
          5 | (1278,47)  |      16 | f     | f    | 3f 00 00 00 00 00 00 00
          6 | (4255,45)  |      16 | f     | f    | 46 00 00 00 00 00 00 00

查看数据

代码语言:javascript复制
postgres=# select * from bt_page_items('t5_pkey', 1);
 itemoffset |    ctid    | itemlen | nulls | vars |          data           
------------ ------------ --------- ------- ------ -------------------------
          1 | (212,57)   |      16 | f     | f    | 5c 08 00 00 00 00 00 00
          2 | (7238,62)  |      16 | f     | f    | 2a 00 00 00 00 00 00 00
          3 | (430,12)   |      16 | f     | f    | 2b 00 00 00 00 00 00 00
          4 | (5535,81)  |      16 | f     | f    | 3e 00 00 00 00 00 00 00
          5 | (1278,47)  |      16 | f     | f    | 3f 00 00 00 00 00 00 00
          6 | (4255,45)  |      16 | f     | f    | 46 00 00 00 00 00 00 00

postgres=# select * from t5 where ctid='(7238,62)';
 id |               info               
---- ----------------------------------
 42 | faca6c2f77cfb6eea5e9ee6bf2740578

postgres=# select min(id) from t5;
 min 
-----
  42

ref

https://developer.aliyun.com/article/53701

下一篇开始翻源码

0 人点赞