Postgresql Source Code (58): Dissecting Tuple Assembly in heap_form_tuple

2022-06-30 15:19:52

Version: 14. Related: "Postgresql Source Code (51): Variable-Length Type Implementation (varlena.c)" and "Postgresql Source Code (56): Expandable Types ExpandedObject/ExpandedRecord"

1 Background

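  • In PG a tuple has two representations: the expanded format (convenient for computation) and the flattened format (convenient for storage).
  • The previous article, "Postgresql Source Code (56): Expandable Types ExpandedObject/ExpandedRecord", described the expanded format of tuples.
  • This article covers the more general flattened format, HeapTupleData.
  • The two formats are interconvertible. The expanded-to-flat direction goes through the flatten_into function pointer (see Postgresql Source Code (56)).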
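The expanded-to-flat direction comes down to two calls. First the object is asked for its flat size, then it writes itself into a buffer of exactly that size. Below is a minimal toy sketch of that protocol, assuming a made-up expanded int array. `ToyExpanded`, `ToyMethods`, `toy_flatten` and the flat layout are all hypothetical. They only mimic the shape of PG's `get_flat_size`/`flatten_into` methods and are not PG's real types.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy types shaped like PG's ExpandedObjectMethods. */
typedef struct ToyExpanded ToyExpanded;

typedef struct ToyMethods
{
    size_t      (*get_flat_size) (ToyExpanded *eoh);               /* step 1 */
    void        (*flatten_into) (ToyExpanded *eoh, void *result,
                                 size_t allocated_size);            /* step 2 */
} ToyMethods;

struct ToyExpanded
{
    const ToyMethods *methods;
    int32_t     nitems;
    int32_t    *items;          /* "expanded": a separately allocated array */
};

/* Hypothetical flat layout: uint32 total length, then the items inline. */
static size_t
toy_get_flat_size(ToyExpanded *eoh)
{
    return sizeof(uint32_t) + (size_t) eoh->nitems * sizeof(int32_t);
}

static void
toy_flatten_into(ToyExpanded *eoh, void *result, size_t allocated_size)
{
    uint32_t    len = (uint32_t) allocated_size;

    assert(allocated_size == toy_get_flat_size(eoh));
    memcpy(result, &len, sizeof(len));
    memcpy((char *) result + sizeof(len), eoh->items,
           (size_t) eoh->nitems * sizeof(int32_t));
}

static const ToyMethods toy_methods = {toy_get_flat_size, toy_flatten_into};

/* Caller side: ask for the size, allocate, then flatten into the buffer. */
static void *
toy_flatten(ToyExpanded *eoh, size_t *out_size)
{
    size_t      size = eoh->methods->get_flat_size(eoh);
    void       *flat = malloc(size);

    if (flat != NULL)
        eoh->methods->flatten_into(eoh, flat, size);
    *out_size = size;
    return flat;
}
```

PG does not allocate a separate buffer for this. The size is already counted when heap_form_tuple sizes the tuple, and fill_val calls EOH_get_flat_size and EOH_flatten_into to write straight into the tuple's data area.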

2 Summary

typedef struct HeapTupleData
{
	uint32		t_len;			/* length of *t_data */
	ItemPointerData t_self;		/* SelfItemPointer */
	Oid			t_tableOid;		/* table the tuple came from */
	HeapTupleHeader t_data;		/* -> tuple header and data */
} HeapTupleData;
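  • The t_len field records the length of *t_data in 4 bytes, much like the 4B header of a variable-length (varlena) value (see "Postgresql Source Code (51): Variable-Length Type Implementation (varlena.c)"). A 4B length prefix is PG's internal convention for variable-length data.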
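To make the field layout concrete, here is a mirror of HeapTupleData built from fixed-width integers. It is a sketch under assumptions. The `Toy*` names are hypothetical stand-ins for PG's types, and `TOY_MAXALIGN` assumes `MAXIMUM_ALIGNOF` is 8, as on common 64-bit builds. `TOY_HEAPTUPLESIZE` mimics PG's `HEAPTUPLESIZE`, which is defined as `MAXALIGN(sizeof(HeapTupleData))`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirrors of PG's types; field order follows HeapTupleData. */
typedef struct { uint16_t bi_hi, bi_lo; } ToyBlockId;
typedef struct { ToyBlockId ip_blkid; uint16_t ip_posid; } ToyItemPointer;

/* ItemPointerData is 6 bytes (block number + line pointer offset). */
static_assert(sizeof(ToyItemPointer) == 6, "item pointer should be 6 bytes");

typedef struct ToyHeapTupleData
{
    uint32_t    t_len;          /* length of *t_data: header + user data */
    ToyItemPointer t_self;      /* block + offset of this tuple */
    uint32_t    t_tableOid;     /* Oid of the table the tuple came from */
    void       *t_data;         /* -> tuple header and data */
} ToyHeapTupleData;

/* Assumption: MAXIMUM_ALIGNOF is 8. */
#define TOY_MAXALIGN(len)   (((uintptr_t) (len) + 7) & ~(uintptr_t) 7)
#define TOY_HEAPTUPLESIZE   TOY_MAXALIGN(sizeof(ToyHeapTupleData))
```

Under these assumptions t_self starts at byte 4 and t_tableOid at byte 12. The struct rounds up to 24 bytes, which is where heap_form_tuple places the tuple header in the same allocation.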

3 heap_form_tuple, the constructor of HeapTuple

A HeapTuple is assembled in heap_form_tuple. The rest of this article focuses on this function.

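The example inserts a row of five columns. One column is fixed-length (int). The other four are variable-length (varchar, numeric, char and text), all varlena types with attlen = -1: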

drop table t21;
create table t21(i1 int, v10 varchar(10), n1 numeric, c2 char(2), t1 text);
insert into t21 values (1, 'mylen=7', 5.5, '22', 'hi12345');

3.1 heap_form_tuple arguments

The constructor heap_form_tuple:

HeapTuple
heap_form_tuple(TupleDesc tupleDescriptor, Datum *values, bool *isnull)

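The arguments are a tuple descriptor, a values array and an isnull array. Each element of values holds either the value itself (for the int) or a pointer to the datum's data: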

(gdb) p *tupleDescriptor
$9 = {natts = 5, tdtypeid = 2249, tdtypmod = -1, tdrefcount = -1, constr = 0x0, attrs = 0x199ce90}
(gdb) p values[0]
$11 = 1            : the int value itself
(gdb) p values[1]
$12 = 27157600     : pointer to the datum's data
(gdb) p values[2]
$13 = 27153160     : pointer to the datum's data
(gdb) p values[3]
$14 = 27158432     : pointer to the datum's data
(gdb) p values[4]
$15 = 27154592     : pointer to the datum's data
(gdb) p isnull[0] 
$17 = false
(gdb) p isnull[1]
$18 = false
(gdb) p isnull[2]
$19 = false
(gdb) p isnull[3]
$20 = false
(gdb) p isnull[4]
$21 = false
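The gdb output makes more sense once you recall that a Datum is a pointer-sized unsigned integer. A by-value type such as int stores the number itself, while a by-reference type stores an address. Here is a minimal sketch of that idea. `ToyDatum` and the `toy_*` helpers are hypothetical mirrors of PG's `Datum`, `Int32GetDatum`, `DatumGetInt32`, `PointerGetDatum` and `DatumGetPointer`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror: an unsigned integer wide enough to hold a pointer. */
typedef uintptr_t ToyDatum;

static inline ToyDatum
toy_int32_get_datum(int32_t x)
{
    return (ToyDatum) x;            /* by value: the number itself */
}

static inline int32_t
toy_datum_get_int32(ToyDatum d)
{
    return (int32_t) d;
}

static inline ToyDatum
toy_pointer_get_datum(const void *p)
{
    return (ToyDatum) p;            /* by reference: the address of the data */
}

static inline const void *
toy_datum_get_pointer(ToyDatum d)
{
    return (const void *) d;
}
```

This is why gdb prints values[1] through values[4] as large integers: they are addresses of the varlena data, not the data itself.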

3.2 heap_form_tuple execution flow

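  • Note: hoff is the offset from the start of HeapTupleHeaderData to the start of the user data.
  • Note: tuple->t_data points to the HeapTupleHeaderData, which starts HEAPTUPLESIZE bytes after the start of HeapTupleData.
  • The memory layout is: HeapTupleData + HeapTupleHeaderData + data.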
heap_form_tuple
...
    len = offsetof(HeapTupleHeaderData, t_bits)        : header size len = 23 (no NULLs, so no null bitmap); t_bits is a flexible array member
    hoff = len = MAXALIGN(len);                        : after alignment hoff = len = 24
    data_len = heap_compute_data_size(...)             : length needed for the data (see 3.3), data_len = 30 bytes
    len += data_len;                                   : len = 24 + 30 = 54

    tuple = (HeapTuple) palloc0(HEAPTUPLESIZE + len)   : allocate HeapTupleData + HeapTupleHeaderData + 30 bytes of data
    tuple->t_data = td = (HeapTupleHeader) ((char *) tuple + HEAPTUPLESIZE)
                                                       : t_data points to the HeapTupleHeaderData right after HeapTupleData
    ...
    // fill in t_len and the header fields (t_hoff, ...)
    ...
    heap_fill_tuple                                    : copy in the data column by column according to type, see 3.4
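The size arithmetic above can be replayed with a mirrored header struct. This sketch assumes no NULL columns (so no null bitmap), `MAXIMUM_ALIGNOF` = 8 and `HEAPTUPLESIZE` = 24, as on a typical 64-bit build. `ToyHeapTupleHeader`, `toy_form_layout` and `toy_alloc_tuple` are hypothetical. The 12-byte `t_choice` union is flattened into three `uint32_t` fields because only its size matters here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical mirror of ItemPointerData: 6 bytes, 2-byte aligned. */
typedef struct { uint16_t bi_hi, bi_lo, ip_posid; } ToyItemPointer;

/* Hypothetical mirror of HeapTupleHeaderData (byte offsets in comments). */
typedef struct ToyHeapTupleHeader
{
    uint32_t    t_xmin, t_xmax, t_field3;   /* 0..11: stands in for t_choice */
    ToyItemPointer t_ctid;                  /* 12..17 */
    uint16_t    t_infomask2;                /* 18..19 */
    uint16_t    t_infomask;                 /* 20..21 */
    uint8_t     t_hoff;                     /* 22 */
    uint8_t     t_bits[];                   /* 23..: null bitmap (flexible array) */
} ToyHeapTupleHeader;

#define TOY_MAXALIGN(len)   (((size_t) (len) + 7) & ~(size_t) 7)  /* assumes 8 */
#define TOY_HEAPTUPLESIZE   ((size_t) 24)                         /* assumed */

typedef struct ToyLayout
{
    size_t      header_len;     /* offsetof(HeapTupleHeaderData, t_bits) */
    size_t      hoff;           /* MAXALIGN(header_len): user data starts here */
    size_t      len;            /* hoff + data_len: the value stored in t_len */
    size_t      alloc;          /* HEAPTUPLESIZE + len: the size passed to palloc0 */
} ToyLayout;

/* Replays heap_form_tuple's size arithmetic for a tuple without NULLs. */
static ToyLayout
toy_form_layout(size_t data_len)
{
    ToyLayout   l;

    l.header_len = offsetof(ToyHeapTupleHeader, t_bits);
    l.hoff = TOY_MAXALIGN(l.header_len);
    l.len = l.hoff + data_len;
    l.alloc = TOY_HEAPTUPLESIZE + l.len;
    return l;
}

/* One zeroed block: [HeapTupleData][header + padding][user data]. */
static char *
toy_alloc_tuple(size_t data_len, char **t_data, char **user_data)
{
    ToyLayout   l = toy_form_layout(data_len);
    char       *base = calloc(1, l.alloc);        /* palloc0 stand-in */

    if (base == NULL)
        return NULL;
    *t_data = base + TOY_HEAPTUPLESIZE;           /* tuple->t_data */
    ((ToyHeapTupleHeader *) *t_data)->t_hoff = (uint8_t) l.hoff;
    *user_data = *t_data + l.hoff;                /* heap_fill_tuple writes here */
    return base;
}
```

With data_len = 30 this reproduces the numbers in the trace: header 23, hoff 24, len 54. The whole allocation is 24 + 54 = 78 bytes, and the user data starts 48 bytes into it.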

3.3 heap_compute_data_size

heap_compute_data_size computes the length of the data. The example uses the same SQL as before:

drop table t21;
create table t21(i1 int, v10 varchar(10), n1 numeric, c2 char(2), t1 text);
insert into t21 values (1, 'mylen=7', 5.5, '22', 'hi12345');

The function handles each column separately. The main logic follows one of three branches:

3.3.1 Conditions for entering each branch

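Branch one: atti->attlen == -1 and atti->attstorage != 'p' (ATT_IS_PACKABLE). In addition, the value must currently have an uncompressed 4B header and be short enough to switch to a 1B header (VARATT_CAN_MAKE_SHORT).

Branch two: atti->attlen == -1 and the value is a 1B_E (external) pointer with an expanded tag, VARTAG_EXPANDED_RO or VARTAG_EXPANDED_RW (VARATT_IS_EXTERNAL_EXPANDED).

Branch three: everything else.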

		if (ATT_IS_PACKABLE(atti) &&
			VARATT_CAN_MAKE_SHORT(DatumGetPointer(val)))
		{
			/*
			 * we're anticipating converting to a short varlena header, so
			 * adjust length and don't count any alignment
			 */
			data_length += VARATT_CONVERTED_SHORT_SIZE(DatumGetPointer(val));
		}
		else if (atti->attlen == -1 &&
				 VARATT_IS_EXTERNAL_EXPANDED(DatumGetPointer(val)))
		{
			/*
			 * we want to flatten the expanded value so that the constructed
			 * tuple doesn't depend on it
			 */
			data_length = att_align_nominal(data_length, atti->attalign);
			data_length += EOH_get_flat_size(DatumGetEOHP(val));
		}
		else
		{
			data_length = att_align_datum(data_length, atti->attalign,
										  atti->attlen, val);
			data_length = att_addlength_datum(data_length, atti->attlen,
											  val);
		}
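The three-way choice can be written as a small decision function. This is a simplified sketch. `ToyHdrKind` is a hypothetical summary of what the real macros read out of the varlena header. The constants mirror `VARHDRSZ` (4), `VARHDRSZ_SHORT` (1) and `VARATT_SHORT_MAX` (0x7F).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of a value's varlena header. */
typedef enum
{
    HDR_4B_PLAIN,           /* 4B header, not compressed */
    HDR_4B_COMPRESSED,      /* 4B header, compressed inline */
    HDR_1B,                 /* already a short 1B header */
    HDR_1B_E_EXPANDED,      /* 1B_E pointer to an expanded object (RO or RW) */
    HDR_1B_E_OTHER          /* 1B_E pointer of another kind, e.g. TOAST */
} ToyHdrKind;

#define TOY_VARHDRSZ         4
#define TOY_VARHDRSZ_SHORT   1
#define TOY_VARATT_SHORT_MAX 0x7F

/* Mirrors ATT_IS_PACKABLE: a varlena column whose storage is not 'p'. */
static bool
toy_att_is_packable(int attlen, char attstorage)
{
    return attlen == -1 && attstorage != 'p';
}

/* Mirrors VARATT_CAN_MAKE_SHORT: plain 4B header and short enough for 1B. */
static bool
toy_can_make_short(ToyHdrKind kind, uint32_t varsize)
{
    return kind == HDR_4B_PLAIN &&
        varsize - TOY_VARHDRSZ + TOY_VARHDRSZ_SHORT <= TOY_VARATT_SHORT_MAX;
}

/* Returns which branch (1, 2 or 3) heap_compute_data_size would take. */
static int
toy_compute_branch(int attlen, char attstorage, ToyHdrKind kind, uint32_t varsize)
{
    if (toy_att_is_packable(attlen, attstorage) && toy_can_make_short(kind, varsize))
        return 1;           /* count as a 1B-header value, no alignment */
    if (attlen == -1 && kind == HDR_1B_E_EXPANDED)
        return 2;           /* align, then add the flattened size */
    return 3;               /* align, then add attlen or the current varlena size */
}
```

For non-varlena columns such as int, the header kind and size arguments are ignored, because `toy_att_is_packable` already fails.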

For the five test columns:

int: takes branch three (length 4)
(gdb) p atti->attlen
$30 = 4
(gdb) p atti->attstorage
$31 = 112 'p'

Calculation:

// Step 1: align data_length; it is 0 and stays 0 after alignment
			data_length = att_align_datum(data_length, atti->attalign,
										  atti->attlen, val);
// Step 2: add atti->attlen, giving data_length = 4
			data_length = att_addlength_datum(data_length, atti->attlen,
											  val);

The length grows by 4.

varchar: takes branch one (length 8)
(gdb) p atti->attlen
$38 = -1
(gdb) p atti->attstorage
$39 = 120 'x'

Calculation:

// A 1B header is enough. The 4B header will be converted to 1B later, so count the length with a 1B header
data_length += VARATT_CONVERTED_SHORT_SIZE(DatumGetPointer(val));

The length grows by 8: a 1B header plus the 7 bytes of 'mylen=7'.

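numeric: takes branch one (length 7)
char: takes branch one (length 3)

A 1B header plus its own 2 bytes, 3 bytes in total.

text: takes branch one (length 8)

A 1B header plus its own 7 bytes, 8 bytes in total. Altogether 4 + 8 + 7 + 3 + 8 = 30 bytes, which is the data_len seen in 3.2.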
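The sketch below replays the whole computation for the example row. `TOY_TYPEALIGN` copies the rounding formula of PG's `TYPEALIGN`. The 4B sizes fed to `toy_converted_short_size` are inferred from the lengths above (payload + 4), including an assumed 6-byte payload for numeric 5.5. Both helpers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* TYPEALIGN-style rounding: round len up to a multiple of alignval (power of 2). */
#define TOY_TYPEALIGN(alignval, len) \
    (((uintptr_t) (len) + ((alignval) - 1)) & ~((uintptr_t) ((alignval) - 1)))

#define TOY_VARHDRSZ        4
#define TOY_VARHDRSZ_SHORT  1

/* Like VARATT_CONVERTED_SHORT_SIZE: size once the 4B header becomes 1B. */
static size_t
toy_converted_short_size(size_t varsize_4b)
{
    return varsize_4b - TOY_VARHDRSZ + TOY_VARHDRSZ_SHORT;
}

/* Replays heap_compute_data_size for the example row (no NULLs). */
static size_t
toy_example_data_len(void)
{
    size_t      data_length = 0;

    /* i1 int, branch three: align to 4 (still 0), then add attlen 4 */
    data_length = TOY_TYPEALIGN(4, data_length);
    data_length += 4;
    /* v10 'mylen=7', branch one: 4B size 4 + 7 = 11 -> 8, no alignment */
    data_length += toy_converted_short_size(4 + 7);
    /* n1 5.5, branch one: assumed 4B size 4 + 6 = 10 -> 7 */
    data_length += toy_converted_short_size(4 + 6);
    /* c2 '22', branch one: 4B size 4 + 2 = 6 -> 3 */
    data_length += toy_converted_short_size(4 + 2);
    /* t1 'hi12345', branch one: 4B size 4 + 7 = 11 -> 8 */
    data_length += toy_converted_short_size(4 + 7);
    return data_length;
}
```

Branch-one values skip alignment entirely. That is why the varchar can follow the int at offset 4 with no padding, and the total stays at 30.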

3.4 heap_fill_tuple

heap_fill_tuple calls fill_val on each column to fill in the data:

heap_fill_tuple
  for (i = 0; i < numberOfAttributes; i++)
    fill_val(...)  

fill_val has more branches. Each column is handled by one of the following four:

    if (att->attbyval)
	{
		/* pass-by-value */
		data = (char *) att_align_nominal(data, att->attalign);
		store_att_byval(data, datum, att->attlen);
		data_length = att->attlen;
	}
	else if (att->attlen == -1)
	{
		/* varlena */
		Pointer		val = DatumGetPointer(datum);

		*infomask |= HEAP_HASVARWIDTH;
		if (VARATT_IS_EXTERNAL(val))
		{
			if (VARATT_IS_EXTERNAL_EXPANDED(val))
			{
				/*
				 * we want to flatten the expanded value so that the
				 * constructed tuple doesn't depend on it
				 */
				ExpandedObjectHeader *eoh = DatumGetEOHP(datum);

				data = (char *) att_align_nominal(data,
												  att->attalign);
				data_length = EOH_get_flat_size(eoh);
				EOH_flatten_into(eoh, data, data_length);
			}
			else
			{
				*infomask |= HEAP_HASEXTERNAL;
				/* no alignment, since it's short by definition */
				data_length = VARSIZE_EXTERNAL(val);
				memcpy(data, val, data_length);
			}
		}
		else if (VARATT_IS_SHORT(val))
		{
			/* no alignment for short varlenas */
			data_length = VARSIZE_SHORT(val);
			memcpy(data, val, data_length);
		}
		else if (VARLENA_ATT_IS_PACKABLE(att) &&
				 VARATT_CAN_MAKE_SHORT(val))
		{
			/* convert to short varlena -- no alignment */
			data_length = VARATT_CONVERTED_SHORT_SIZE(val);
			SET_VARSIZE_SHORT(data, data_length);
			memcpy(data + 1, VARDATA(val), data_length - 1);
		}
		else
		{
			/* full 4-byte header varlena */
			data = (char *) att_align_nominal(data,
											  att->attalign);
			data_length = VARSIZE(val);
			memcpy(data, val, data_length);
		}
	}
	else if (att->attlen == -2)
	{
		/* cstring ... never needs alignment */
		*infomask |= HEAP_HASVARWIDTH;
		Assert(att->attalign == TYPALIGN_CHAR);
		data_length = strlen(DatumGetCString(datum)) + 1;
		memcpy(data, DatumGetPointer(datum), data_length);
	}
	else
	{
		/* fixed-length pass-by-reference */
		data = (char *) att_align_nominal(data, att->attalign);
		Assert(att->attlen > 0);
		data_length = att->attlen;
		memcpy(data, DatumGetPointer(datum), data_length);
	}

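Branches:

  1. att->attbyval == true: the value is passed by value, so it is stored directly.
  2. att->attlen == -1: a varlena type, handled according to its header (4B, 1B or 1B_E).
  3. att->attlen == -2: a cstring, copied as is, including the terminating NUL.
  4. Otherwise: a fixed-length pass-by-reference type, copied directly.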
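Branches 1, 3 and 4 are plain copies that differ only in alignment and length. Below is a simplified sketch of them. `toy_fill_val` and `toy_store_att_byval` are hypothetical. The varlena branch is left out, and `memcpy` replaces the pointer-cast stores that PG's `store_att_byval` uses. The caller is assumed to pass a data pointer that is itself 8-byte aligned, as the tuple's data area is.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round a pointer up to a multiple of a (a power of two). */
#define TOY_TYPEALIGN(a, p) \
    ((char *) (((uintptr_t) (p) + ((a) - 1)) & ~((uintptr_t) ((a) - 1))))

/* Simplified store_att_byval: write attlen bytes of a by-value datum. */
static void
toy_store_att_byval(char *dst, uint64_t datum, int attlen)
{
    switch (attlen)
    {
        case 1: { uint8_t v = (uint8_t) datum; memcpy(dst, &v, 1); break; }
        case 2: { uint16_t v = (uint16_t) datum; memcpy(dst, &v, 2); break; }
        case 4: { uint32_t v = (uint32_t) datum; memcpy(dst, &v, 4); break; }
        case 8: memcpy(dst, &datum, 8); break;
        default: assert(!"unsupported by-value length");
    }
}

/* Branches 1, 3 and 4 of fill_val; returns the next free byte. */
static char *
toy_fill_val(char *data, int attbyval, int attlen, int attalign,
             uint64_t byval_datum, const void *byref_ptr)
{
    size_t      data_length;

    if (attbyval)
    {                               /* 1: pass-by-value, aligned */
        data = TOY_TYPEALIGN(attalign, data);
        toy_store_att_byval(data, byval_datum, attlen);
        data_length = (size_t) attlen;
    }
    else if (attlen == -2)
    {                               /* 3: cstring, never aligned */
        data_length = strlen((const char *) byref_ptr) + 1;
        memcpy(data, byref_ptr, data_length);
    }
    else
    {                               /* 4: fixed-length pass-by-reference */
        assert(attlen > 0);
        data = TOY_TYPEALIGN(attalign, data);
        data_length = (size_t) attlen;
        memcpy(data, byref_ptr, data_length);
    }
    return data + data_length;
}
```

Storing an int, then the cstring "ab", then an 8-byte value shows why the data pointer is aligned. The cstring ends at offset 7, so the 8-byte value is pushed to offset 8.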

For the five test columns:

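int: branch one, stored by value

For a pass-by-value type the Datum itself holds the value, so store_att_byval simply writes it into the data area.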

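varchar: branch two; the 4B header is converted to a 1B header and the data is copied

The value is small enough not to need a 4B header, so it is stored with a 1B header and copied.

numeric: branch two; the 4B header is converted to a 1B header and the data is copied

The value is small enough not to need a 4B header, so it is stored with a 1B header and copied.

char: branch two; the 4B header is converted to a 1B header and the data is copied

The value is small enough not to need a 4B header, so it is stored with a 1B header and copied.

text: branch two; the 4B header is converted to a 1B header and the data is copied

The value is small enough not to need a 4B header, so it is stored with a 1B header and copied.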
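The 4B-to-1B conversion used for the four varlena columns is a header rewrite plus a copy of the payload. The sketch below assumes PG's little-endian header encoding: a 4B uncompressed header word is `len << 2`, and a 1B header byte is `(len << 1) | 1`. `toy_make_4b`, `toy_varsize_4b` and `toy_fill_short` are hypothetical helpers. They work on explicit bytes, so the result does not depend on the host's byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: little-endian varlena headers, as PG uses on x86-64. */
#define TOY_VARHDRSZ         4
#define TOY_VARHDRSZ_SHORT   1
#define TOY_VARATT_SHORT_MAX 0x7F

/* Build a 4B-header (uncompressed) varlena holding n payload bytes. */
static size_t
toy_make_4b(uint8_t *dst, const char *payload, size_t n)
{
    uint32_t    hdr = (uint32_t) (n + TOY_VARHDRSZ) << 2;

    for (int i = 0; i < 4; i++)
        dst[i] = (uint8_t) (hdr >> (8 * i));    /* little-endian word */
    memcpy(dst + TOY_VARHDRSZ, payload, n);
    return n + TOY_VARHDRSZ;
}

/* Read the total size back out of a 4B header (like VARSIZE_4B). */
static uint32_t
toy_varsize_4b(const uint8_t *p)
{
    uint32_t    hdr = (uint32_t) p[0] | (uint32_t) p[1] << 8 |
        (uint32_t) p[2] << 16 | (uint32_t) p[3] << 24;

    return (hdr >> 2) & 0x3FFFFFFF;
}

/* fill_val's "convert to short varlena" path: 1B header + payload, no alignment. */
static uint8_t *
toy_fill_short(uint8_t *data, const uint8_t *val)
{
    uint32_t    varsize = toy_varsize_4b(val);
    size_t      data_length = varsize - TOY_VARHDRSZ + TOY_VARHDRSZ_SHORT;

    assert((val[0] & 0x03) == 0 && data_length <= TOY_VARATT_SHORT_MAX);
    data[0] = (uint8_t) ((data_length << 1) | 0x01);         /* SET_VARSIZE_SHORT */
    memcpy(data + 1, val + TOY_VARHDRSZ, data_length - 1);  /* copy VARDATA(val) */
    return data + data_length;
}
```

For 'mylen=7' the 4B size is 11, the short size is 8, and the header byte becomes (8 << 1) | 1 = 17. The following '22' lands immediately after it with header byte (3 << 1) | 1 = 7.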
