PostgreSQL Source Code (87): Array Construction and Computation (Flat Format and Expanded Format)

2022-10-31 17:51:19

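Related: "PostgreSQL Source Code (51): Variable-Length Type Implementation (varlena.c)", "PostgreSQL Source Code (56): Analysis of Expandable Types ExpandedObject/ExpandedRecord", "PostgreSQL Source Code (87): Array Construction and Computation (Flat Format and Expanded Format)"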

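Summary

One-sentence summary

The standard array constructors produce the compact flat structure ArrayType, in which the data follows the header just as in a tuple; PL/pgSQL parses this flat structure into the expanded array structure and attaches it to a parent memory context (mcxt), which makes computation easier.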


Basic concept: one-dimensional '{1,2,3,4,5,6}'::int[]

ndims = 1              one dimension
p eah->dims[0] = 6     6 elements
p eah->lbound[0] = 1   subscript lower bound of the first dimension

Basic concept: two-dimensional '{{1,2,3,4},{3,4,5,6},{5,6,7,8}}'::int[]

ndims = 2              two dimensions
p eah->dims[0] = 3     3 rows
p eah->dims[1] = 4     4 columns
p eah->lbound[0] = 1   subscript lower bound, used for slicing
p eah->lbound[1] = 1
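
To make the roles of dims and lbound concrete, here is a minimal C sketch of how a tuple of subscripts could be mapped to a zero-based element position in row-major order. subscripts_to_offset is a hypothetical helper written for this post, not a PostgreSQL function, and it ignores everything the real code has to handle beyond plain range checking.

```c
/*
 * Hypothetical helper (not part of PostgreSQL): map user-visible
 * subscripts to a zero-based, row-major element position.
 * Returns -1 when any subscript falls outside its dimension.
 */
long
subscripts_to_offset(int ndims, const int *dims, const int *lbound,
                     const int *subs)
{
    long offset = 0;

    for (int i = 0; i < ndims; i++)
    {
        /* position inside dimension i, counted from 0 */
        int rel = subs[i] - lbound[i];

        if (rel < 0 || rel >= dims[i])
            return -1;          /* subscript out of range */

        /* the last dimension varies fastest */
        offset = offset * dims[i] + rel;
    }
    return offset;
}
```

With dims = {3,4} and lbound = {1,1}, the subscript [2][3] maps to position (2-1)*4 + (3-1) = 6, which is the value 5 in the two-dimensional example above.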

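Flat array structure

  • The flat array structure is the one shown in the figure below (for the one-dimensional array '{1,2,3,4,5,6}'::int[]). It can also be called the compact structure or the storage structure; it is convenient to store but inconvenient to compute on.

Expanded array structure

  • This is the ExpandedArrayHeader data structure shown in the figure below
  • It is the standard EOH header plus array-specific fields
  • The function expand_array parses the flat structure and hooks the results onto the corresponding fields of the structure below
  • Array computation inside PL/pgSQL always uses the expanded array structure. Note: when an expanded array is passed as a value, what is passed is the EOH's eoh_rw_ptr pointer, which points to a 1b_e structure (varattrib_1b_e) that in turn records the pointer to the EOH header. (For the 1b_e structure, see "PostgreSQL Source Code (51): Variable-Length Type Implementation (varlena.c)".)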

EOH review

"PostgreSQL Source Code (56): Analysis of Expandable Types ExpandedObject/ExpandedRecord"

Every review brings a slightly better understanding of the design:

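  1. EOH structure: complex data types such as arrays and records usually have a compact on-disk format that is inconvenient to modify, because a modification has to copy everything that follows. PG therefore provides an "expanded" representation, which is used only in memory and is better optimized for computation.
  2. EOH structure: the header starts with 4 bytes of control bits, to stay compatible with PG's varlena variable-length header.
  3. EOH structure: at the tail are two 10-byte arrays (on 64-bit builds), eoh_rw_ptr and eoh_ro_ptr. Both hold a 1b_e structure carrying the same pointer back to the EOH header; only the tag differs (VARTAG_EXPANDED_RW vs. VARTAG_EXPANDED_RO). Why two pointers? Because the EOH comes with helper functions, such as the two below, that require the caller to pass in the eoh_rw_ptr pointer; passing eoh_ro_ptr crashes, and only an Assert guards against it.
    • TransferExpandedObject: changes the parent mcxt of the EOH
    • DeleteExpandedObject: deletes the EOH's mcxt and everything in it

struct ExpandedObjectHeader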
{
	/* Phony varlena header */
	int32		vl_len_;		/* always EOH_HEADER_MAGIC, see below */

	/* Pointer to methods required for object type */
	const ExpandedObjectMethods *eoh_methods;

	/* Memory context containing this header and subsidiary data */
	MemoryContext eoh_context;

	/* Standard R/W TOAST pointer for this object is kept here */
	char		eoh_rw_ptr[EXPANDED_POINTER_SIZE];

	/* Standard R/O TOAST pointer for this object is kept here */
	char		eoh_ro_ptr[EXPANDED_POINTER_SIZE];
};
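
The relationship between the header and its two embedded pointers can be modelled in a few lines of C. This is only a sketch under simplifying assumptions: MockEOH, the mock_* functions and the MOCK_* constants are made up for this post, the 0x01 header byte corresponds to a little-endian build, and the methods and memory-context fields are left out entirely.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins for illustration; the real definitions live in the PostgreSQL headers. */
#define MOCK_TAG_RO   2     /* plays the role of VARTAG_EXPANDED_RO */
#define MOCK_TAG_RW   3     /* plays the role of VARTAG_EXPANDED_RW */
#define MOCK_PTR_SIZE (2 + sizeof(void *))  /* 2-byte 1b_e header + pointer */

typedef struct MockEOH
{
    int32_t vl_len_;                /* phony varlena header */
    char    rw_ptr[MOCK_PTR_SIZE];  /* read-write pointer image */
    char    ro_ptr[MOCK_PTR_SIZE];  /* read-only pointer image */
} MockEOH;

/* Build one 1b_e-style pointer image: marker byte, tag byte, header address. */
static void
mock_fill_pointer(char *dst, char tag, MockEOH *eoh)
{
    dst[0] = 0x01;                      /* "external" marker (little-endian value) */
    dst[1] = tag;
    memcpy(dst + 2, &eoh, sizeof(eoh)); /* payload is unaligned, so copy bytewise */
}

void
mock_eoh_init(MockEOH *eoh)
{
    eoh->vl_len_ = -1;                  /* mirrors EOH_HEADER_MAGIC */
    mock_fill_pointer(eoh->rw_ptr, MOCK_TAG_RW, eoh);
    mock_fill_pointer(eoh->ro_ptr, MOCK_TAG_RO, eoh);
}

/* Recover the header address from either pointer image. */
MockEOH *
mock_datum_get_eoh(const char *datum)
{
    MockEOH *eoh;

    memcpy(&eoh, datum + 2, sizeof(eoh));
    return eoh;
}

/* Only the tag byte tells the two pointer images apart. */
bool
mock_is_read_write(const char *datum)
{
    return datum[1] == MOCK_TAG_RW;
}
```

Both pointer images lead back to the same header, so a function that must modify or delete the object can only tell whether it is allowed to by looking at the tag.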

EOH extended for arrays: ExpandedArrayHeader

Data structure:

typedef struct ExpandedArrayHeader
{
	/* Standard header for expanded objects */
	ExpandedObjectHeader hdr;

	/* Magic value identifying an expanded array (for debugging only) */
	int			ea_magic;

	/* Dimensionality info (always valid) */
	int			ndims;			/* # of dimensions */
	int		   *dims;			/* array dimensions */
	int		   *lbound;			/* index lower bounds for each dimension */

	/* Element type info (always valid) */
	Oid			element_type;	/* element type OID */
	int16		typlen;			/* needed info about element datatype */
	bool		typbyval;
	char		typalign;

	/*
	 * If we have a Datum-array representation of the array, it's kept here;
	 * else dvalues/dnulls are NULL.  The dvalues and dnulls arrays are always
	 * palloc'd within the object private context, but may change size from
	 * time to time.  For pass-by-ref element types, dvalues entries might
	 * point either into the fstartptr..fendptr area, or to separately
	 * palloc'd chunks.  Elements should always be fully detoasted, as they
	 * are in the standard flat representation.
	 *
	 * Even when dvalues is valid, dnulls can be NULL if there are no null
	 * elements.
	 */
	Datum	   *dvalues;		/* array of Datums */
	bool	   *dnulls;			/* array of is-null flags for Datums */
	int			dvalueslen;		/* allocated length of above arrays */
	int			nelems;			/* number of valid entries in above arrays */

	/*
	 * flat_size is the current space requirement for the flat equivalent of
	 * the expanded array, if known; otherwise it's 0.  We store this to make
	 * consecutive calls of get_flat_size cheap.
	 */
	Size		flat_size;

	/*
	 * fvalue points to the flat representation if it is valid, else it is
	 * NULL.  If we have or ever had a flat representation then
	 * fstartptr/fendptr point to the start and end+1 of its data area; this
	 * is so that we can tell which Datum pointers point into the flat
	 * representation rather than being pointers to separately palloc'd data.
	 */
	ArrayType  *fvalue;			/* must be a fully detoasted array */
	char	   *fstartptr;		/* start of its data area */
	char	   *fendptr;		/* end+1 of its data area */
} ExpandedArrayHeader;

Test SQL

DO $$
DECLARE
  arr int[] = ARRAY[1,2,3,4,5,6];
BEGIN
  raise notice '%', arr[3];
END;
$$;

Step 1: array construction: construct_md_array

When execution reaches arr int[] = ARRAY[1,2,3,4,5,6];, the planner enters construct_md_array while folding the constant expression.

plpgsql_inline_handler
  plpgsql_exec_function
    ...
    exec_assign_expr
      exec_prepare_plan
        exec_simple_check_plan
          ...
          BuildCachedPlan
            pg_plan_queries
              pg_plan_query
                planner
                  ...
                  eval_const_expressions
                    ...
                    ExecInterpExpr
                      ExecEvalArrayExpr
                        construct_md_array

The construct_md_array function

ArrayType *
construct_md_array(Datum *elems,
				   bool *nulls,
				   int ndims,
				   int *dims,
				   int *lbs,
				   Oid elmtype, int elmlen, bool elmbyval, char elmalign)
{

Arguments

(elems=0x2b130d8, 
 nulls=0x2b13128, 
 ndims=1,                 --> number of dimensions: ndims = 1
 dims=0x7ffdcf177ae0,     --> size of each dimension: dims[0] = 6
 lbs=0x7ffdcf177ac0,      --> lower bounds: lbs[0] = 1; subscripts of this array start at 1
 elmtype=23, 
 elmlen=4, 
 elmbyval=true, 
 elmalign=105 'i')

The lbs argument deserves a special mention, because PG arrays support this usage:

 postgres=# select f1[2] from (select '[2:3]={1,2}'::int[] as f1);
 f1 
----
  1
(1 row)

So a lower bound may also be supplied at construction time. In the example above the subscripts start at 2, so the third argument of ArrayCheckBounds (lb) is {2}.

	ArrayType  *result;
	bool		hasnulls;
	int32		nbytes;
	int32		dataoffset;
	int			i;
	int			nelems;

	/* This checks for overflow of the array dimensions */
	nelems = ArrayGetNItems(ndims, dims);

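ArrayCheckBounds checks, for every dimension, that the supplied lower bound is not too large, that is, that dims[i] + lb[i] does not overflow a 32-bit int. In this case:

ndims=1

Only lb[0] needs checking: lb[0]=1 is far below the limit (INT_MAX - dims[0] = 2147483641), so it passes.

If ndims=2, lb[1] has to be checked as well.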

	ArrayCheckBounds(ndims, dims, lbs);
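
The two sanity checks performed so far (element count and lower bounds) can be sketched as standalone C. The functions mock_array_nitems and mock_bounds_ok are hypothetical stand-ins that return a status instead of raising an error; they assume the checks boil down to "the product of dims fits in an int" and "dims[i] + lb[i] fits in an int".

```c
#include <limits.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the element-count step: returns the product
 * of all dimensions, 0 for ndims <= 0, or -1 if a dimension is negative
 * or the product does not fit in an int.
 */
int
mock_array_nitems(int ndims, const int *dims)
{
    long long n = 1;

    if (ndims <= 0)
        return 0;
    for (int i = 0; i < ndims; i++)
    {
        if (dims[i] < 0)
            return -1;
        n *= dims[i];           /* n <= INT_MAX before, so no 64-bit overflow */
        if (n > INT_MAX)
            return -1;
    }
    return (int) n;
}

/*
 * Hypothetical stand-in for the bounds step: every dims[i] + lb[i]
 * must be representable as a 32-bit int.
 */
bool
mock_bounds_ok(int ndims, const int *dims, const int *lb)
{
    for (int i = 0; i < ndims; i++)
    {
        long long upper = (long long) dims[i] + lb[i];

        if (upper > INT_MAX || upper < INT_MIN)
            return false;
    }
    return true;
}
```

For the array in this post (dims = {6}, lbs = {1}) the count is 6 and the bounds check passes; a lower bound of INT_MAX with the same dims would be rejected.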

Data was passed in this time (nelems=6), so the empty-array path is not taken.

	/* if ndims <= 0 or any dims[i] == 0, return empty array */
	if (nelems <= 0)
		return construct_empty_array(elmtype);

	nbytes = 0;
	hasnulls = false;

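att_addlength_datum adds the length of one element.

att_align_nominal rounds the running length up to the alignment boundary. Here elmalign='i' means int alignment; each element is 4 bytes long, so no padding is needed.

In the end the 6 numbers need nbytes = 6 x 4 = 24 bytes in total.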

	for (i = 0; i < nelems; i++)
	{
		if (nulls && nulls[i])
		{
			hasnulls = true;
			continue;
		}
		nbytes = att_addlength_datum(nbytes, elmlen, elems[i]);
		nbytes = att_align_nominal(nbytes, elmalign);
	}

	/* Allocate and initialize result array */
	if (hasnulls)
	{
		dataoffset = ARR_OVERHEAD_WITHNULLS(ndims, nelems);
		nbytes += dataoffset;
	}
	else
	{
		dataoffset = 0;			/* marker for no null bitmap */
		nbytes += ARR_OVERHEAD_NONULLS(ndims);
	}
	result = (ArrayType *) palloc0(nbytes);
	SET_VARSIZE(result, nbytes);

Checking the length:

nbytes = 48 (24 bytes of element data plus ARR_OVERHEAD_NONULLS(1) = 24 bytes of header)

(gdb) p ((varattrib_4b*)result)->va_4byte->va_header>>2

$116 = 48
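
The value 48 can be reproduced with a small calculation. The sketch below is not the PostgreSQL code: the MOCK_* macros and functions are made up for this post, and they assume a 16-byte ArrayType header, 8-byte MAXALIGN, 4-byte int and fixed-length elements only.

```c
#include <stddef.h>

/* Assumptions for this sketch: 8-byte MAXALIGN and a 16-byte fixed ArrayType header. */
#define MOCK_MAXALIGN(len)  (((size_t) (len) + 7) & ~(size_t) 7)
#define MOCK_ARRAY_HDR      16

/* Header size when there is no null bitmap: fixed part + dims[] + lbound[]. */
size_t
mock_overhead_nonulls(int ndims)
{
    return MOCK_MAXALIGN(MOCK_ARRAY_HDR + 2 * sizeof(int) * ndims);
}

/* Header size with a null bitmap of one bit per element. */
size_t
mock_overhead_withnulls(int ndims, int nitems)
{
    return MOCK_MAXALIGN(MOCK_ARRAY_HDR + 2 * sizeof(int) * ndims
                         + (nitems + 7) / 8);
}

/* Total flat size for fixed-length elements; null elements take no data space. */
size_t
mock_flat_size(int ndims, int nitems, int nnulls,
               size_t elmlen, size_t elmalign)
{
    size_t nbytes = 0;

    for (int i = 0; i < nitems - nnulls; i++)
    {
        nbytes += elmlen;                                       /* add the element */
        nbytes = (nbytes + elmalign - 1) / elmalign * elmalign; /* then align */
    }
    if (nnulls > 0)
        nbytes += mock_overhead_withnulls(ndims, nitems);
    else
        nbytes += mock_overhead_nonulls(ndims);
    return nbytes;
}
```

Calling mock_flat_size(1, 6, 0, 4, 4) gives 24 + 24 = 48, matching the gdb output above.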

	result->ndim = ndims;
	result->dataoffset = dataoffset;
	result->elemtype = elmtype;
	memcpy(ARR_DIMS(result), dims, ndims * sizeof(int));
	memcpy(ARR_LBOUND(result), lbs, ndims * sizeof(int));

	CopyArrayEls(result,
				 elems, nulls, nelems,
				 elmlen, elmbyval, elmalign,
				 false);

	return result;
}

Final memory layout
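
As a rough text substitute for a picture, the 48 bytes of this particular array can be described with a C struct. MockFlatInt4x6 is a made-up type that hard-codes one dimension and six int4 elements; the real ArrayType is variable-length, and the length encoding shown assumes a little-endian build.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Made-up fixed-size picture of '{1,2,3,4,5,6}'::int[] in flat form. */
typedef struct MockFlatInt4x6
{
    int32_t  vl_len_;       /* offset  0: varlena length word */
    int32_t  ndim;          /* offset  4: number of dimensions (1) */
    int32_t  dataoffset;    /* offset  8: 0 means no null bitmap */
    uint32_t elemtype;      /* offset 12: element type OID (23 = int4) */
    int32_t  dims[1];       /* offset 16: dims[0] = 6 */
    int32_t  lbound[1];     /* offset 20: lbound[0] = 1 */
    int32_t  data[6];       /* offset 24: the six elements */
} MockFlatInt4x6;

_Static_assert(offsetof(MockFlatInt4x6, data) == 24, "24-byte header");
_Static_assert(sizeof(MockFlatInt4x6) == 48, "matches nbytes = 48");

void
mock_flat_fill(MockFlatInt4x6 *a, const int32_t vals[6])
{
    a->vl_len_ = (int32_t) (sizeof(*a) << 2);   /* little-endian 4-byte header form */
    a->ndim = 1;
    a->dataoffset = 0;
    a->elemtype = 23;
    a->dims[0] = 6;
    a->lbound[0] = 1;
    for (int i = 0; i < 6; i++)
        a->data[i] = vals[i];
}

/* Fetch by user-visible subscript; returns false when out of range. */
bool
mock_flat_get(const MockFlatInt4x6 *a, int subscript, int32_t *out)
{
    int pos = subscript - a->lbound[0];

    if (pos < 0 || pos >= a->dims[0])
        return false;
    *out = a->data[pos];
    return true;
}
```

Fetching subscript 3 from this layout yields 3, the value the test SQL prints for arr[3].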

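Step 2: before the assignment, expand_array converts the ArrayType into an expanded array

Once the right-hand side of arr int[] = ARRAY[1,2,3,4,5,6]; has been evaluated, the ArrayType structure shown in the figure above has been built. It now has to be wrapped into the expanded array structure before it is used, which gives the array a parent mcxt and therefore an owner.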

Datum
expand_array(Datum arraydatum, MemoryContext parentcontext,
			 ArrayMetaState *metacache)
{
	ArrayType  *array;
	ExpandedArrayHeader *eah;
	MemoryContext objcxt;
	MemoryContext oldcxt;
	ArrayMetaState fakecache;

Create the "expanded array" memory context as a child of the context supplied by the caller, here "SPI Proc".

	objcxt = AllocSetContextCreate(parentcontext,
								   "expanded array",
								   ALLOCSET_START_SMALL_SIZES);

	/* Set up expanded array header */
	eah = (ExpandedArrayHeader *)
		MemoryContextAlloc(objcxt, sizeof(ExpandedArrayHeader));

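Initialize the EOH structure

  • eah->hdr: the expanded array is a "subclass" of the EOH, so &eah->hdr is passed as the pointer to the EOH
  • EA_methods: the array-specific conversion functions EA_get_flat_size and EA_flatten_into, which convert the expanded structure back into the storage structure, i.e. the compact ArrayType structure shown in the figure above
  • objcxt: the memory context the object lives in

	EOH_init_header(&eah->hdr, &EA_methods, objcxt);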
	eah->ea_magic = EA_MAGIC;

Next, the compact structure is expanded into the ExpandedArrayHeader structure.

First switch to the "expanded array" context and copy the flat array data into it.

	oldcxt = MemoryContextSwitchTo(objcxt);
	array = DatumGetArrayTypePCopy(arraydatum);
	MemoryContextSwitchTo(oldcxt);

p eah->ndims = 1

p eah->dims[0] = 6

p eah->lbound[0] = 1

p eah->element_type = 23

p eah->typlen = 4

p eah->typbyval = true

p eah->typalign = 'i'

	eah->ndims = ARR_NDIM(array);
	/* note these pointers point into the fvalue header! */
	eah->dims = ARR_DIMS(array);
	eah->lbound = ARR_LBOUND(array);
	eah->element_type = ARR_ELEMTYPE(array);
	...
		get_typlenbyvalalign(eah->element_type,
							 &eah->typlen,
							 &eah->typbyval,
							 &eah->typalign);
	...

	/* we don't make a deconstructed representation now */
	eah->dvalues = NULL;
	eah->dnulls = NULL;
	eah->dvalueslen = 0;
	eah->nelems = 0;
	eah->flat_size = 0;

eah->fvalue points to the flat header.

fstartptr points to the start of the flat data.

fendptr points just past the end of the whole flat value.

	/* remember we have a flat representation */
	eah->fvalue = array;
	eah->fstartptr = ARR_DATA_PTR(array);
	eah->fendptr = ((char *) array) + ARR_SIZE(array);

Note that what is returned is the EOH's eoh_rw_ptr pointer (to recap: eoh_rw_ptr holds a 1b_e structure whose data part stores the pointer to the EOH header).

	/* return a R/W pointer to the expanded array */
	return EOHPGetRWDatum(&eah->hdr);
}
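
The essence of the walkthrough above (copy the flat bytes privately, then point into the copy instead of deconstructing it) fits in a short standalone sketch. Everything here is hypothetical: MockExpanded and mock_expand are simplified models that use malloc in place of a memory context and assume a flat value with a 16-byte fixed header, 8-byte MAXALIGN and no null bitmap.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified model of the expanded header; field names follow the post. */
typedef struct MockExpanded
{
    int32_t  ndims;
    int32_t *dims;       /* points into the private flat copy */
    int32_t *lbound;     /* points into the private flat copy */
    void    *dvalues;    /* NULL: nothing is deconstructed yet */
    int      nelems;
    char    *fvalue;     /* private copy of the flat bytes */
    char    *fstartptr;  /* start of the element data */
    char    *fendptr;    /* one past the end of the flat value */
} MockExpanded;

/* Copy a flat array image and set up pointers into the copy. */
MockExpanded *
mock_expand(const char *flat, size_t flat_size)
{
    MockExpanded *e = malloc(sizeof(*e));
    size_t hdr;

    if (e == NULL)
        return NULL;
    e->fvalue = malloc(flat_size);
    if (e->fvalue == NULL)
    {
        free(e);
        return NULL;
    }
    memcpy(e->fvalue, flat, flat_size);         /* the private copy */

    memcpy(&e->ndims, e->fvalue + 4, sizeof(int32_t));
    e->dims = (int32_t *) (e->fvalue + 16);     /* right after the fixed header */
    e->lbound = e->dims + e->ndims;

    /* header size without null bitmap, rounded up to 8 bytes */
    hdr = (16 + 8 * (size_t) e->ndims + 7) & ~(size_t) 7;
    e->fstartptr = e->fvalue + hdr;
    e->fendptr = e->fvalue + flat_size;

    e->dvalues = NULL;                          /* deconstruction is deferred */
    e->nelems = 0;
    return e;
}

void
mock_expanded_free(MockExpanded *e)
{
    if (e != NULL)
    {
        free(e->fvalue);
        free(e);
    }
}
```

Because dims and lbound point into the private copy, later changes to the caller's original flat buffer do not affect the expanded object, and releasing the object means releasing the copy along with it.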
