PG中的blob cleanup

PG提供了一个很好的BLOB接口，得到了广泛应用。然而最近我们遇到了各种客户遇到的问题，有必要对PG如何处理blob进行一次思考，尤其是BLOB清理。

PG的BLOB接口

PG中可以使用各种方式存储二进制数据，最简单的方式是定义一个“bytea”（=byte array）数据类型。也就是定义一个bytea列：

test=# CREATE TABLE t_image (id int, name text, image bytea);

CREATE TABLE

test=# d t_image

Table "public.t_image"

Column | Type | Collation | Nullable | Default

------- --------- ----------- ---------- ---------

id | integer | | |

name | text | | |

image | bytea | | |

如你所见，这是一个普通列，可以像普通列一样使用。唯一值得一提的是必须在SQL级别使用的编码。PG使用一个变量来配置此行为：

test=# SHOW bytea_output;

bytea_output

--------------

hex

(1 row)

bytea_output有两个值，hex:表示PG以16进制格式发送数据，escape:表示以8进制发送。除了每个字段最大大小限制1GB外，应用程序在这里不需要担心太多。

PG还有第二种接口处理二进制数据：BLOB接口。例子：

test=# SELECT lo_import('/etc/hosts');

lo_import

-----------

80343

(1 row)

这个例子中，“/etc/hosts”内容存储到了数据库。PG有这个数据的副本而不是到文件系统的链接。需要注意，数据库将返回新条目的OID（存储到了系统表），为了跟踪这些OID，一些开发人员执行以下操作：

test=# CREATE TABLE t_file (

id int,

name text,

object_id oid

);

CREATE TABLE

test=# INSERT INTO t_file

VALUES (1, 'some_name', lo_import('/etc/hosts'))

RETURNING *;

id | name | object_id

---- --------------- -----------

1 | some_name | 80350

(1 row)

INSERT 0 1

除了做下面动作，其他方面都还好：

test=# DELETE FROM t_file WHERE id = 1;

DELETE 1

问题是对象的OID被遗忘了，虽然删除了t_file里的一个记录，但是二进制数据仍然存在。PG通过pg_largeobject系统表存储二进制数据。所有的lo_functions为了处理下面事情会和该系统表交互：

test=# x

Expanded display is on.

test=# SELECT * FROM pg_largeobject WHERE loid = 80350;

-[ RECORD 1 ]------------------------------------------

loid | 80350

pageno | 0

data | ##12# Host Database12#12# localhost ...

问题是什么？原因很简单，数据库会膨胀，因此删除BLOB条目正确的方法是：

test=# x

Expanded display is off.

test=#

test=# SELECT lo_unlink(80350);

lo_unlink

-----------

(1 row)

test=# SELECT * FROM pg_largeobject WHERE loid = 80350;

loid | pageno | data

------ -------- ------

(0 rows)

如果忘记删除object，我们经常会遇到这样的事情：你将遭受膨胀带来的痛苦。如果使用BLOB接口，这是一个主要问题。

vacuumlo:清理死大对象

然而，一旦积累了成千上万的死blob，如何解决这个问题呢？可以通过vacuumlo工具：

test=# SELECT lo_import('/etc/hosts');

lo_import

-----------

80351

(1 row)

iMac:~ hs$ vacuumlo -h localhost -v test

Connected to database "test"

Checking object_id in public.t_file

Successfully removed 2 large objects from database "test".

可以看到，死大对象被清理了。

额外函数

PG提供了不止lo_import和lo_unlink函数：

test=# df lo_*

List of functions

Schema | Name | Result data type | Argument data types | Type

------------ --------------- ------------------ --------------------------- ------

pg_catalog | lo_close | integer | integer | func

pg_catalog | lo_creat | oid | integer | func

pg_catalog | lo_create | oid | oid | func

pg_catalog | lo_export | integer | oid, text | func

pg_catalog | lo_from_bytea | oid | oid, bytea | func

pg_catalog | lo_get | bytea | oid | func

pg_catalog | lo_get | bytea | oid, bigint, integer | func

pg_catalog | lo_import | oid | text | func

pg_catalog | lo_import | oid | text, oid | func

pg_catalog | lo_lseek | integer | integer, integer, integer | func

pg_catalog | lo_lseek64 | bigint | integer, bigint, integer | func

pg_catalog | lo_open | integer | oid, integer | func

pg_catalog | lo_put | void | oid, bigint, bytea | func

pg_catalog | lo_tell | integer | integer | func

pg_catalog | lo_tell64 | bigint | integer | func

pg_catalog | lo_truncate | integer | integer, integer | func

pg_catalog | lo_truncate64 | integer | integer, bigint | func

pg_catalog | lo_unlink | integer | oid | func

(18 rows)

由于历史原因，还有2个函数没有遵循命名约定：lowrite和loread：

pg_catalog | loread | bytea | integer, integer | func

pg_catalog | lowrite | integer | integer, bytea | func

最后

PG的BLOB接口非常有用。美妙之处在于它完全是事务性的，因此二进制内容和元数据sync后不会丢失。

原文

https://www.cybertec-postgresql.com/en/blob-cleanup-in-postgresql/

数据库 sql 云推荐引擎

0 人点赞