PostgreSQL 14中提升Nested Loop Joins性能的enable_memoize
最近在PG14中发现新增一个配置参数enable_memoize,通过此参数可以提升嵌套循环连接的性能,有人测试性能竟然能提升1000倍!
将查询提升1000倍暗示整个语句非常烂,而memoize可能有很大帮助,那么对于普通join是否也有用呢?
什么是memoization?
memoization表示:给定y=f(x)关系,可以在任何计算中使用y替代f(x)。例如,不论计算多少此upper(x),总会得到X。如果整个函数计算代价非常高,并且输入值仅几种,那么为什么不维护一个映射输入值的哈希表,并通过它来查找已知值而不是计算他们呢?
正如我之前的博文https://blog.jooq.org/oracle-scalar-subquery-caching/,Oracle 11已经引入了一个标量子查询缓存的特性,可以在jOOQ中激活,避免代价高昂的PL/SQL上下文切换。
enable_memoize在PostgreSQL嵌套循环join中非常有用,参考上述推文,lateral join通常由嵌套循环连接来执行。
开启/关闭该参数
创建一个schema:
代码语言:javascript复制CREATE TABLE t AS
SELECT i, i % 5 AS j
FROM generate_series(1, 100000) AS t(i);
CREATE TABLE u AS
SELECT i, i % 20000 as j
FROM generate_series(1, 100000) AS t(i);
CREATE INDEX uj ON u(j);
总之,表t和u有100000条记录;t.j仅有5个不同的记录,每个重复20000次;u.j有20000个不同记录,每个重复5次。在PG14中查看enable_memoize:
代码语言:javascript复制SELECTcurrent_setting('enable_memoize');
|current_setting|
|---------------|
|on |
整个特性已开启,看下explain:
代码语言:javascript复制EXPLAIN
SELECT *
FROM t JOIN u ON t.j = u.j;
|QUERY PLAN |
|----------------------------------------------------------------------|
|Nested Loop (cost=0.30..8945.41 rows=496032 width=16) |
| -> Seq Scan on t (cost=0.00..1443.00 rows=100000 width=8) |
| -> Memoize (cost=0.30..0.41 rows=5 width=8) |
| Cache Key: t.j |
| -> Index Scan using uj on u (cost=0.29..0.40 rows=5 width=8)|
| Index Cond: (j = t.j) |
如果没有memoization,进行join时,对于t中的100000行记录,需要执行100000次匹配u表中的这5个值,但使用memoization后,查询仅需要执行5次,因为t.j仅有5个不同的值。
关闭后,PG似乎在我的机器上选择hash join或merge join(多次执行,执行计划可能会变)
代码语言:javascript复制|QUERY PLAN |
|-------------------------------------------------------------------|
|Hash Join (cost=3084.00..11568.51 rows=499351 width=16) |
| Hash Cond: (t.j = u.j) |
| -> Seq Scan on t (cost=0.00..1443.00 rows=100000 width=8) |
| -> Hash (cost=1443.00..1443.00 rows=100000 width=8) |
| -> Seq Scan on u (cost=0.00..1443.00 rows=100000 width=8)|
代码语言:javascript复制|QUERY PLAN |
|------------------------------------------------------------------------|
|Merge Join (cost=9748.11..763846.11 rows=50000000 width=16) |
| Merge Cond: (u.j = t.j) |
| -> Index Scan using uj on u (cost=0.29..3848.29 rows=100000 width=8)|
| -> Sort (cost=9747.82..9997.82 rows=100000 width=8) |
| Sort Key: t.j |
| -> Seq Scan on t (cost=0.00..1443.00 rows=100000 width=8) |
使用benchmark
1)在模式A和模式B中重复操作25次并进行比较
2)重复上述5次以减轻预热和其他缓存效果
代码语言:javascript复制DO $$
DECLARE
v_ts TIMESTAMP;
v_repeat CONSTANT INT := 25;
rec RECORD;
BEGIN
-- Repeat the whole benchmark several times to avoid warmup penalty
FOR r IN 1..5 LOOP
v_ts := clock_timestamp();
SET enable_memoize = OFF;
FOR i IN 1..v_repeat LOOP
FOR rec IN (
SELECT t.*
FROM t JOIN u ON t.j = u.j
) LOOP
NULL;
END LOOP;
END LOOP;
RAISE INFO 'Run %, Statement 1: %', r, (clock_timestamp() - v_ts);
v_ts := clock_timestamp();
SET enable_memoize = ON;
FOR i IN 1..v_repeat LOOP
FOR rec IN (
SELECT t.*
FROM t JOIN u ON t.j = u.j
) LOOP
NULL;
END LOOP;
END LOOP;
RAISE INFO 'Run %, Statement 2: %', r, (clock_timestamp() - v_ts);
RAISE INFO '';
END LOOP;
END$$;
在我机器上,结果一致:
代码语言:javascript复制Run 1, Statement 1: 00:00:03.763426
Run 1, Statement 2: 00:00:03.401346
Run 2, Statement 1: 00:00:03.769419
Run 2, Statement 2: 00:00:03.375677
Run 3, Statement 1: 00:00:03.771465
Run 3, Statement 2: 00:00:03.374413
Run 4, Statement 1: 00:00:03.769136
Run 4, Statement 2: 00:00:03.398734
Run 5, Statement 1: 00:00:03.772544
Run 5, Statement 2: 00:00:03.375272
即有10%的加速。在整个系统中,仅此一项就已经值得。
优化LATERAL
使用LATERAL替代,执行下面查询:
代码语言:javascript复制SELECT *
FROM
t,
LATERAL (
SELECT count(*)
FROM u
WHERE t.j = u.j
) AS u(j)
Explain的结果:
代码语言:javascript复制|QUERY PLAN |
|---------------------------------------------------------------------------------|
|Nested Loop (cost=4.40..3969.47 rows=100000 width=16) |
| -> Seq Scan on t (cost=0.00..1443.00 rows=100000 width=8) |
| -> Memoize (cost=4.40..4.42 rows=1 width=8) |
| Cache Key: t.j |
| -> Aggregate (cost=4.39..4.40 rows=1 width=8) |
| -> Index Only Scan using uj on u (cost=0.29..4.38 rows=5 width=0)|
| Index Cond: (j = t.j) |
因此,对于5个不同的t.j值作为入参,可以再次缓存COUNT(*)值,而不是每次都重复计算。
代码语言:javascript复制DO $$
DECLARE
v_ts TIMESTAMP;
v_repeat CONSTANT INT := 25;
rec RECORD;
BEGIN
-- Repeat the whole benchmark several times to avoid warmup penalty
FOR r IN 1..5 LOOP
v_ts := clock_timestamp();
SET enable_memoize = OFF;
FOR i IN 1..v_repeat LOOP
FOR rec IN (
SELECT *
FROM
t,
LATERAL (
SELECT count(*)
FROM u
WHERE t.j = u.j
) AS u(j)
) LOOP
NULL;
END LOOP;
END LOOP;
RAISE INFO 'Run %, Statement 1: %', r, (clock_timestamp() - v_ts);
v_ts := clock_timestamp();
SET enable_memoize = ON;
FOR i IN 1..v_repeat LOOP
FOR rec IN (
SELECT *
FROM
t,
LATERAL (
SELECT count(*)
FROM u
WHERE t.j = u.j
) AS u(j)
) LOOP
NULL;
END LOOP;
END LOOP;
RAISE INFO 'Run %, Statement 2: %', r, (clock_timestamp() - v_ts);
RAISE INFO '';
END LOOP;
END$$;
这次有了巨大提升:
代码语言:javascript复制Run 1, Statement 1: 00:00:03.419728
Run 1, Statement 2: 00:00:01.083941
Run 2, Statement 1: 00:00:03.404954
Run 2, Statement 2: 00:00:01.098404
Run 3, Statement 1: 00:00:03.425725
Run 3, Statement 2: 00:00:01.093883
Run 4, Statement 1: 00:00:03.441691
Run 4, Statement 2: 00:00:01.127837
Run 5, Statement 1: 00:00:03.420172
Run 5, Statement 2: 00:00:01.097943
这真是一个好消息,这也适用普通的相关子查询吗?因为上面的LATERAL相关子查询也可以改写成:
代码语言:javascript复制SELECT
t.*,
(
SELECT count(*)
FROM u
WHERE t.j = u.j
) j
FROM t;
遗憾的是,该计划并没有显示使用memoization:
代码语言:javascript复制|QUERY PLAN |
|-----------------------------------------------------------------------------|
|Seq Scan on t (cost=0.00..441693.00 rows=100000 width=16) |
| SubPlan 1 |
| -> Aggregate (cost=4.39..4.40 rows=1 width=8) |
| -> Index Only Scan using uj on u (cost=0.29..4.38 rows=5 width=0)|
| Index Cond: (j = t.j) |
代码语言:javascript复制Run 1, Statement 1: 00:00:03.617562
Run 1, Statement 2: 00:00:03.605765
Run 2, Statement 1: 00:00:03.610084
Run 2, Statement 2: 00:00:03.682064
Run 3, Statement 1: 00:00:03.725952
Run 3, Statement 2: 00:00:03.705622
Run 4, Statement 1: 00:00:03.672669
Run 4, Statement 2: 00:00:03.644612
Run 5, Statement 1: 00:00:03.645741
Run 5, Statement 2: 00:00:03.642717
似乎有了整个新功能,将来可以将相关子查询重写成嵌套循环外连接?其他优化器已经这么做了,我们在这里将拥有和Oracle标量子查询缓存相同的功能。
结论
该功能在PG14中开启,除了一些额外的内存消耗,看不出其他任何缺点。相关子查询是一个函数,他的入参是outer查询列的谓词和引用。相关子查询的结果可以缓存和memoized。如上所示,这对您的SQL查询将带来巨大影响。只需升级到PG14即可从中获利。
原文
https://blog.jooq.org/postgresql-14s-enable_memoize-for-improved-performance-of-nested-loop-joins/