Summary of New Features
Quickly Setting Up a GP7 Environment
Reference: https://www.xmmup.com/zaidockerzhongkuaisutiyangreenplum-7-0-0.html
docker rm -f gpdb7
docker run -itd --name gpdb7 -h gpdb7 \
  -p 5437:5432 -p 28087:28080 \
  -v /sys/fs/cgroup:/sys/fs/cgroup \
  --privileged=true lhrbest/greenplum:7.0.0 \
  /usr/sbin/init
docker exec -it gpdb7 bash
su - gpadmin
gpstart -a
[gpadmin@gpdb7 ~]$ psql
psql (12.12)
Type "help" for help.
postgres=# select version();
version
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 12.12 (Greenplum Database 7.0.0 build commit:0a7a3566873325aca1789ae6f818c80f17a9402d Open Source) on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-18), 64-bit compiled on Sep 20 2023 23:29:19 Bhuvnesh C.
(1 row)
postgres=# select * from gp_segment_configuration ;
dbid | content | role | preferred_role | mode | status | port | hostname | address | datadir
------+---------+------+----------------+------+--------+------+----------+---------+--------------------------------------------
1 | -1 | p | p | n | u | 5432 | gpdb7 | gpdb7 | /opt/greenplum/data/master/gpseg-1
6 | -1 | m | m | s | u | 5433 | gpdb7 | gpdb7 | /opt/greenplum/data/master_standby/gpseg-1
2 | 0 | p | p | n | u | 6000 | gpdb7 | gpdb7 | /opt/greenplum/data/primary/gpseg0
4 | 0 | m | m | n | d | 7000 | gpdb7 | gpdb7 | /opt/greenplum/data/mirror/gpseg0
3 | 1 | p | p | n | u | 6001 | gpdb7 | gpdb7 | /opt/greenplum/data/primary/gpseg1
5 | 1 | m | m | n | d | 7001 | gpdb7 | gpdb7 | /opt/greenplum/data/mirror/gpseg1
(6 rows)
Major New Features
Kernel
The Greenplum 7 kernel has been upgraded from PostgreSQL 9.4.26 to 12.12, and it leads Greenplum 6 in both performance and functionality.
Autovacuum Enabled by Default
Automatic vacuum is now enabled by default for all databases: it automatically performs VACUUM and ANALYZE operations against all catalog tables, and runs ANALYZE for all user tables in those databases.
- autovacuum
- autovacuum_analyze_scale_factor
- autovacuum_analyze_threshold
- autovacuum_freeze_max_age
- autovacuum_max_workers
- autovacuum_multixact_freeze_max_age
- autovacuum_naptime
- autovacuum_vacuum_cost_delay
- autovacuum_vacuum_cost_limit
- autovacuum_vacuum_scale_factor
- autovacuum_vacuum_threshold
- gp_autovacuum_scope
Parameter descriptions: https://docs.vmware.com/en/VMware-Greenplum/7/greenplum-database/ref_guide-config_params-guc-list.html#autovacuum
[gpadmin@gpdb7 ~]$ gpconfig -s autovacuum
Values on all segments are consistent
GUC : autovacuum
Coordinator value: on
Segment value: on
[gpadmin@gpdb7 ~]$ gpconfig -s autovacuum_analyze_threshold
Values on all segments are consistent
GUC : autovacuum_analyze_threshold
Coordinator value: 50
Segment value: 50
[gpadmin@gpdb7 ~]$
[gpadmin@gpdb7 ~]$ gpconfig -s autovacuum_max_workers
Values on all segments are consistent
GUC : autovacuum_max_workers
Coordinator value: 3
Segment value: 3
[gpadmin@gpdb7 ~]$ gpconfig -s gp_autovacuum_scope
Values on all segments are consistent
GUC : gp_autovacuum_scope
Coordinator value: catalog
Segment value: catalog
[gpadmin@gpdb7 ~]$
[gpadmin@gpdb7 ~]$ gpconfig -s autovacuum_naptime
Values on all segments are consistent
GUC : autovacuum_naptime
Coordinator value: 1min
Segment value: 1min
[gpadmin@gpdb7 ~]$
[gpadmin@gpdb7 ~]$ ps -ef|grep cuum
gpadmin 1252 1236 0 12:17 ? 00:00:00 postgres: 6000, autovacuum launcher
gpadmin 1259 1241 0 12:17 ? 00:00:00 postgres: 6001, autovacuum launcher
gpadmin 1279 1273 0 12:17 ? 00:00:00 postgres: 5432, autovacuum launcher
gpadmin 2903 461 0 12:41 pts/1 00:00:00 grep --color=auto cuum
[gpadmin@gpdb7 ~]$
Automatic statistics collection on user tables is controlled separately by the gp_autostats parameters, for example:
gpconfig -c gp_autostats_mode -v on_change
gpconfig -c gp_autostats_on_change_threshold -v 100000
We are already familiar with autovacuum in PostgreSQL, which periodically vacuums tables that cross its thresholds. Greenplum used to behave differently: autovacuum ran only against template0, to guard against possible transaction ID wraparound there.
“ PostgreSQL has a separate optional server process called the autovacuum daemon, whose purpose is to automate the execution of VACUUM and ANALYZE commands. Greenplum Database enables the autovacuum daemon to perform VACUUM operations only on the Greenplum Database template database template0. Autovacuum is enabled for template0 because connections are not allowed to template0. The autovacuum daemon performs VACUUM operations on template0 to manage transaction IDs (XIDs) and help avoid transaction ID wraparound issues in template0.
So Greenplum had no real notion of autovacuum workers. Previously, after large volumes of inserts, updates, and deletes you had to run VACUUM/ANALYZE manually. The system catalogs bloat too; for example, heavy use of temporary tables leaves dead rows in pg_class/pg_attribute, and catalog bloat has a global impact, so the catalogs also needed regular VACUUM. If a catalog table ever required VACUUM FULL, that was a real headache. Greenplum 7 greatly improves this: it automatically vacuums the system catalogs (and, via an optional parameter, some auxiliary tables) and automatically collects statistics on ordinary user tables.
create table t8 as select * from pg_class limit 5;
create table t9 as select * from pg_class limit 50;
select * from pg_stat_all_tables where schemaname='public' and relname = 't9';
SELECT * from pg_stats d where d.tablename='t9';
Index-Only Scans and Covering Indexes
Support for covering indexes has been added. A covering index lets you add extra columns to an index with the INCLUDE clause, which is very useful for index-only scans, especially for data types that cannot be indexed by B-tree. The Greenplum query optimizer partially supports index-only scans and covering indexes.
Index-only scans can answer queries from an index alone without accessing the table's heap, which significantly improves query performance. In addition, covering indexes allow you to add additional columns to an index using the INCLUDE clause, in order to make the use of index-only scans more effective. See Understanding Index-Only Scans and Covering Indexes for more details.
create table t1 (id int, c1 int, c2 int, c3 int, info text, crt_time timestamp);
create table t2(like t1);
create table t3(like t1);
create index idx_t1_1 on t1 (id) include(c1,c2,c3,info,crt_time);
create index idx_t2_1 on t2 (id,c1,c2,c3,info,crt_time);
create index idx_t3_1 on t3(id);
\timing on
insert into t1 select (1000*random())::int,1,1,1,'test',now() from generate_series(1,10000000);
insert into t2 select (1000*random())::int,1,1,1,'test',now() from generate_series(1,10000000);
insert into t3 select (1000*random())::int,1,1,1,'test',now() from generate_series(1,10000000);
explain (analyze,verbose,timing,costs,buffers) select id,c1,c2,c3,info,crt_time from t1 where id=10000;
explain (analyze,verbose,timing,costs,buffers) select id,c1,c2,c3,info,crt_time from t2 where id=10000;
explain (analyze,verbose,timing,costs,buffers) select id,c1,c2,c3,info,crt_time from t3 where id=10000;
analyze t1;
analyze t2;
analyze t3;
select id,c1,c2,c3,info,crt_time from t1 where id=1000;
select id,c1,c2,c3,info,crt_time from t2 where id=1000;
select id,c1,c2,c3,info,crt_time from t3 where id=1000;
create table t66(id int PRIMARY key, c1 text, crt_time timestamp);
insert into t66
SELECT id, md5(id::text),now()
FROM generate_series(1, 2000000) AS id;
create index idx_t66_1 on t66 (id) include(c1);
set optimizer=0;
select id,c1 from t66 where id =100;
Video: https://www.youtube.com/watch?v=J7_xLoq3E20&t=1383s
lhrgpdb=# set optimizer=0;
SET
Time: 76.123 ms
lhrgpdb=# explain select id from t67 where id =100;
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Gather Motion 1:1  (slice1; segments: 1)  (cost=0.16..8.20 rows=1 width=4)
   ->  Index Only Scan using t67_pkey on t67  (cost=0.16..8.18 rows=1 width=4)
         Index Cond: (id = 100)
 Optimizer: Postgres-based planner
(4 rows)
Time: 53.847 ms
lhrgpdb=# set optimizer=1;
SET
Time: 73.323 ms
lhrgpdb=# explain select id from t67 where id =100;
                                 QUERY PLAN
----------------------------------------------------------------------------
 Gather Motion 1:1  (slice1; segments: 1)  (cost=0.00..6.00 rows=1 width=4)
   ->  Index Scan using t67_pkey on t67  (cost=0.00..6.00 rows=1 width=4)
         Index Cond: (id = 100)
 Optimizer: GPORCA
(4 rows)
Time: 63.131 ms
lhrgpdb=#
Adding a Column No Longer Requires a Table Rewrite
Greenplum Database no longer rewrites the table when a column is added to a table (ALTER TABLE ... ADD COLUMN ...).
ALTER TABLE … ADD COLUMN … no longer rewrites the table. This feature was introduced in PostgreSQL 11, implemented by adding two extra columns, atthasmissing and attmissingval, to the pg_attribute catalog; GP7 naturally inherits it, which effectively reduces the impact of adding columns.
Operation | AO Columnar (AOCO) | AO Row (AO) | Heap |
---|---|---|---|
DROP COLUMN | No | No | No |
ALTER COLUMN TYPE [1] | No [2] | Yes | Yes |
ADD COLUMN (w/ non-volatile default [3]) | No [2] | No | No |
ADD COLUMN (w/ volatile default) | No [2] | Yes | Yes |
ALTER COLUMN SET ENCODING | No [2] | N/A | N/A |
SET () [4] | Yes | Yes | Yes |
SET ACCESS METHOD | Yes | Yes | Yes |
[1] If the new type is not binary coercible with the old. Otherwise, no rewrite in all cases.
[2] But writes/rewrites the single column's data.
[3] Including NULL.
[4] If the options differ from the current table. Otherwise, no rewrite in all cases.
Reference: https://greenplum.org/alter-table-in-greenplum-7-avoiding-table-rewrite/
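The catalog mechanism can be observed directly. A minimal sketch (the table t_fast and its columns are hypothetical examples, not from the article): after a fast ADD COLUMN with a constant default, pg_attribute records the default value instead of rewriting every row.

```sql
CREATE TABLE t_fast (id int);
INSERT INTO t_fast SELECT generate_series(1, 100000);

-- Constant (non-volatile) default: no table rewrite; the value is
-- remembered in the catalog and filled in when old rows are read.
ALTER TABLE t_fast ADD COLUMN flag int DEFAULT 7;

-- atthasmissing/attmissingval expose the stored "missing" value
SELECT attname, atthasmissing, attmissingval
FROM pg_attribute
WHERE attrelid = 't_fast'::regclass AND attname = 'flag';
```

A volatile default (e.g. DEFAULT random()) still forces a rewrite, matching the table above.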
Partitioned Tables: Performance and Feature Improvements
PostgreSQL declarative table partitioning syntax is now supported. See About Changes to Table Partitioning in Greenplum 7 for more details.
I have written many case studies about PostgreSQL's native partitioned tables; strictly speaking, only from version 12 onward can they truly be called "usable". Greenplum 7 introduces all of PostgreSQL's native table-partitioning syntax while retaining the legacy syntax, such as SPLIT PARTITION and EXCHANGE PARTITION. It is also worth noting that since PostgreSQL 12, ATTACH PARTITION only needs a ShareUpdateExclusiveLock on the parent table, which makes working with partitioned tables in Greenplum 7 much more convenient. Partitioning helper functions such as pg_partition_ancestors and pg_partition_root have also been merged in.
-- Assuming there's a long-running insert
INSERT INTO sales SELECT * FROM ext_sales_data;
-- This will be blocked
ALTER TABLE sales ADD PARTITION march_sales START ('2023-03-01') END ('2023-04-01');
-- This will go through
ALTER TABLE sales ATTACH PARTITION march_sales FOR VALUES FROM ('2023-03-01') TO ('2023-04-01'); -- adding a partition this way is not blocked
References:
https://greenplum.org/20-examples-of-greenplum-partition-commands/
https://greenplum.org/partition-in-greenplum-7-whats-new/
https://greenplum.org/partition-in-greenplum-7-recursion-and-inheritance/
Example 1: Creating a Range Partitioned Table
This example creates a sales table partitioned by range using the sale_date column.
-- (GP7 only)
CREATE TABLE sales(
sale_id SERIAL,
sale_date DATE,
amount DECIMAL
) PARTITION BY RANGE (sale_date);
Then adds a new partition to the sales table for the first half of 2023.
-- (GP7 only)
CREATE TABLE sales_p1
PARTITION OF sales FOR VALUES FROM ('2023-01-01') TO ('2023-06-30');
Example 2: Classic Range Partitioned Table
For convenience, a single command can be used.
CREATE TABLE sales(
sale_id SERIAL,
sale_date DATE,
amount DECIMAL
) PARTITION BY RANGE (sale_date)
(
PARTITION p1 START ('2023-01-01') END ('2023-06-30')
);
Example 3: Creating a List Partitioned Table
Create an orders table partitioned by a list of product categories.
-- (GP7 only)
CREATE TABLE orders (
order_id SERIAL,
product_category TEXT,
order_date DATE
)
PARTITION BY LIST (product_category);
Then add list partitions for the specified product categories.
-- (GP7 only)
CREATE TABLE orders_p1
PARTITION OF orders FOR VALUES IN ('Electronics', 'Clothing');
Example 4: Classic List Partitioned Table
For convenience, a single command can be used.
CREATE TABLE orders (
order_id SERIAL,
product_category TEXT,
order_date DATE
) PARTITION BY LIST (product_category)
(
PARTITION p1 VALUES ('Electronics', 'Clothing')
);
Example 5: Creating a Hash Partitioned Table
Create a sensor_data table partitioned by hash on the sensor_id column.
-- (GP7 only)
CREATE TABLE sensor_data (
sensor_id SERIAL PRIMARY KEY,
timestamp TIMESTAMP,
reading FLOAT
)
PARTITION BY HASH (sensor_id);
Example 6: Adding Hash Partitions
Add hash partitions to the sensor_data table using a modulus-based approach. Note: the classic single-command convenience syntax is not supported for hash partitions.
-- (GP7 only)
CREATE TABLE sensor_data_p1
PARTITION OF sensor_data FOR VALUES WITH (MODULUS 4, REMAINDER 0);
Example 7: Creating a Composite Partitioned Table
Create a composite partitioned table based on both range and list partitioning.
-- (GP7 only)
CREATE TABLE logs (
log_id SERIAL,
log_date DATE,
log_type TEXT
)
DISTRIBUTED BY (log_id)
PARTITION BY RANGE (log_date) SUBPARTITION BY LIST (log_type);
Example 8: Adding Composite Partitions
Add composite partitions to the logs table for a specific date range and log types.
-- (GP7 only)
CREATE TABLE logs_p1
PARTITION OF logs FOR VALUES FROM ('2023-01-01') TO ('2023-03-31')
PARTITION BY LIST (log_type);
Example 9: Adding Default Partition
ALTER TABLE sales ADD DEFAULT PARTITION other;
Example 10: Splitting Default Partition
Split the default partition of the sales table into two partitions.
ALTER TABLE sales SPLIT DEFAULT PARTITION
START ('2023-12-01') INCLUSIVE
END ('2023-12-31') INCLUSIVE
INTO (PARTITION dec2023, PARTITION other);
Example 11: Detaching Partitions
-- (GP7 only)
ALTER TABLE sensor_data
DETACH PARTITION sensor_data_p1;
Example 12: Attaching Existing Table
Reattach a detached partition from another table back into the sensor_data table.
CREATE TABLE sensor_data_old(LIKE sensor_data);
-- add stuff to sensor_data_old... then attach
-- (GP7 only)
ALTER TABLE sensor_data
ATTACH PARTITION sensor_data_old FOR VALUES WITH (MODULUS 3, REMAINDER 0);
Example 13: Checking Partition Information
Retrieve information about partitions and their rules in the database.
\d logs
Partitioned table "public.logs"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
----------+---------+-----------+----------+---------------------------------------+----------+--------------+-------------
log_id | integer | | not null | nextval('logs_log_id_seq'::regclass) | plain | |
log_date | date | | | | plain | |
log_type | text | | | | extended | |
Partition key: RANGE (log_date)
Partitions: logs_p1 FOR VALUES FROM ('2023-01-01') TO ('2023-03-31'), PARTITIONED
Distributed by: (log_id)
Access method: heap
\d logs_p1
Partitioned table "public.logs_p1"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
----------+---------+-----------+----------+---------------------------------------+----------+--------------+-------------
log_id | integer | | not null | nextval('logs_log_id_seq'::regclass) | plain | |
log_date | date | | | | plain | |
log_type | text | | | | extended | |
Partition of: logs FOR VALUES FROM ('2023-01-01') TO ('2023-03-31')
Partition constraint: ((log_date IS NOT NULL) AND (log_date >= '2023-01-01'::date) AND (log_date < '2023-03-31'::date))
Partition key: LIST (log_type)
Number of partitions: 0
Distributed by: (log_id)
Access method: heap
Example 14: Getting Partition Statistics
Analyze the sales table to update partition statistics for query optimization.
ANALYZE sales;
Example 15: Truncate a Partition
Remove all rows from the specified partition while keeping its structure intact.
-- (GP7 only)
ALTER TABLE sensor_data TRUNCATE PARTITION FOR (42);
Example 16: Exchange Partition Data
Exchange data between a partition in the main table and a corresponding partition in an archive table.
CREATE TABLE sensor_data_archive (LIKE sensor_data);
-- add stuff to sensor_data_archive... then exchange
ALTER TABLE sensor_data
EXCHANGE PARTITION FOR (42) WITH TABLE sensor_data_archive;
Example 17: Applying Constraints on Partitions
Apply a check constraint on a partitioned table to enforce data integrity within partitions.
ALTER TABLE logs
ADD CONSTRAINT check_date CHECK (log_date != '2023-01-01');
Example 18: Creating an Index on a Partitioned Table
Create an index on all the tables in a partitioned table.
CREATE INDEX index_logs_check_date ON logs(log_date);
Example 19: Rename Partitioned Tables
Renaming a partitioned table works the same as for a normal table and is reflected in the partition hierarchy.
-- GP7 syntax:
ALTER TABLE sales_p1 RENAME TO sales_p1_backup;
-- GP6 syntax:
ALTER TABLE sales RENAME PARTITION "p1" TO "p1_backup";
Example 20: Dropping Partitions
Remove the specified partitions from the sales table.
ALTER TABLE sales
DROP PARTITION FOR ('2023-01-01');
Or
ALTER TABLE sales DROP PARTITION other;
Multi-Column MCV Extended Statistics
Multi-column most-common-value (MCV) extended statistics compute the correlation ratio and number of distinct values to generate better plans for queries that test several non-uniformly-distributed columns. This feature is helpful in estimating query memory usage and when combining the statistics from individual columns. See CREATE STATISTICS for more details.
MCV extended statistics are now supported. By default the planner multiplies the selectivities of individual columns; extended statistics improve the optimizer's estimates for queries on correlated columns.
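A minimal sketch of multi-column MCV statistics (the table and column names are hypothetical, not from the article):

```sql
CREATE TABLE addr (city text, zip text);
-- city and zip are strongly correlated, so multiplying per-column
-- selectivities would badly underestimate the number of matching rows.
CREATE STATISTICS st_addr (mcv) ON city, zip FROM addr;
ANALYZE addr;
-- The planner can now use the multi-column MCV list for predicates like:
-- SELECT * FROM addr WHERE city = 'Beijing' AND zip = '100000';
```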
BRIN Indexes
BRIN indexes (Block Range INdexes) use much less space in disk compared to a standard b-tree index for very large tables whose columns have some natural correlation with their physical location within the table.
BRIN indexes are now supported. BRIN is particularly suitable for time-series scenarios such as IoT and connected vehicles, where data is continuously written at high speed and then analyzed by time range. The index stores summary information for each block range (such as the minimum and maximum values), and it is far smaller than a traditional B-tree.
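A minimal sketch (the table sensor_log is a hypothetical example): BRIN works well here because the timestamp column correlates with the physical row order of an append-heavy table.

```sql
CREATE TABLE sensor_log (ts timestamp, reading float8);
-- A BRIN index stores per-block-range summaries instead of per-row entries
CREATE INDEX idx_sensor_log_ts ON sensor_log USING brin (ts);
-- Range predicates over ts can then skip whole block ranges:
-- SELECT avg(reading) FROM sensor_log
-- WHERE ts >= '2023-09-01' AND ts < '2023-09-02';
-- Compare on-disk sizes against a B-tree with pg_relation_size().
```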
JIT Support
Just-in-Time (JIT) compilation allows you to compile otherwise interpreted queries into compiled code at run time, which provides a performance improvement for long-running CPU-bound queries, like analytical queries.
JIT is now supported. Just-in-time compilation is one of the new features of PostgreSQL 11; it can eliminate redundant logic and virtual function calls, and in TPC-H workloads some queries show a significant performance improvement.
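A quick way to see whether JIT kicks in, assuming the jit GUCs are enabled in your build (a sketch; the cost threshold is lowered purely for demonstration):

```sql
SET jit = on;
SET jit_above_cost = 0;  -- force JIT even for cheap queries, for demonstration
EXPLAIN (ANALYZE)
SELECT sum(x::numeric) FROM generate_series(1, 100000) AS x;
-- When JIT is used, the plan output includes a "JIT:" summary section.
```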
Hash Indexes
Hash indexes are supported with the Postgres-based planner and GPORCA.
Greenplum supports both the native PostgreSQL planner and ORCA (standard_planner is PostgreSQL's default planner); for features GPORCA does not support, it automatically falls back to the Postgres planner. As shown, Greenplum 7 supports hash indexes with both planners.
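A minimal sketch (the table t_hash is hypothetical): a hash index supports only equality lookups, but can be compact and fast for long keys.

```sql
CREATE TABLE t_hash (k text, v int);
CREATE INDEX idx_t_hash_k ON t_hash USING hash (k);
-- Only equality predicates can use the hash index:
EXPLAIN SELECT v FROM t_hash WHERE k = 'abc';
```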
Progress Views for Monitoring Long-Running Operations
Progress views such as pg_stat_progress_vacuum/copy/create_index have been introduced; executing such maintenance operations is no longer a black box and can now be tracked.
Greenplum Database can report the progress of ANALYZE, CLUSTER, COPY, CREATE INDEX, REINDEX, and VACUUM commands during command execution. Greenplum can also report the progress of a running base backup (initiated during gprecoverseg -F), allowing you to monitor the progress of these possibly long-running operations.
- gp_stat_progress_analyze
- gp_stat_progress_basebackup
- gp_stat_progress_cluster
- gp_stat_progress_copy
- gp_stat_progress_create_index
- gp_stat_progress_vacuum
References:
https://docs.vmware.com/en/VMware-Greenplum/7/greenplum-database/admin_guide-managing-progress_reporting.html
https://greenplum.org/progress-reporting-views-in-greenplum-7/
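While a long VACUUM runs in one session, a second session can watch it through the progress view (a sketch; the exact column set follows the GP7 documentation):

```sql
SELECT pid, datname, relid::regclass AS table_name, phase,
       heap_blks_scanned, heap_blks_total
FROM gp_stat_progress_vacuum;
```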
Generated Columns
Generated columns are now supported, conforming to the SQL standard; they can also work around many of the "black holes" of sequences.
Generated columns are table columns whose content is computed from other expressions, including references to other columns in the same table, removing the need to use the INSERT or UPDATE commands.
CREATE TABLE t (w real, h real, area real GENERATED ALWAYS AS (w*h) STORED);
INSERT INTO t (w, h) VALUES (10, 20);
SELECT * FROM t;
w | h | area
----+----+------
10 | 20 | 200
(1 row)
CREATE TABLE foo (a int, b_generated int GENERATED ALWAYS AS (a*2) STORED, c int);
INSERT INTO foo VALUES(1, DEFAULT, 1);
INSERT INTO foo (a,c) VALUES(2, 2);
SELECT * FROM foo;
a | b_generated | c
---+-------------+---
2 | 4 | 2
1 | 2 | 1
(2 rows)
Reference: https://greenplum.org/generated-columns-in-greenplum-7/
Append-Optimized (AO) Table Features
Faster ANALYZE for AO Tables
Fast ANALYZE improves the speed of ANALYZE for append-optimized tables. You do not need to enable fast ANALYZE; it is the default and only behavior when you analyze an append-optimized table.
AO Tables Support Unique Indexes, Unique Constraints, and Primary Keys
Unique indexes, unique constraints, and primary keys are now supported on append-optimized tables.
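A minimal sketch (the table ao_orders is hypothetical; WITH (appendoptimized = true) is the GP7 spelling of the AO storage option, and in Greenplum the primary key must contain the distribution key):

```sql
CREATE TABLE ao_orders (id int, note text)
WITH (appendoptimized = true)
DISTRIBUTED BY (id);

-- Now legal on an append-optimized table in GP7:
ALTER TABLE ao_orders ADD PRIMARY KEY (id);
```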
gppkg v2 Introduced
The Greenplum package utility, gppkg v2, allows you to install Greenplum Database extensions even when the database is not running.
pgvector Introduced
pgvector is introduced, bringing distributed vector search to more scenarios.
Greenplum Database 7 introduces the pgvector module, which provides vector similarity search capabilities for Greenplum Database that enable searching, storing, and querying machine language-generated embeddings at large scale.
For more information, see https://docs.vmware.com/en/VMware-Greenplum/7/greenplum-database/relnotes-release-notes.html
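A minimal sketch of pgvector usage (assumes the extension is installed; the table items is hypothetical):

```sql
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE items (id bigserial, embedding vector(3)) DISTRIBUTED BY (id);
INSERT INTO items (embedding) VALUES ('[1,2,3]'), ('[4,5,6]');
-- Nearest neighbor by L2 distance (the <-> operator)
SELECT id FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 1;
```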
UPSERT
UPSERT operations turn INSERT operations that would violate constraints into an UPDATE, or ignore them. See INSERT for more details.
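A minimal sketch of both UPSERT forms (the table kv is hypothetical):

```sql
CREATE TABLE kv (k text PRIMARY KEY, v int) DISTRIBUTED BY (k);
INSERT INTO kv VALUES ('a', 1);

-- Turn a would-be duplicate-key error into an UPDATE:
INSERT INTO kv VALUES ('a', 2)
ON CONFLICT (k) DO UPDATE SET v = EXCLUDED.v;

-- Or silently skip the conflicting row:
INSERT INTO kv VALUES ('a', 3) ON CONFLICT (k) DO NOTHING;
```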
Multi-Data Center Disaster Recovery Solution
As a part of the multi-data center disaster recovery solution, data is replicated via transaction log archiving, enabling a more efficient disaster recovery solution with a lower recovery point objective (RPO) and recovery time objective (RTO) than previous versions of Greenplum.
Resource Groups
Greenplum Database introduces substantial improvements to resource group-based resource management, such as support for Linux Control Groups v2, simplified memory management, and support for disk I/O limits per resource group. Refer to About Changes to Resource Groups for more information about what has changed in resource groups.
Resource groups let you set separate CPU, memory, and concurrency limits for each group, providing more flexible and effective in-database resource management; as shown, Greenplum 7 also supports per-group disk I/O limits. (The Greenplum community recommends using resource groups on CentOS 7.x for better performance; on CentOS 6.x they cause a noticeable drop in database performance, and upgrading the kernel to 2.6.32-696 or later is recommended.)
Improved Full Text Search
The built-in Full Text Search functionality provides data types, functions, operators, index types, and configurations for querying natural language documents. You may also search for phrases (multiple adjacent words) that appear next to each other in a specific order, or with a specified distance between the words.
Improved text search: supports lexical and AI-driven semantic search for more accurate results, providing the data types, functions, operators, index types, and configurations for querying natural language documents.
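Phrase search can be sketched with the standard full-text operators (the sample sentence is arbitrary):

```sql
-- phraseto_tsquery requires adjacent words in order ('quick' <-> 'brown')
SELECT to_tsvector('english', 'The quick brown fox jumps')
       @@ phraseto_tsquery('english', 'quick brown');

-- A bounded distance between words uses the <N> operator
SELECT to_tsvector('english', 'The quick brown fox jumps')
       @@ to_tsquery('english', 'quick <2> fox');
```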
Row-Level Security
Row-level security allows database administrators to set security policies that filter which rows particular users are permitted to update or view. Refer to About Configuring Row-Level Security Policies for more information.
Row-level security: a role-based security model that restricts which rows of a table a user may see, so different users accessing the same table see different data.
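A minimal sketch of a row-level security policy (the table docs and its owner column are hypothetical):

```sql
CREATE TABLE docs (id int, owner text, body text) DISTRIBUTED BY (id);
ALTER TABLE docs ENABLE ROW LEVEL SECURITY;

-- Each role sees only rows whose owner column matches its login name
CREATE POLICY docs_owner ON docs USING (owner = current_user);
```

By default, superusers and the table owner bypass the policy; ordinary roles querying docs are filtered transparently.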
Database Maintenance
- VACUUM operations now clean up any dead ranges from BRIN indexes on append-optimized tables.
- VACUUM can now identify pages containing only already-frozen tuples in the table's visibility map and skip these pages, reducing the cost of maintaining large tables which contain mostly unchanging data. The new VACUUM parameter DISABLE_PAGE_SKIPPING forces VACUUM to run against all frozen pages in case the contents of the visibility map are suspect, which should happen only if there is a hardware or software issue causing database corruption.
- The new option SKIP_LOCKED allows VACUUM and ANALYZE to skip relations that they cannot lock immediately due to conflicting locks.
- The new option INDEX_CLEANUP allows VACUUM to skip index cleanup. Setting the option to false makes VACUUM run as quickly as possible, for example, to avoid imminent transaction ID wraparound.
- VACUUM can now avoid unnecessary heap table truncation attempts that require taking an exclusive table lock even when no truncation is possible. This enhancement avoids unnecessary query cancellations on the standby servers.
- The new configuration parameter vacuum_cleanup_index_scale_factor helps minimize unnecessary index scans during VACUUM.
- The new table and partition storage parameter vacuum_index_cleanup lets you control whether, for a given table or partition, VACUUM attempts to remove index entries pointing to dead tuples.
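The new options can be combined in a single VACUUM call (a sketch; some_table is hypothetical):

```sql
-- Skip relations we cannot lock immediately, and skip index cleanup
-- so VACUUM finishes as fast as possible (e.g. to beat XID wraparound):
VACUUM (SKIP_LOCKED, INDEX_CLEANUP false) some_table;

-- Set the same index-cleanup behavior as a per-table default:
ALTER TABLE some_table SET (vacuum_index_cleanup = false);
```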
Performance
- When the encoding of a table column changes (ALTER TABLE … ALTER COLUMN … SET ENCODING), Greenplum Database rewrites only the column data; it no longer rewrites the table.
- Greenplum Database optimizes the use of snapshots when using immutable functions. It avoids taking a distributed snapshot and uses the local snapshot, resulting in improved performance for OLTP.
- The sorting speed of varchar, text, and numeric fields via "abbreviated" keys has been improved.
- Greenplum partitions the shared hash table freelist to reduce contention on multi-CPU-socket servers.
- There are new performance improvements when using atomic operations, rather than a spinlock, to protect an LWLock's wait queue.
- Greenplum reduces the WAL overhead when building a GiST, GIN, or SP-GiST index; less space on disk is now required for these WAL records and the data replays faster during crash recovery or point-in-time recovery.
- You may optionally use the ICU library for collation support.
- Partitioned tables now support indexes.
- The default size of the sequence cache is changed from 1 (no cache) to 20 to increase the performance of insert operations on tables that are defined with a serial data type using a sequence value.
Query Optimizer (GPORCA)
- The Greenplum Query Optimizer (GPORCA) supports index-only scans on append-optimized and append-optimized, column-oriented tables.
- GPORCA supports backwards index scans.
- GPORCA adds support for the new server configuration parameters optimizer_enable_dynamicindexonlyscan and optimizer_enable_push_join_below_union_all.
- GPORCA does not support planning or executing queries on multi-level partitioned tables.
- GPORCA partially supports index-only scans and covering indexes. Refer to the GPORCA Limitations: Unsupported SQL Query Features topic for a list of unsupported features in this area.
- GPORCA now supports Dynamic Partition Elimination (DPE) for right joins.
- GPORCA now supports planning queries that involve foreign tables. Queries on foreign tables and queries on partitioned tables that include a foreign table or external table leaf partition can now be planned by GPORCA.
- GPORCA now supports the CUBE grouping set result set.
- GPORCA now supports planning and running queries that you specify with multiple grouping sets.
Append-Optimized Tables
- Altering a column type for AO/CO tables requires only rewriting the column files for the specified column instead of the whole table.
- VACUUM can now run against all auxiliary tables of an append-optimized table with the option AO_AUX_ONLY.
- The catalog table pg_attribute_encoding now includes a new column filenum that helps improve efficiency when altering column type for AO/CO tables.
- You may now fetch a subset of columns when using the command COPY TO from an AOCO table.
- The table pg_appendonly no longer records append-only storage options; they are now only listed under pg_class.reloptions, which significantly reduces the size of the pg_appendonly catalog table.
- You may now dynamically update an AOCO table's column encodings, using the ALTER TABLE command.
- You may now alter a heap table with a unique index to an append-optimized table with a unique index.
Features Removed in GP7
VMware Greenplum Database 7.0 removes these features:
- The previously-deprecated createlang and droplang utilities.
- The Greenplum R Client (GreenplumR).
- Greenplum MapReduce.
- The PL/Container 3.0 Beta extension.
- The analyzedb option --skip_root_stats.
- The gpsys1 utility.
- The gpperfmon data collection agents, database, and views.
- The ARRAY_NAME variable.
- The CREATEUSER/NOCREATEUSER options from CREATE ROLE and allied commands.
- The gp_percentile_agg extension.
- Support for the QuickLZ compression algorithm. Use the new gp_quicklz_fallback server configuration parameter to ensure backward compatibility.
GP7 System Views
https://docs.vmware.com/en/VMware-Greenplum/7/greenplum-database/ref_guide-system_catalogs-catalog_ref-views.html
- Summary views aggregate statistics across the Greenplum cluster, displaying the metrics reported by their corresponding gp_ view.
- Progress reporting system views for monitoring long-running operations during the execution of the commands ANALYZE, CLUSTER, CREATE INDEX, VACUUM, COPY, and BASE_BACKUP.
- The new function gp_toolkit.get_column_size(oid) and the views gp_toolkit.gp_column_size and gp_toolkit.gp_column_size_summary allow you to view column size and compression ratio for a given AO/AOCO table.
- The pg_stat_* and pg_statio_* system views now provide information for append-optimized tables and their auxiliary tables.
- VMware Greenplum now includes a system catalog table called pg_sequence, which contains information about sequences. Note that some information about sequences, such as the name and the schema, is stored in the pg_class system table.
- The new gp_ system views are cluster-wide views that display, from every primary segment, the information reported by its corresponding pg_ system view.
- The new catalog views pg_stat_wal and pg_stat_slru display WAL information and track simple least-recently-used (SLRU) caches.
- The pg_backend_memory_contexts system view and supporting administration functions report memory contexts and usage for arbitrary backends. See Viewing and Logging Per-Process Memory Usage Information for more information.
- Greenplum Database 7 removes the following system catalog tables and views: pg_partition_columns, pg_partition_encoding, pg_partition_rule, pg_partition_template, pg_stat_partition_operations.
- In the gp_configuration_history catalog table, the desc column has been renamed to description.
- gp_read_error_log() is enhanced to detect division by zero, JSON mapping, and unsupported unicode errors encountered during foreign scans of external tables. This feature is not supported for a single-segment Greenplum Database cluster when the Greenplum Query Optimizer (GPORCA) is enabled.
- The functions array_position() and array_positions() are now included.
- Window functions now support all framing options shown in the SQL:2011 standard, including RANGE distance PRECEDING/FOLLOWING, GROUPS mode, and frame exclusion options.
- The new function gp_toolkit.__gp_aoblkdir(regclass) helps you obtain each block directory entry for a given AO/AOCO table that had or has an index.
- The pattern matching behavior of the substring() function is changed. In cases where the pattern can be matched in more than one way, the initial sub-pattern is now treated as matching the least possible amount of text rather than the greatest. For example, a pattern such as %#"aa*#"% now selects the first group of a's from the input, not the last group.
GP7 Tuning Parameters
https://greenplum.org/commonly-tuned-parameters-in-gp7/
References
https://tanzu.vmware.com/content/blog/vmware-greenplum-7-release
https://docs.vmware.com/en/VMware-Greenplum/7/greenplum-database/relnotes-release-notes.html
https://greenplum.org/alter-table-in-greenplum-7-avoiding-table-rewrite/
https://mp.weixin.qq.com/s/vUitZgjNxqC_cu6OuCjp1w