Greenplum 7 新特性整理

2023-10-16 20:37:17 浏览数 (2)

新特性总结

快速拥有GP7环境

参考:https://www.xmmup.com/zaidockerzhongkuaisutiyangreenplum-7-0-0.html

代码语言:javascript复制
docker rm -f gpdb7
docker run -itd --name gpdb7 -h gpdb7 
  -p 5437:5432 -p 28087:28080  
  -v /sys/fs/cgroup:/sys/fs/cgroup 
  --privileged=true lhrbest/greenplum:7.0.0 
  /usr/sbin/init

docker exec -it gpdb7 bash
su - gpadmin
gpstart -a


[gpadmin@gpdb7 ~]$ psql
psql (12.12)
Type "help" for help.

postgres=# select version();
                                                                                                                      version                                                                                                                       
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 12.12 (Greenplum Database 7.0.0 build commit:0a7a3566873325aca1789ae6f818c80f17a9402d Open Source) on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-18), 64-bit compiled on Sep 20 2023 23:29:19 Bhuvnesh C.
(1 row)

postgres=# select * from gp_segment_configuration ;
 dbid | content | role | preferred_role | mode | status | port | hostname | address |                  datadir                   
------ --------- ------ ---------------- ------ -------- ------ ---------- --------- --------------------------------------------
    1 |      -1 | p    | p              | n    | u      | 5432 | gpdb7    | gpdb7   | /opt/greenplum/data/master/gpseg-1
    6 |      -1 | m    | m              | s    | u      | 5433 | gpdb7    | gpdb7   | /opt/greenplum/data/master_standby/gpseg-1
    2 |       0 | p    | p              | n    | u      | 6000 | gpdb7    | gpdb7   | /opt/greenplum/data/primary/gpseg0
    4 |       0 | m    | m              | n    | d      | 7000 | gpdb7    | gpdb7   | /opt/greenplum/data/mirror/gpseg0
    3 |       1 | p    | p              | n    | u      | 6001 | gpdb7    | gpdb7   | /opt/greenplum/data/primary/gpseg1
    5 |       1 | m    | m              | n    | d      | 7001 | gpdb7    | gpdb7   | /opt/greenplum/data/mirror/gpseg1
(6 rows)

重要的新特性

内核

Greenplum 7的内核从9.4.26升级到了12.12,其性能、功能均领先于Greenplum 6

自动启用vacuuum

Automatic Vacuum is now enabled by default for all databases, which automatically performs VACUUM and ANALYZE operations against all catalog tables, as well as runs ANALYZE for all users tables in those databases.

  • autovacuum
  • autovacuum_analyze_scale_factor
  • autovacuum_analyze_threshold
  • autovacuum_freeze_max_age
  • autovacuum_max_workers
  • autovacuum_multixact_freeze_max_age
  • autovacuum_naptime
  • autovacuum_vacuum_cost_delay
  • autovacuum_vacuum_cost_limit
  • autovacuum_vacuum_scale_factor
  • autovacuum_vacuum_threshold
  • gp_autovacuum_scope

参数含义参考: https://docs.vmware.com/en/VMware-Greenplum/7/greenplum-database/ref_guide-config_params-guc-list.html#autovacuum

代码语言:javascript复制
[gpadmin@gpdb7 ~]$ gpconfig -s autovacuum
Values on all segments are consistent
GUC              : autovacuum
Coordinator value: on
Segment     value: on
[gpadmin@gpdb7 ~]$ gpconfig -s autovacuum_analyze_threshold
Values on all segments are consistent
GUC              : autovacuum_analyze_threshold
Coordinator value: 50
Segment     value: 50
[gpadmin@gpdb7 ~]$ 
[gpadmin@gpdb7 ~]$ gpconfig -s autovacuum_max_workers
Values on all segments are consistent
GUC              : autovacuum_max_workers
Coordinator value: 3
Segment     value: 3
[gpadmin@gpdb7 ~]$ gpconfig -s gp_autovacuum_scope
Values on all segments are consistent
GUC              : gp_autovacuum_scope
Coordinator value: catalog
Segment     value: catalog
[gpadmin@gpdb7 ~]$ 
[gpadmin@gpdb7 ~]$ gpconfig -s autovacuum_naptime
Values on all segments are consistent
GUC              : autovacuum_naptime
Coordinator value: 1min
Segment     value: 1min
[gpadmin@gpdb7 ~]$ 
[gpadmin@gpdb7 ~]$ ps -ef|grep cuum
gpadmin     1252    1236  0 12:17 ?        00:00:00 postgres:  6000, autovacuum launcher   
gpadmin     1259    1241  0 12:17 ?        00:00:00 postgres:  6001, autovacuum launcher   
gpadmin     1279    1273  0 12:17 ?        00:00:00 postgres:  5432, autovacuum launcher   
gpadmin     2903     461  0 12:41 pts/1    00:00:00 grep --color=auto cuum
[gpadmin@gpdb7 ~]$ 



gpconfig -c gp_autostats_mode -v on_change
gpconfig -c gp_autostats_on_change_threshold -v 100000

PostgreSQL中的autovacuum我们已经十分熟悉,会定期针对达到阈值的表执行vacuum,在Greenplum中有所变化,仅对template0会执行,预防可能的事务ID回卷问题

“ PostgreSQL has a separate optional server process called the autovacuum daemon, whose purpose is to automate the execution of VACUUM and ANALYZE commands. Greenplum Database enables the autovacuum daemon to perform VACUUM operations only on the Greenplum Database template database template0. Autovacuum is enabled for template0 because connections are not allowed to template0. The autovacuum daemon performs VACUUM operations on template0 to manage transaction IDs (XIDs) and help avoid transaction ID wraparound issues in template0.

所以Greenplum没有worker这个说法。在以前,执行了大量的增删改之后要手动执行vacuum/analyze,同样系统表也会膨胀,比如频繁使用了临时表,会在pg_class/pg_attribute中留下垃圾,并且系统表膨胀的影响是全局性的,所以系统表也要经常性vacuum,假如系统表遇到不得不做vacuum full的场景就会很头疼。在Greenplum7中对此进行了很大改进,Greenplum会自动清理系统表以及根据可选参数清理一些辅助表,自动收集普通用户表的统计信息。

代码语言:javascript复制
create table t8 as select * from pg_class limit 5;
create table t9 as select * from pg_class limit 50;

select * from pg_stat_all_tables  where schemaname='public'  and relname = 't9';

SELECT * from pg_stats d where d.tablename='t9';

支持仅索引扫描和覆盖索引

增加了对覆盖索引(covering index)的支持。覆盖索引允许用户使用INCLUDE 子句将额外的列添加到索引中,对于执行索引 (index-only)扫描非常有用,特别是对于那些不能被B-tree所索引的数据类型。 Greenplum 查询优化器部分支持索引扫描和覆盖索引。

Index-only scans can answer queries from an index alone without accessing the table’s heap, which significantly improves query performance. In addition, covering indexes allow you to add additional columns to an index using the INCLUDE clause, in order to make the use of index-only scans more effective. See Understanding Index-Only Scans and Covering Indexes for more details.

代码语言:javascript复制
create table t1 (id int, c1 int, c2 int, c3 int, info text, crt_time timestamp); 
create table t2(like t1);
create table t3(like t1);  
create index idx_t1_1 on t1 (id) include(c1,c2,c3,info,crt_time);  
create index idx_t2_1 on t2 (id,c1,c2,c3,info,crt_time);  
create index idx_t3_1 on t3(id);

timing on
insert into t1 select (1000*random())::int,1,1,1,'test',now() from generate_series(1,10000000); 
insert into t2 select (1000*random())::int,1,1,1,'test',now() from generate_series(1,10000000); 
insert into t3 select (1000*random())::int,1,1,1,'test',now() from generate_series(1,10000000); 


explain (analyze,verbose,timing,costs,buffers) select id,c1,c2,c3,info,crt_time from t1 where id=10000; 
explain (analyze,verbose,timing,costs,buffers) select id,c1,c2,c3,info,crt_time from t2 where id=10000; 
explain (analyze,verbose,timing,costs,buffers) select id,c1,c2,c3,info,crt_time from t3 where id=10000; 

analyze t1;
analyze t2;
analyze t3;


select id,c1,c2,c3,info,crt_time from t1 where id=1000; 
select id,c1,c2,c3,info,crt_time from t2 where id=1000; 
select id,c1,c2,c3,info,crt_time from t3 where id=1000; 




create table t66(id int PRIMARY key, c1 text, crt_time timestamp); 

insert into t66
SELECT id, md5(id::text),now()
FROM generate_series(1, 2000000) AS id;


create index idx_t66_1 on t66 (id) include(c1);


set optimizer=0;
select id,c1 from t66 where id =100;

视频:https://www.youtube.com/watch?v=J7_xLoq3E20&t=1383s

代码语言:javascript复制
lhrgpdb=# set optimizer=0;
SET
时间:76.123 ms
lhrgpdb=# explain select id from t67 where id =100;
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Gather Motion 1:1  (slice1; segments: 1)  (cost=0.16..8.20 rows=1 width=4)
   ->  Index Only Scan using t67_pkey on t67  (cost=0.16..8.18 rows=1 width=4)
         Index Cond: (id = 100)
 Optimizer: Postgres-based planner
(4 行记录)


时间:53.847 ms
lhrgpdb=# set optimizer=1;
SET
时间:73.323 ms
lhrgpdb=# explain select id from t67 where id =100;
                                 QUERY PLAN
----------------------------------------------------------------------------
 Gather Motion 1:1  (slice1; segments: 1)  (cost=0.00..6.00 rows=1 width=4)
   ->  Index Scan using t67_pkey on t67  (cost=0.00..6.00 rows=1 width=4)
         Index Cond: (id = 100)
 Optimizer: GPORCA
(4 行记录)


时间:63.131 ms
lhrgpdb=#

增加列不再需要重写表

Greenplum Database no longer rewrites the table when a column is added to a table (ALTER TABLE ... ADD COLUMN ...).

ALTER TABLE … ADD COLUMN …不再需要重写表:这个功能是PostgreSQL11引入的特性,通过在pg_attribute系统表中增加atthasmissing和attmissingval两个额外字段,所以GP7自然也支持了该功能,可以有效降低加字段所带来的影响。

Operation

AO Columnar (AOCO)

AO Row (AO)

Heap

DROP COLUMN

No

No

No

ALTER COLUMN TYPE [1]

No [2]

Yes

Yes

ADD COLUMN (w/ non-volative default [3])

No [2]

No

No

ADD COLUMN (w/ volative default)

No [2]

Yes

Yes

ALTER COLUMN SET ENCODING

No [2]

N/A

N/A

SET ()[4]

Yes

Yes

Yes

SET ACCESS METHOD

Yes

Yes

Yes

[1] If the new type is not binary coercible with the old. Otherwise, no rewrite for all cases.

[2] But write/rewrite single column data. [3] Including NULL. [4] If the options differ from the current table. Otherwise, no rewrite for all cases.

参考:https://greenplum.org/alter-table-in-greenplum-7-avoiding-table-rewrite/

分区表性能与功能提升

PostgreSQL declarative table partitioning syntaxis now supported. See About Changes to Table Partitioning in Greenplum 7 for more details.

分区表性能与功能提升:关于PostgreSQL自身分区表的问题我已经写过诸多案例,因为严格来说,从12以后的分区表才能真正算得上是"可用",Greenplum7引入了所有用于表分区的原生PostgreSQL语法,同时保留了旧语法,诸如split partition(拆分分区)、exchange partition(交换分区),另外需要提及的是,12的attach partition只需要在父表上添加ShareUpdateExclusiveLock了,这使得在Greenplum7中使用分区表更加便捷,当然也合并了一些pg_partition_ancestors、pg_partition_root等使用分区表函数

代码语言:javascript复制
-- Assuming there's long-running insert
INSERT INTO sales SELECT * FROM ext_sales_data;

-- This will be blocked
ALTER TABLE sales ADD PARTITION march_sales START ('2023-03-01') END ('2023-04-01');

-- This will go through
ALTER TABLE sales ATTACH PARTITION march_sales FOR VALUES FROM ('2023-03-01') TO ('2023-04-01');  -- 这种方式添加分区不会被阻塞

参考:

https://greenplum.org/20-examples-of-greenplum-partition-commands/

https://greenplum.org/partition-in-greenplum-7-whats-new/

https://greenplum.org/partition-in-greenplum-7-recursion-and-inheritance/

Example 1: Creating a Range Partitioned Table

This example creates a sales table partitioned by range using the sale_date column.

代码语言:javascript复制
-- (GP7 only)
CREATE TABLE sales(
    sale_id SERIAL,
    sale_date DATE,
    amount DECIMAL
) PARTITION BY RANGE (sale_date);

Then adds a new partition to the sales table for the first half of 2023.

代码语言:javascript复制
-- (GP7 only)
CREATE TABLE sales_p1
PARTITION OF sales FOR VALUES FROM ('2023-01-01') TO ('2023-06-30');
Example 2: Classic Range Partitioned Table

For convenience, a single command can be used.

代码语言:javascript复制
CREATE TABLE sales(
    sale_id SERIAL,
    sale_date DATE,
    amount DECIMAL
) PARTITION BY RANGE (sale_date)
(
    PARTITION p1 START ('2023-01-01') END ('2023-06-30')
);
Example 3: Creating a List Partitioned Table

Create an orders table partitioned by a list of product categories.

代码语言:javascript复制
-- (GP7 only)
CREATE TABLE orders (
    order_id SERIAL,
    product_category TEXT,
    order_date DATE
)
PARTITION BY LIST (product_category);

Then add list partitions for the specified product categories.

代码语言:javascript复制
-- (GP7 only)
CREATE TABLE orders_p1
PARTITION OF orders FOR VALUES IN ('Electronics', 'Clothing');
Example 4: Classic List Partitioned Table

For convenience, a single command can be used.

代码语言:javascript复制
CREATE TABLE orders (
    order_id SERIAL,
    product_category TEXT,
    order_date DATE
) PARTITION BY LIST (product_category)
(
    PARTITION p1 VALUES ('Electronics', 'Clothing')
);
Example 5: Creating a Hash Partitioned Table

Create a sensor_data table partitioned by hash on the sensor_id column.

代码语言:javascript复制
-- (GP7 only)
CREATE TABLE sensor_data (
    sensor_id SERIAL PRIMARY KEY,
    timestamp TIMESTAMP,
    reading FLOAT
)
PARTITION BY HASH (sensor_id);
Example 6: Adding Hash Partitions

Add hash partitions to the sensor_data table using a modulus-based approach. Note: The classic convenience single command is not supported for hash type partitions

代码语言:javascript复制
-- (GP7 only)
CREATE TABLE sensor_data_p1
PARTITION OF sensor_data FOR VALUES WITH (MODULUS 4, REMAINDER 0);
Example 7: Creating a Composite Partitioned Table

Create a composite partitioned table based on both range and list partitioning.

代码语言:javascript复制
-- (GP7 only)
CREATE TABLE logs (
    log_id SERIAL,
    log_date DATE,
    log_type TEXT
)
DISTRIBUTED BY (log_id)
PARTITION BY RANGE (log_date) SUBPARTITION BY LIST (log_type);
Example 8: Adding Composite Partitions

Add composite partitions to the logs table for a specific date range and log types.

代码语言:javascript复制
-- (GP7 only)
CREATE TABLE logs_p1
PARTITION OF logs FOR VALUES FROM ('2023-01-01') TO ('2023-03-31')
PARTITION BY LIST (log_type);
Example 9: Adding Default Partition
代码语言:javascript复制
ALTER TABLE sales ADD DEFAULT PARTITION other;
Example 10: Splitting Default Partition

Split the default partition of the sales table into two partitions.

代码语言:javascript复制
ALTER TABLE sales SPLIT DEFAULT PARTITION
  START ('2023-12-01') INCLUSIVE
  END ('2023-12-31') INCLUSIVE
INTO (PARTITION dec2023, PARTITION other);
Example 11: Detaching Partitions
代码语言:javascript复制
--- (GP7 only)
ALTER TABLE sensor_data
DETACH PARTITION sensor_data_p1;
Example 12: Attaching Existing Table

Reattach a detached partition from another table back into the sensor_data table.

代码语言:javascript复制
CREATE TABLE sensor_data_old(LIKE sensor_data);

-- add stuff to sensor_data_old... then attach

-- (GP7 only)
ALTER TABLE sensor_data
ATTACH PARTITION sensor_data_old FOR VALUES WITH (MODULUS 3, REMAINDER 0);
Example 13: Checking Partition Information

Retrieve information about partitions and their rules in the database.

代码语言:javascript复制
d  logs
                                             Partitioned table "public.logs"
  Column  |  Type   | Collation | Nullable |               Default                | Storage  | Stats target | Description
---------- --------- ----------- ---------- -------------------------------------- ---------- -------------- -------------
 log_id   | integer |           | not null | nextval('logs_log_id_seq'::regclass) | plain    |              |
 log_date | date    |           |          |                                      | plain    |              |
 log_type | text    |           |          |                                      | extended |              |
Partition key: RANGE (log_date)
Partitions: logs_p1 FOR VALUES FROM ('2023-01-01') TO ('2023-03-31'), PARTITIONED
Distributed by: (log_id)
Access method: heap
代码语言:javascript复制
d  logs_p1
                                            Partitioned table "public.logs_p1"
  Column  |  Type   | Collation | Nullable |               Default                | Storage  | Stats target | Description
---------- --------- ----------- ---------- -------------------------------------- ---------- -------------- -------------
 log_id   | integer |           | not null | nextval('logs_log_id_seq'::regclass) | plain    |              |
 log_date | date    |           |          |                                      | plain    |              |
 log_type | text    |           |          |                                      | extended |              |
Partition of: logs FOR VALUES FROM ('2023-01-01') TO ('2023-03-31')
Partition constraint: ((log_date IS NOT NULL) AND (log_date >= '2023-01-01'::date) AND (log_date < '2023-03-31'::date))
Partition key: LIST (log_type)
Number of partitions: 0
Distributed by: (log_id)
Access method: heap
Example 14: Getting Partition Statistics

Analyze the sales table to update partition statistics for query optimization.

代码语言:javascript复制
ANALYZE sales;
Example 15: Truncate a Partition

Remove all rows from the specified partition while keeping its structure intact.

代码语言:javascript复制
--(GP7 only)
ALTER TABLE sensor_data TRUNCATE PARTITION FOR (42);
Example 16: Exchange Partition Data

Exchange data between a partition in the main table and a corresponding partition in an archive table.

代码语言:javascript复制
CREATE TABLE sensor_data_archive (LIKE sensor_data);

-- add stuff to sensor_data_archive... then exchange

ALTER TABLE sensor_data
EXCHANGE PARTITION FOR (42) WITH TABLE sensor_data_archive;
Example 17: Applying Constraints on Partitions

Apply a check constraint on a partitioned table to enforce data integrity within partitions.

代码语言:javascript复制
ALTER TABLE logs
ADD CONSTRAINT check_date CHECK (log_date != '2023-01-01');
Example 18: Creating an indexed Partitioned Tables

Create an index on all the tables in a partitioned table.

代码语言:javascript复制
CREATE INDEX index_logs_check_date ON logs(log_date);
Example 19: Rename Partitioned Tables

Rename a partitioned table is same as normal table and will be reflected in the partition heirarchy.

代码语言:javascript复制
-- GP7 syntax:
ALTER TABLE sales_p1 RENAME TO sales_p1_backup;

-- GP6 syntax:
ALTER TABLE sales RENAME PARTITION "p1" TO "p1_backup";
Example 20: Dropping Partitions

Remove the specified partitions from the sales table.

代码语言:javascript复制
ALTER TABLE sales
DROP PARTITION FOR ('2023-01-01');

Or

代码语言:javascript复制
ALTER TABLE sales DROP PARTITION other;

支持了MCV

Multi-column most-common-value (MCV) extended statistics compute the correlation ratio and number of distinct values to generate better plans for queries that test several non-uniformly-distributed columns. This feature is helpful in estimating query memory usage and when combining the statistics from individual columns. See CREATE STATISTICS for more details.

支持了MCV,默认情况下会将多列的选择率进行相乘,通过扩展统计信息可以改进优化器对于此类查询的评估

支持Brin索引

BRIN indexes (Block Range INdexes) use much less space in disk compared to a standard b-tree index for very large tables whose columns have some natural correlation with their physical location within the table.

支持Brin索引,Brin特别适用于物联网、车联网等数据高速持续写入,并需要按时间区间进行分析的时序场景,索引中存储的是这些最小单位的统计信息(最大值,最小值,记录条数,SUM,NULL值条数等),并且大小相较传统Btree要小得多

支持JIT

Just-in-Time (JIT) compilation allows you compile otherwise interpreted queries into compiled code at run time, which provides a performance improvement for long running CPU bound queries, like analytical queries.

支持JIT,即时编译是PostgreSQL11中的新功能之一,这可以减少多余的逻辑操作和虚函数调用,在TPC-H场景下部分查询性能提升明显

hash索引

Hash indexes are supported with the Postgres-based planner and GPORCA.

支持原生PostgreSQL优化器,也支持ORCA(standard_planner 是 PostgreSQL 缺省的优化器),对于GPORCA不支持的特性,GPORCA会自动回到Planner。可以看到,Greenplum7对于两种引擎,都支持了哈希索引。

引入监控视图可以监控长运行的操作

引入了pg_stat_progress_vacuum/copy/create_index等进展视图,执行此类维护性操作时不再是黑盒,有迹可循

Greenplum Database can report the progress of ANALYZE, CLUSTER, COPY, CREATE INDEX, REINDEX, and VACUUM commands during command execution. Greenplum can also report the progress of a running base backup (initiated during gprecoverseg -F) command invocation, allowing you to monitor the progress of these possibly long-running operations.

  • gp_stat_progress_analyze
  • gp_stat_progress_basebackup
  • gp_stat_progress_cluster
  • gp_stat_progress_copy
  • gp_stat_progress_create_index
  • gp_stat_progress_vacuum

参考:https://docs.vmware.com/en/VMware-Greenplum/7/greenplum-database/admin_guide-managing-progress_reporting.html

https://greenplum.org/progress-reporting-views-in-greenplum-7/

https://greenplum.org/partition-in-greenplum-7-recursion-and-inheritance/

支持生成列

支持了generate column,符合SQL标准,也可以解决序列的诸多"黑洞"

Generated Columns are table columns whose content is computed from other expressions, including references to other columns in the same table, remove the need to use the INSERT or UPDATE commands.

代码语言:javascript复制
CREATE TABLE t (w real, h real, area real GENERATED ALWAYS AS (w*h) STORED);
INSERT INTO t (w, h) VALUES (10, 20);

SELECT * FROM t;
 w  | h  | area
---- ---- ------
 10 | 20 |  200
(1 row)


CREATE TABLE foo (a int, b_generated int GENERATED ALWAYS AS (a*2) STORED, c int);
INSERT INTO foo VALUES(1, DEFAULT, 1);
INSERT INTO foo (a,c) VALUES(2, 2);

SELECT * FROM foo;
 a | b_generated | c 
--- ------------- ---
 2 |           4 | 2
 1 |           2 | 1
(2 rows)

参考:https://greenplum.org/generated-columns-in-greenplum-7/

AO表特性

AO表分析可以更快

Fast ANALYZE improves the speed of ANALYZE for append-optimized tables. You do not need to enable fast ANALYZE, this is the default and only behaviour when you analyze an append-optimized table.

AO表支持唯一索引、唯一约束和主键

Unique indexes, unique constraints, and primary keys are now supported on append-optimized tables.

引入gppkg v2

Greenplum package utility, gppkg v2, allows you to install Greenplum Database extensions even when the database is not running.

引入了pgvector

引入了pgvector,分布式 向量化多专场景

Greenplum Database 7 introduces the pgvector module, which provides vector similarity search capabilities for Greenplum Database that enable searching, storing, and querying machine language-generated embeddings at large scale.

更多信息可以参考 https://docs.vmware.com/en/VMware-Greenplum/7/greenplum-database/relnotes-release-notes.html

UPSERT操作符

UPSERT operations turn INSERT operations that would violate constraints into an UPDATE, or ignore them. See INSERT for more details.

多数据中心灾难恢复解决方案

作为多数据中心灾难恢复解决方案的一部分,数据通过事务日志归档进行复制,从而实现了比以前版本的Greenplum更高效、更低的恢复点目标(RPO)和恢复时间目标(RTO)灾难恢复解决方案。

Multi-data center disaster recovery solution: As a part of the multi-data center disaster recovery solution, data is replicated via transaction log archiving, enabling more efficient and lower recovery point objective (RPO) and recovery time objective (RTO) disaster recovery solutions than previous versions of Greenplum.

资源管理组

Greenplum Database introduces substantial improvements to resource group-based resource management, such as support for Linux Control Groups v2, simplified memory management, and support for disk I/O limits per resource group. Refer to About Changes to Resource Groups for more information about what has changed in resource groups.

资源管理组可针对每一个组别单独设置其特定的CPU、内存和并发限制,提供数据库内部更加灵活和有效的资源管理手段,可以看到Greenplum7也支持了对IO进行限制。(Greenplum社区建议在CentOS 7.x系统中使用资源组管理资源能获得更好性能,如果是CentOS 6.x系统,则会导致明显的数据库性能下降现象,建议升级内核到 2.6.32-696或更高版本)。

改进的文本搜索

The built-in Full Text Search functionality provides data types, functions, operators, index types, and configurations for querying natural language documents. You may also search for phrases (multiple adjacent words) that appear next to each other in a specific order, or with a specified distance between the words.

改进的文本搜索:支持词汇和人工智能驱动的语义搜索,以提供更准确的搜索结果,提供了用于查询自然语言文档的数据类型、函数、运算符、索引类型和配置。

行级策略

Row-leval security allows database administrators to set security policies that filter which rows particular users are permitted to update or view. Refer to About Configuring Row-Level Security Policies for more information.

行级策略:基于角色的安全模型,限制用户查看表数据的权限,使得不同的用户访问一个表时可以看到不同的数据。

数据库维护

  • VACUUM operations now clean up any dead ranges from BRIN indexes on append-optimized tables.
  • VACUUM can now identify pages containing only already frozen tuples in the table’s visibility map and skips these pages, hence reducing the cost of maintaining large tables which contain mostly unchanging data. The new VACUUM parameter DISABLE_PAGE_SKIPPING forces VACUUM to run against all frozen pages in case the contents of the visibility map are suspect, which should happen only if there is a hardware or software issue causing database corruption.
  • The new option SKIP_LOCKED allows VACUUM and ANALYZE to skip relations that cannot lock immediately due to conflicting locks.
  • The new option INDEX_CLEANUP allows VACUUM to skip index cleanup. Setting the option to false will make VACUUM run as quickly as possible, for example, to avoid imminent transaction ID wraparound.
  • VACUUM can now avoid unnecessary heap table truncation attempts that require taking an exclusive table lock even when no truncation is possible. This enhancement avoids unnecessary query cancellations on the standby servers.
  • The new configuration parameter vacuum_cleanup_index_scale_factor helps minimize unnecessary index scans during VACUUM.
  • The new table and partition storage parameter vacuum_index_cleanup lets you control whether, for a given table ot partition, VACUUM attempts to remove index entries pointing to dead tuples.

性能

  • When the encoding of a table column changes (ALTER TABLE … ALTER COLUMN … SET ENCODING), Greenplum Database rewrites only the column data, it no longer rewrites the table.
  • Greenplum Database optimizes the use of snapshots when using immutable functions. It avoids taking a distributed snapshot and uses the local snapshot, resulting in improved performance for OLTP.
  • The sorting speed of varchar, text, and numeric fields via “abbreviated” keys has been improved.
  • Greenplum partitions the shared hash table freelist to reduce contention on multi-CPU-socket servers.
  • There are new performance improvements when using atomic operations, rather than a spinlock, to protect an LWLock’s wait queue.
  • Greenplum reduces the WAL overhead when building a GiST, GIN, or SP-GiST index; less space on disk is now required for these WAL records and the data replays faster during crash recovery or point-in-time recovery.
  • You may optionally use the ICU library for collation support.
  • Partitioned tables now support indexes.
  • The default size of the sequence cache is changed from 1 (no cache) to 20 to increase the performance of insert operations on tables that are defined with a serial data type using a sequence value.

优化器 Greenplum Query Optimizer (GPORCA)

  • The Greenplum Query Optimizer (GPORCA) supports index-only scans on append-optimized and append-optimized, column-oriented tables.
  • GPORCA supports backwards index scans.
  • GPORCA adds support for new server configuration parameters optimizer_enable_dynamicindexonlyscan and optimizer_enable_push_join_below_union_all.
  • GPORCA does not support planning or executing queries on multi-level partitioned tables.
  • GPORCA partially supports index-only scans and covering indexes. Refer to the GPORCA Limitations Unsupported SQL Query Features topic for a list of unsupported features in this area.
  • GPORCA now supports Dynamic Partition Elimination (DPE) for right joins
  • GPORCA now supports planning queries that involve foreign tables. Queries on foreign tables and queries on partitioned tables that include a foreign table or external table leaf partition can now be planned by GPORCA.
  • GPORCA now supports the CUBE grouping set result set.
  • GPORCA now supports planning and running queries that you specify with multiple grouping sets.

AO表 Append-Optimized Tables

  • Altering a column type for AO/CO tables requires only rewriting the column files for the specified column instead of the whole table.
  • VACUUM can now run against all auxiliary tables of an append-optimized table with the option AO_AUX_ONLY.
  • The catalog table pg_attribute_encoding now includes a new column filenum that helps improve efficiency when altering column type for AO/CO tables.
  • You may now fetch a subset of columns when using the command COPY TO from a AOCO table.
  • The table pg_appendonly no longer records append-only storage options, they are now only listed under pg_class.reloptions, which significantly reduces the size of pg_appendonly catalog table.
  • You may now dynamically update an AOCO table’s column encodings, using the ALTER TABLE command.
  • You many now alter a heap table with a unique index to an append-optimized table with a unique index.

GP7移除的功能

VMware Greenplum Database 7.0 removes these features:

  • The previously-deprecated createlang and droplang utilities.
  • The Greenplum R Client (GreenplumR).
  • Greenplum MapReduce.
  • The PL/Container 3.0 Beta extension.
  • The analyzedb option --skip_root_stats.
  • The gpsys1 utility.
  • The gpperfmon data collection agents, database, and views.
  • The ARRAY_NAME variable.
  • The CREATEUSER/NOCREATEUSER options from CREATE ROLE and allied commands.
  • The gp_percentile_agg extension.
  • VMware Greenplum Database 7 removes support for the QuickLZ compression algorithm. Use the new gp_quicklz_fallback server configuration parameter to ensure backward compatibility.

GP7系统视图

https://docs.vmware.com/en/VMware-Greenplum/7/greenplum-database/ref_guide-system_catalogs-catalog_ref-views.html

  • Summary Views aggregate statistics across the Greenplum cluster which display the metrics reported by their corresponding gp_ view.
  • Progress reporting system views for Monitoring Long-Running Operations during the execution of the commands ANALYZE, CLUSTER, CREATE INDEX, VACUUM, COPY and BASE_BACKUP.
  • Progress reporting for Monitoring Long-Running Operations during the execution of the commands ANALYZE, CLUSTER, CREATE INDEX, VACUUM, COPY and BASE_BACKUP.
  • New function gp_toolkit.get_column_size(oid) and views gp_toolkit.gp_column_size, and gp_toolkit.gp_column_size_summary allow you to view column size and compression ratio for a given AO/AOCO table.
  • The pg_stat_` and `pg_statio_ system views now provide information for append-optimized tables and their auxiliary tables.
  • VMware Greenplum now includes a system catalog table called pg_sequence, which contains information about sequences. Note that some information about sequences, such as the name and the schema, is stored in the pg_class system table.
  • The new gp_ system views are cluster-wide views that display from every primary segment the information reported by its corresponding pg_ system view.
  • The new catalog views pg_stat_wal and pg_stat_slru display WAL information and track simple least-recently-used (SLRU) caches.
  • The pg_backend_memory_contexts system view and supporting administration functions report memory contexts and usage for arbitrary backends. See Viewing and Logging Per-Process Memory Usage Information for more information.
  • Greenplum Database 7 removes the following system catalog tables and views:
  • pg_partition_columns
  • pg_partition_encoding
  • pg_partition_rule
  • pg_partition_template
  • pg_stat_partition_operations
  • In the gp_configuration_history catalog table, the desc column has been renamed to description.
  • gp_read_error_log() is enhanced to detect division by zero, JSON mapping, and unsupported unicode errors encountered during foreign scans of external tables. This feature is not supported for a single segment Greenplum Database cluster when the Greenplum Query Optimizer (GPORCA) is enabled.
  • The functions array_position() and array_positions() are now included.
  • Window functions now support all framing options shown in the SQL:2011 standard, including RANGE distance PRECEDING/FOLLOWING, GROUPS mode, and frame exclusion options enhancement.
  • The new function gp_toolkit.__gp_aoblkdir(regclass) helps you obtain each block directory entry for a given AO/AOCO table that had or has an index.
  • The pattern matching behaviour of the substring() function is changed. In cases where the pattern can be matched in more than one way, the initial sub-pattern is now treated as matching the least possible amount of text rather than the greatest. For example, a pattern such as %#"aa*#"% now selects the first group of a’s from the input, not the last group.

GP7调优参数

https://greenplum.org/commonly-tuned-parameters-in-gp7/

参考

https://tanzu.vmware.com/content/blog/vmware-greenplum-7-release

https://docs.vmware.com/en/VMware-Greenplum/7/greenplum-database/relnotes-release-notes.html

https://greenplum.org/alter-table-in-greenplum-7-avoiding-table-rewrite/

https://mp.weixin.qq.com/s/vUitZgjNxqC_cu6OuCjp1w

0 人点赞