Misconfigured GPCC parameter metrics_collector causes Greenplum startup failure

2023-04-27 13:24:53

Symptom

[gpadmin@mdw1 ~]$ gpstart -a
20230116:12:58:42:008927 gpstart:mdw1:gpadmin-[INFO]:-Starting gpstart with args: -a
20230116:12:58:42:008927 gpstart:mdw1:gpadmin-[INFO]:-Gathering information and validating the environment...
20230116:12:58:42:008927 gpstart:mdw1:gpadmin-[INFO]:-Greenplum Binary Version: 'postgres (Greenplum Database) 6.19.1 build commit:0e314744a460630073b46cea7b7cf20a81e3da63 Open Source'
20230116:12:58:42:008927 gpstart:mdw1:gpadmin-[INFO]:-Greenplum Catalog Version: '301908232'
20230116:12:58:42:008927 gpstart:mdw1:gpadmin-[INFO]:-Starting Master instance in admin mode
20230116:12:58:42:008927 gpstart:mdw1:gpadmin-[CRITICAL]:-Failed to start Master instance in admin mode
20230116:12:58:42:008927 gpstart:mdw1:gpadmin-[CRITICAL]:-Error occurred: non-zero rc: 1
 Command was: 'env GPSESSID=0000000000 GPERA=None $GPHOME/bin/pg_ctl -D /data/gpdb/master/gpseg-1/ -l /data/gpdb/master/gpseg-1//pg_log/startup.log -w -t 600 -o " -p 5432 -c gp_role=utility " start'
rc=1, stdout='waiting for server to start.... stopped waiting
', stderr='pg_ctl: could not start server
Examine the log output.
'
[gpadmin@mdw1 ~]$ tailf /data/gpdb/master/gpseg-1//pg_log/startup.log
2023-01-16 12:58:59.464993 CST,,,p8992,th834783360,,,,0,,,seg-1,,,,,"LOG","00000","registering background worker ""sweeper process""",,,,,,,,"RegisterBackgroundWorker","bgworker.c",774,
2023-01-16 12:58:59.465304 CST,,,p8992,th834783360,,,,0,,,seg-1,,,,,"FATAL","58P01","could not access file ""metrics_collector"": No such file or directory",,,,,,,,"internal_load_library","dfmgr.c",202,1    0xbef3fc postgres errstart (elog.c:557)
2    0xbf456d postgres <symbol not found> (dfmgr.c:199)
3    0xbf4f54 postgres load_file (dfmgr.c:156)
4    0xc083a4 postgres process_shared_preload_libraries (miscinit.c:1378)
5    0xa0d6e3 postgres PostmasterMain (postmaster.c:1151)
6    0x6b0871 postgres main (main.c:205)
7    0x7f522e7ed3d5 libc.so.6 __libc_start_main   0xf5
8    0x6bc58c postgres <symbol not found>   0x6bc58c

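Analysis

The startup log contains the line "FATAL","58P01","could not access file ""metrics_collector"": No such file or directory", and the stack trace shows it was raised from process_shared_preload_libraries. So the problem is metrics_collector, which is the value of the shared_preload_libraries parameter in postgresql.conf and is what enables GPCC metrics collection.

The error was most likely caused by a faulty GPCC installation followed by a database restart.

If GPCC was installed successfully, the library files exist in the locations shown below. If they are missing, do not restart Greenplum casually, because the startup will fail: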

[root@lhrgp40 /]# find /usr/local -name metrics_collector*
/usr/local/greenplum-db-6.19.3/share/postgresql/extension/metrics_collector--1.0.sql
/usr/local/greenplum-db-6.19.3/share/postgresql/extension/metrics_collector.control
/usr/local/greenplum-db-6.19.3/lib/postgresql/metrics_collector.so
[root@lhrgp40 /]# 
[gpadmin@lhrgp40 ~]$ ll $GPHOME/share/postgresql/extension/gp_wlm*
-rw-r--r-- 1 gpadmin gpadmin 856 Dec  6 12:27 /usr/local/greenplum-db-6.19.3/share/postgresql/extension/gp_wlm--0.1.sql
-rw-r--r-- 1 gpadmin gpadmin 232 Dec  6 12:27 /usr/local/greenplum-db-6.19.3/share/postgresql/extension/gp_wlm.control
[gpadmin@lhrgp40 ~]$ ll $GPHOME/share/postgresql/extension/metrics_collector*
-rw-r--r-- 1 gpadmin gpadmin 846 Dec  6 12:27 /usr/local/greenplum-db-6.19.3/share/postgresql/extension/metrics_collector--1.0.sql
-rw-r--r-- 1 gpadmin gpadmin 233 Dec  6 12:27 /usr/local/greenplum-db-6.19.3/share/postgresql/extension/metrics_collector.control
[gpadmin@lhrgp40 ~]$ ll $GPHOME/lib/postgresql/metrics_collector.so
-rwxr-xr-x 1 gpadmin gpadmin 3357064 Dec  6 12:27 /usr/local/greenplum-db-6.19.3/lib/postgresql/metrics_collector.so
[gpadmin@lhrgp40 ~]$ 
[gpadmin@lhrgp40 ~]$ gppkg -q --all
20230116:14:58:39:020317 gppkg:lhrgp40:gpadmin-[INFO]:-Starting gppkg with args: -q --all
MetricsCollector-6.8.3_gp_6.19.3
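
Before restarting, it can help to confirm that every library named in shared_preload_libraries really has a shared object on disk. The following is a minimal sketch, not something shipped with GPCC: the function name check_preload_libs is hypothetical, and it assumes the libraries are listed by plain name (no path) and are stored as <name>.so in a directory such as $GPHOME/lib/postgresql, as in the listing above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Greenplum or GPCC).
# Usage: check_preload_libs <libdir> <library> [<library> ...]
# Prints OK/MISSING per library and returns non-zero if any .so is absent.
check_preload_libs() {
    local libdir="$1"
    shift
    local missing=0 lib
    for lib in "$@"; do
        if [ -f "$libdir/$lib.so" ]; then
            echo "OK $lib"
        else
            echo "MISSING $lib"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_preload_libs "$GPHOME/lib/postgresql" metrics_collector` would be run as gpadmin. Since segment instances load the same parameter, the check presumably needs to pass on every host in the cluster, not only on the master.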

Solution

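1. First fix the master instance: clear the value of shared_preload_libraries in its postgresql.conf.

2. Next fix the segment instances (the primaries): clear the value of shared_preload_libraries in each one's postgresql.conf.

3. Start the Greenplum cluster as soon as possible with gpstart -a.

4. Then fix the mirror instances: clear the value of shared_preload_libraries in each one's postgresql.conf.

5. Finally, start each mirror instance individually, for example: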

nohup  /usr/local/greenplum-db-6.19.1/bin/postgres -D /data/gpdb/mirror/gpseg5 -p 7002 &

The segment configuration can be queried on the master instance:

 select * from gp_segment_configuration order by 2,1 ;
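
Steps 1, 2 and 4 above are the same edit applied to different data directories, so it can be scripted. This is only a sketch under stated assumptions: the function name clear_preload_libs is hypothetical, the setting is assumed to be written as `shared_preload_libraries = ...` with an equals sign, and the approach chosen here (comment out the active lines and append an explicit empty value, after saving a .bak copy) is one of several reasonable ways to "clear" the parameter.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Greenplum).
# Usage: clear_preload_libs /path/to/datadir/postgresql.conf
clear_preload_libs() {
    local conf="$1"
    # Keep a backup next to the original so the change can be reverted.
    cp -p "$conf" "$conf.bak" || return 1
    # Comment out every active (uncommented) shared_preload_libraries line.
    sed -E 's/^([[:space:]]*)(shared_preload_libraries[[:space:]]*=)/\1#\2/' \
        "$conf.bak" > "$conf" || return 1
    # Append an explicit empty value so the intent is visible in the file.
    printf "\nshared_preload_libraries = ''\n" >> "$conf"
}
```

It would be called once per instance, for example `clear_preload_libs /data/gpdb/master/gpseg-1/postgresql.conf`, and the .bak file makes it easy to restore the original setting after GPCC has been reinstalled.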

Finally, reinstall GPCC. For the procedure, see: https://www.xmmup.com/greenplumguanfangjiankonggongjugpcc-6deanzhuanghexiezai.html

Location of the postgresql.conf parameter file

[gpadmin@lhrgp40 ~]$ ps -ef|grep green
gpadmin    520     1  0 14:28 pts/0    00:00:07 /usr/local/greenplum-cc-6.8.3/bin/gpccws -W masterport5432e
gpadmin    672     1  0 14:28 ?        00:00:02 /usr/local/greenplum-cc-6.8.3/bin/ccagent -udpport 9898 -rpcaddr lhrgp40:8899 masterport5432e
gpadmin   1845     1  0 14:33 ?        00:00:21 /usr/local/greenplum-db-6.19.3/bin/postgres -D /opt/greenplum/data/master/gpseg-1 -p 5432 -E
gpadmin  15037 15036  0 15:28 ?        00:00:00 addr2line -s -e /usr/local/greenplum-db-6.19.3/bin/postgres 0xbefe0c 0xbf2e08 0xa12c84 0x9fd127 0xa08dd0 0x6ac32e 0xa0e592 0x6b09e1 0x7f969816e555 0x6bc6fc
gpadmin  15039 15724  0 15:28 pts/0    00:00:00 grep --color=auto green
[gpadmin@lhrgp40 ~]$ ll /opt/greenplum/data/master/gpseg-1/postgresql.conf
-rw------- 1 gpadmin gpadmin 23762 Jan 16 14:31 /opt/greenplum/data/master/gpseg-1/postgresql.conf
[gpadmin@lhrgp40 ~]$ more postgresql.conf^C
[gpadmin@lhrgp40 ~]$ more /opt/greenplum/data/master/gpseg-1/postgresql.conf | grep shared_preload_libraries
#shared_preload_libraries = ''          # (change requires restart)
shared_preload_libraries='metrics_collector'
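
The grep output above mixes a commented-out default with the active setting. To read only the value that takes effect, something like the following sketch could be used. The function name effective_preload_libs is hypothetical, and the sketch assumes the `name = value` form, relies on the last uncommented assignment in the file winning, and ignores any include directives.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Greenplum).
# Usage: effective_preload_libs /path/to/datadir/postgresql.conf
# Prints the value of the last uncommented shared_preload_libraries line,
# without the key, trailing comment, or surrounding single quotes.
effective_preload_libs() {
    local conf="$1"
    grep -E '^[[:space:]]*shared_preload_libraries[[:space:]]*=' "$conf" \
        | tail -n 1 \
        | sed -E "s/^[^=]*=[[:space:]]*//; s/[[:space:]]*#.*\$//; s/^'(.*)'\$/\1/"
}
```

An empty result means no library is preloaded for that instance, which is the state the fix is aiming for.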

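A single host may run several primary and mirror instances, and the parameter file of every one of them has to be changed. In the example below, the parameter files of six instances need to be edited: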

[root@hdw ~]# ps -ef|grep green
gpadmin   3120     1  0 13:47 ?        00:00:00 /usr/local/greenplum-db-6.19.1/bin/postgres -D /data/gpdb/mirror/gpseg3 -p 7000
gpadmin   3138     1  0 13:47 ?        00:00:00 /usr/local/greenplum-db-6.19.1/bin/postgres -D /data/gpdb/mirror/gpseg4 -p 7001
gpadmin   7256     1  0 13:53 ?        00:00:00 /usr/local/greenplum-db-6.19.1/bin/postgres -D /data/gpdb/mirror/gpseg5 -p 7002
gpadmin  27039     1  0 13:19 ?        00:00:30 /usr/local/greenplum-db-6.19.1/bin/postgres -D /data/gpdb/primary/gpseg7 -p 6001
gpadmin  27041     1  0 13:19 ?        00:00:30 /usr/local/greenplum-db-6.19.1/bin/postgres -D /data/gpdb/primary/gpseg8 -p 6002
gpadmin  27042     1  0 13:19 ?        00:00:30 /usr/local/greenplum-db-6.19.1/bin/postgres -D /data/gpdb/primary/gpseg6 -p 6000
[root@hdw5 ~]# 
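
The data directories in a listing like this can be pulled out mechanically, which gives the list of postgresql.conf files to edit on that host. A small sketch, assuming the postmaster command lines look like the ones above (an executable path ending in /bin/postgres followed by `-D <datadir>`); the function name list_data_dirs is hypothetical. It only finds instances that are currently running, so for a stopped cluster the directories have to come from elsewhere, such as a saved copy of gp_segment_configuration.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Greenplum).
# Reads process command lines on stdin and prints the argument that follows
# -D for every command whose executable path ends in /bin/postgres.
# Usage: ps -eo args | list_data_dirs
list_data_dirs() {
    awk '$1 ~ /\/bin\/postgres$/ {
        for (i = 2; i < NF; i++)
            if ($i == "-D") print $(i + 1)
    }'
}
```

Each printed directory contains one postgresql.conf, so the output can be used to drive a loop over the files that need the shared_preload_libraries change.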
