Symptom
[gpadmin@mdw1 ~]$ gpstart -a
20230116:12:58:42:008927 gpstart:mdw1:gpadmin-[INFO]:-Starting gpstart with args: -a
20230116:12:58:42:008927 gpstart:mdw1:gpadmin-[INFO]:-Gathering information and validating the environment...
20230116:12:58:42:008927 gpstart:mdw1:gpadmin-[INFO]:-Greenplum Binary Version: 'postgres (Greenplum Database) 6.19.1 build commit:0e314744a460630073b46cea7b7cf20a81e3da63 Open Source'
20230116:12:58:42:008927 gpstart:mdw1:gpadmin-[INFO]:-Greenplum Catalog Version: '301908232'
20230116:12:58:42:008927 gpstart:mdw1:gpadmin-[INFO]:-Starting Master instance in admin mode
20230116:12:58:42:008927 gpstart:mdw1:gpadmin-[CRITICAL]:-Failed to start Master instance in admin mode
20230116:12:58:42:008927 gpstart:mdw1:gpadmin-[CRITICAL]:-Error occurred: non-zero rc: 1
Command was: 'env GPSESSID=0000000000 GPERA=None $GPHOME/bin/pg_ctl -D /data/gpdb/master/gpseg-1/ -l /data/gpdb/master/gpseg-1//pg_log/startup.log -w -t 600 -o " -p 5432 -c gp_role=utility " start'
rc=1, stdout='waiting for server to start.... stopped waiting
', stderr='pg_ctl: could not start server
Examine the log output.
'
[gpadmin@mdw1 ~]$ tailf /data/gpdb/master/gpseg-1//pg_log/startup.log
2023-01-16 12:58:59.464993 CST,,,p8992,th834783360,,,,0,,,seg-1,,,,,"LOG","00000","registering background worker ""sweeper process""",,,,,,,,"RegisterBackgroundWorker","bgworker.c",774,
2023-01-16 12:58:59.465304 CST,,,p8992,th834783360,,,,0,,,seg-1,,,,,"FATAL","58P01","could not access file ""metrics_collector"": No such file or directory",,,,,,,,"internal_load_library","dfmgr.c",202,1 0xbef3fc postgres errstart (elog.c:557)
2 0xbf456d postgres <symbol not found> (dfmgr.c:199)
3 0xbf4f54 postgres load_file (dfmgr.c:156)
4 0xc083a4 postgres process_shared_preload_libraries (miscinit.c:1378)
5 0xa0d6e3 postgres PostmasterMain (postmaster.c:1151)
6 0x6b0871 postgres main (main.c:205)
7 0x7f522e7ed3d5 libc.so.6 __libc_start_main 0xf5
8 0x6bc58c postgres <symbol not found> 0x6bc58c
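The entries in startup.log shown above are comma-separated, with the severity in a quoted field, so the fatal entry can be isolated from the surrounding LOG lines. A minimal sketch, assuming a POSIX shell; the function name `fatal_lines` is hypothetical and not part of Greenplum:

```shell
# Hypothetical helper: print only the FATAL or PANIC entries of a
# CSV-style Greenplum log file. Stack-trace continuation lines are not printed.
fatal_lines() {
  # $1: path to the log file, e.g. .../pg_log/startup.log
  grep -E '"(FATAL|PANIC)",' "$1"
}
```

For the log in this case it could be called as `fatal_lines /data/gpdb/master/gpseg-1/pg_log/startup.log`.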
Analysis
The startup log entry "FATAL","58P01","could not access file ""metrics_collector"": No such file or directory" points to metrics_collector as the cause. That name is the value of shared_preload_libraries in the parameter file postgresql.conf, where it enables metrics collection for GPCC.
The error most likely means the GPCC installation was incomplete or faulty and the database was restarted afterwards.
If GPCC is installed correctly, the library files exist at the locations below. If they are missing, do not restart Greenplum casually, because startup will fail:
[root@lhrgp40 /]# find /usr/local -name metrics_collector*
/usr/local/greenplum-db-6.19.3/share/postgresql/extension/metrics_collector--1.0.sql
/usr/local/greenplum-db-6.19.3/share/postgresql/extension/metrics_collector.control
/usr/local/greenplum-db-6.19.3/lib/postgresql/metrics_collector.so
[root@lhrgp40 /]#
[gpadmin@lhrgp40 ~]$ ll $GPHOME/share/postgresql/extension/gp_wlm*
-rw-r--r-- 1 gpadmin gpadmin 856 Dec 6 12:27 /usr/local/greenplum-db-6.19.3/share/postgresql/extension/gp_wlm--0.1.sql
-rw-r--r-- 1 gpadmin gpadmin 232 Dec 6 12:27 /usr/local/greenplum-db-6.19.3/share/postgresql/extension/gp_wlm.control
[gpadmin@lhrgp40 ~]$ ll $GPHOME/share/postgresql/extension/metrics_collector*
-rw-r--r-- 1 gpadmin gpadmin 846 Dec 6 12:27 /usr/local/greenplum-db-6.19.3/share/postgresql/extension/metrics_collector--1.0.sql
-rw-r--r-- 1 gpadmin gpadmin 233 Dec 6 12:27 /usr/local/greenplum-db-6.19.3/share/postgresql/extension/metrics_collector.control
[gpadmin@lhrgp40 ~]$ ll $GPHOME/lib/postgresql/metrics_collector.so
-rwxr-xr-x 1 gpadmin gpadmin 3357064 Dec 6 12:27 /usr/local/greenplum-db-6.19.3/lib/postgresql/metrics_collector.so
[gpadmin@lhrgp40 ~]$
[gpadmin@lhrgp40 ~]$ gppkg -q --all
20230116:14:58:39:020317 gppkg:lhrgp40:gpadmin-[INFO]:-Starting gppkg with args: -q --all
MetricsCollector-6.8.3_gp_6.19.3
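The same check can be made before a restart rather than after a failed one. The sketch below reads the active shared_preload_libraries setting from a postgresql.conf and reports whether a matching .so file exists in a library directory. It assumes that each listed name resolves to `<name>.so` directly under that directory (as with $GPHOME/lib/postgresql in the listing above); the function name `check_preload_libs` is hypothetical.

```shell
# Hypothetical helper: for each library named in shared_preload_libraries,
# report whether <libdir>/<name>.so exists.
check_preload_libs() {
  conf="$1"     # path to postgresql.conf
  libdir="$2"   # library directory, e.g. $GPHOME/lib/postgresql
  # Use the last uncommented assignment, since a later entry overrides an earlier one.
  # The sed script strips the key, any trailing comment, and the surrounding quotes.
  value=$(grep -E "^[[:space:]]*shared_preload_libraries[[:space:]]*=" "$conf" \
    | tail -n 1 \
    | sed -E "s/^[^=]*=[[:space:]]*//; s/[[:space:]]*#.*$//; s/^'//; s/'$//")
  # The value may be a comma-separated list; check each name in turn.
  printf '%s\n' "$value" | tr ',' '\n' | while read -r lib; do
    [ -z "$lib" ] && continue
    if [ -f "$libdir/$lib.so" ]; then
      echo "ok: $lib"
    else
      echo "missing: $lib"
    fi
  done
}
```

A "missing" line for metrics_collector would indicate that restarting the instance is likely to fail in the way described above.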
Solution
1. Repair the master instance first: clear the value of shared_preload_libraries in its postgresql.conf.
2. Then repair the segment (primary) instances: clear the value of shared_preload_libraries in each postgresql.conf.
3. Start the Greenplum cluster as soon as possible with gpstart -a.
4. Next, repair the mirror instances: clear the value of shared_preload_libraries in each postgresql.conf.
5. Finally, start each mirror instance individually, for example:
nohup /usr/local/greenplum-db-6.19.1/bin/postgres -D /data/gpdb/mirror/gpseg5 -p 7002 &
The segment configuration can be queried on the master instance:
select * from gp_segment_configuration order by 2,1;
Finally, reinstall GPCC. For details, see: https://www.xmmup.com/greenplumguanfangjiankonggongjugpcc-6deanzhuanghexiezai.html
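Steps 1, 2 and 4 all make the same edit to a different postgresql.conf. A minimal sketch of that edit, assuming a POSIX shell with sed; the function name `clear_preload_libs` and the `.bak` backup suffix are hypothetical choices, not something the original procedure prescribes:

```shell
# Hypothetical helper: set every active shared_preload_libraries assignment
# in a postgresql.conf to the empty string, keeping a backup next to it.
clear_preload_libs() {
  conf="$1"   # path to postgresql.conf
  # Keep a copy so the original setting can be restored after GPCC is reinstalled.
  cp -p "$conf" "$conf.bak" || return 1
  # Rewrite only uncommented assignments; commented-out lines are left untouched.
  # Redirecting into the existing file preserves its owner and permissions.
  sed -E "s/^[[:space:]]*shared_preload_libraries[[:space:]]*=.*$/shared_preload_libraries = ''/" \
    "$conf.bak" > "$conf"
}
```

For the master in this case the call would be `clear_preload_libs /data/gpdb/master/gpseg-1/postgresql.conf`, repeated with the data directory of each primary and mirror on its own host.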
Location of the postgresql.conf parameter file
[gpadmin@lhrgp40 ~]$ ps -ef|grep green
gpadmin 520 1 0 14:28 pts/0 00:00:07 /usr/local/greenplum-cc-6.8.3/bin/gpccws -W masterport5432e
gpadmin 672 1 0 14:28 ? 00:00:02 /usr/local/greenplum-cc-6.8.3/bin/ccagent -udpport 9898 -rpcaddr lhrgp40:8899 masterport5432e
gpadmin 1845 1 0 14:33 ? 00:00:21 /usr/local/greenplum-db-6.19.3/bin/postgres -D /opt/greenplum/data/master/gpseg-1 -p 5432 -E
gpadmin 15037 15036 0 15:28 ? 00:00:00 addr2line -s -e /usr/local/greenplum-db-6.19.3/bin/postgres 0xbefe0c 0xbf2e08 0xa12c84 0x9fd127 0xa08dd0 0x6ac32e 0xa0e592 0x6b09e1 0x7f969816e555 0x6bc6fc
gpadmin 15039 15724 0 15:28 pts/0 00:00:00 grep --color=auto green
[gpadmin@lhrgp40 ~]$ ll /opt/greenplum/data/master/gpseg-1/postgresql.conf
-rw------- 1 gpadmin gpadmin 23762 Jan 16 14:31 /opt/greenplum/data/master/gpseg-1/postgresql.conf
[gpadmin@lhrgp40 ~]$ more postgresql.conf^C
[gpadmin@lhrgp40 ~]$ more /opt/greenplum/data/master/gpseg-1/postgresql.conf | grep shared_preload_libraries
#shared_preload_libraries = '' # (change requires restart)
shared_preload_libraries='metrics_collector'
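A single host may run several primary and mirror instances, and the parameter file of every instance must be edited. In the example below, six parameter files need to be changed:
[root@hdw ~]# ps -ef|grep green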
gpadmin 3120 1 0 13:47 ? 00:00:00 /usr/local/greenplum-db-6.19.1/bin/postgres -D /data/gpdb/mirror/gpseg3 -p 7000
gpadmin 3138 1 0 13:47 ? 00:00:00 /usr/local/greenplum-db-6.19.1/bin/postgres -D /data/gpdb/mirror/gpseg4 -p 7001
gpadmin 7256 1 0 13:53 ? 00:00:00 /usr/local/greenplum-db-6.19.1/bin/postgres -D /data/gpdb/mirror/gpseg5 -p 7002
gpadmin 27039 1 0 13:19 ? 00:00:30 /usr/local/greenplum-db-6.19.1/bin/postgres -D /data/gpdb/primary/gpseg7 -p 6001
gpadmin 27041 1 0 13:19 ? 00:00:30 /usr/local/greenplum-db-6.19.1/bin/postgres -D /data/gpdb/primary/gpseg8 -p 6002
gpadmin 27042 1 0 13:19 ? 00:00:30 /usr/local/greenplum-db-6.19.1/bin/postgres -D /data/gpdb/primary/gpseg6 -p 6000
[root@hdw5 ~]#
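With several instances per host it is easy to miss one. The sketch below pulls the data directory (the argument after -D) out of `ps -ef` style output so that each postgresql.conf can be visited in turn. It assumes the postmaster command lines look like the ones listed above; the function name `list_data_dirs` is hypothetical.

```shell
# Hypothetical helper: read `ps -ef` style lines on stdin and print the
# data directory that follows -D for each postgres postmaster process.
list_data_dirs() {
  awk '/\/bin\/postgres / {
         for (i = 1; i < NF; i++)
           if ($i == "-D") print $(i + 1)
       }'
}
```

One possible use is to show the current setting of every running instance on the host:

```shell
ps -ef | list_data_dirs | sort -u | while read -r dir; do
  grep -H -E "^[[:space:]]*shared_preload_libraries" "$dir/postgresql.conf"
done
```

This only finds instances that are currently running; instances that are down still have to be located through gp_segment_configuration.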