本节主要从greenplum集群初次搭建时初始化遇到的问题及调试为切入点。
目录:
- 集群初始化遇到的问题
- Greenplum执行流程概要
- 常用gp调试策略和手段
基本概念:
gpinitsystem | gp初始化工具,装载配置文件,初始化集群 |
---|---|
lldb/gdb | LLDB 是新一代高性能调试器,其是一组可重用组件的集合,这些组件大多是 LLVM 工程中的类库,如 Clang 表达式解析器或 LLVM 反汇编程序等。LLDB 是 Xcode 中默认的调试器,并且支持调试 C/C 程序 |
1.集群初始化遇到的问题:gpinitsystem 初始化工具的使用
- 使用gpinitsystem调试模式:-D 选项,可以输出额外信息
[gpadmin@gp-master ~]$ gpinitsystem -D
20191220:14:37:08:013619 gpinitsystem:gp-master:gpadmin-[INFO]:-Start Main
20191220:14:37:08:013619 gpinitsystem:gp-master:gpadmin-[INFO]:-Command line options passed to utility = -D
20191220:14:37:08:013619 gpinitsystem:gp-master:gpadmin-[INFO]:-Start Function CHK_GPDB_ID
20191220:14:37:08:013619 gpinitsystem:gp-master:gpadmin-[INFO]:-Current user id of gpadmin, matches initdb id of gpadmin
20191220:14:37:08:013619 gpinitsystem:gp-master:gpadmin-[INFO]:-End Function CHK_GPDB_ID
20191220:14:37:08:013619 gpinitsystem:gp-master:gpadmin-[INFO]:-Start Function CHK_PARAMS
20191220:14:37:08:013619 gpinitsystem:gp-master:gpadmin-[INFO]:-Checking configuration parameters, please wait...
20191220:14:37:08:013619 gpinitsystem:gp-master:gpadmin-[INFO]:-Start Function ERROR_EXIT
20191220:14:37:08:gpinitsystem:gp-master:gpadmin-[FATAL]:-At least one of two options, [-c] or [-I], is required. Script Exiting!
- 查看日志
常见日志有两种: gpinitsystem 和 数据库日志
gpinitsystem 的日志文件。默认路径为 ~/gpAdmin/gpinitsystem_***
代码语言:javascript复制[gpadmin@gp-master ~]$ ll gpAdminLogs/
total 76
-rw-rw-r-- 1 gpadmin gpadmin 3336 Dec 19 19:25 gpcheck_20191219.log
-rw-rw-r-- 1 gpadmin gpadmin 3285 Dec 19 16:46 gpinitstandby_20191219.log
-rw-rw-r-- 1 gpadmin gpadmin 50977 Dec 19 16:42 gpinitsystem_20191219.log
-rw-rw-r-- 1 gpadmin gpadmin 908 Dec 20 14:37 gpinitsystem_20191220.log
-rw-rw-r-- 1 gpadmin gpadmin 3500 Dec 19 16:51 gpstart_20191219.log
-rw-rw-r-- 1 gpadmin gpadmin 5370 Dec 19 17:29 gpstop_20191219.log
数据库的日志文件:进入 master (segment 的日志类似)的日志目录(例如/data/master/gpseg-1/pg_log/) 查看日志。这里面有2种类型的日志:
代码语言:javascript复制startup.log
gpdb-.csv
代码语言:javascript复制[gpadmin@gp-node2 pg_log]$ pwd
/data/primary/gpseg1/pg_log
[gpadmin@gp-node2 pg_log]$ ll
total 12
-rw------- 1 gpadmin gpadmin 2849 Dec 19 16:42 gpdb-2019-12-19_164201.csv
-rw------- 1 gpadmin gpadmin 2650 Dec 19 16:42 gpdb-2019-12-19_164210.csv
-rw------- 1 gpadmin gpadmin 0 Dec 20 00:00 gpdb-2019-12-20_000000.csv
-rw------- 1 gpadmin gpadmin 2444 Dec 19 16:42 startup.log
- 初始化master数据库失败
手动执行initdb查看详细错误信息,针对性解决。不同于gpinitsystem为自动化加载配置文件参数初始化
代码语言:javascript复制Usage:
initdb [OPTION]... [DATADIR]
//指定字符集 数据目录 编码 最大连接 共享内存
$ initdb -E UNICODE -D /data/master/gpseg-1 --locale=en_US.utf8 --max_connections=250 --shared_buffers=128000kB --backend_output=/data/master/gpseg-1.initdb
- master起不来
使用utility模式启动,并且进容许utility模式连接,通过打印日志排查问题
代码语言:javascript复制postgres -D /data/master/gpseg-1 -i -p 5432-c gp_role=utility -M master -b 1-C -1-z 0 -m
- 启动segment出错:使用工具pg_ctl
通过手动启动segment来定位错误
pg_ctl用于初始化PostgreSQL数据库集群, 启动、停止、或者重启PostgreSQL后端服务器(postgres), 或者显示一个运行着的服务器的状态。 尽管可以手动启动服务器, 但是pg_ctl封装了重新定向日志输出, 与终端和进程组合理分离, 以及另外提供了一个便捷选项用于有控制的关闭。
- 不能连接sever找不到domain socket-相关lib库文件调用的问题
connections on Unix domain socket "/var/pgsql_socket/.s.PGSQL.5432"?
通常由于psql binary造成,自己编译的psql调用了系统的libpq库。
代码语言:javascript复制解决办法:手动申明psql库地址
export LD_LIBRARY_PATH=/path/to/your/psql/lib
- gpstart启动失败,原因不明
gpstart -v // 使用 verbose 模式,显示每个执行的命令以及其结果。
[gpadmin@gp-master ~]$ gpstart -v
20191220:15:32:44:024730 gpstart:gp-master:gpadmin-[INFO]:-Starting gpstart with args: -v
20191220:15:32:44:024730 gpstart:gp-master:gpadmin-[DEBUG]:-Setting level of parallelism to: 64
20191220:15:32:44:024730 gpstart:gp-master:gpadmin-[INFO]:-Gathering information and validating the environment...
20191220:15:32:44:024730 gpstart:gp-master:gpadmin-[DEBUG]:---Checking that current user can use GP binaries
20191220:15:32:44:024730 gpstart:gp-master:gpadmin-[DEBUG]:-Obtaining master's port from master data directory
20191220:15:32:44:024730 gpstart:gp-master:gpadmin-[DEBUG]:-Read from postgresql.conf port=5432
20191220:15:32:44:024730 gpstart:gp-master:gpadmin-[DEBUG]:-Read from postgresql.conf max_connections=250
20191220:15:32:44:024730 gpstart:gp-master:gpadmin-[INFO]:-Reading the gp_dbid file - /data/master/gpseg-1/gp_dbid...
20191220:15:32:44:024730 gpstart:gp-master:gpadmin-[DEBUG]:-Parsing : # Greenplum Database identifier for this master/segment. ...
20191220:15:32:44:024730 gpstart:gp-master:gpadmin-[DEBUG]:-Parsing : # Do not change the contents of this file. ...
20191220:15:32:44:024730 gpstart:gp-master:gpadmin-[DEBUG]:-Parsing : dbid = 1 ...
20191220:15:32:44:024730 gpstart:gp-master:gpadmin-[INFO]:-Found match for dbid: 1.
20191220:15:32:44:024730 gpstart:gp-master:gpadmin-[DEBUG]:-Parsing : standby_dbid = 4 ...
20191220:15:32:44:024730 gpstart:gp-master:gpadmin-[INFO]:-Found match for standby_dbid: 4.
20191220:15:32:44:024730 gpstart:gp-master:gpadmin-[INFO]:-Greenplum Binary Version: 'postgres (Greenplum Database) 5.21.1 build commit:ca0b8106b893028d18f241dcb858d85f12af90b6'
2.Greenplum执行流程概要
- 数据准备
CREATE TABLE students ( id int , name text ) DISTRIBUTED BY ( id );
CREATE TABLE classes ( id int , classname text , student_id int ) DISTRIBUTED BY ( id );
INSERT INTO students VALUES ( 1 , ' steven '), ( 2 , ' changchang '), ( 3 , ' guoguo ');
INSERT INTO classes VALUES ( 1 , ' math ', 1 ), ( 2 , ' math ', 2 ), ( 3 , ' physics ', 3 );
- 查询sql生成执行计划
postgres=# explain SELECT s . name student_name , c . classnameFROM students s , classes cWHERE s . id = c . student_id;
QUERY PLAN
-------------------------------------------------------------------------------------------------------
Gather Motion 2:1 (slice2; segments: 2) (cost=0.00..862.00 rows=3 width=18)
-> Hash Join (cost=0.00..862.00 rows=2 width=18)
Hash Cond: students.id = classes.student_id
-> Table Scan on students (cost=0.00..431.00 rows=2 width=14)
-> Hash (cost=431.00..431.00 rows=2 width=12)
-> Redistribute Motion 2:2 (slice1; segments: 2) (cost=0.00..431.00 rows=2 width=12)
Hash Key: classes.student_id
-> Table Scan on classes (cost=0.00..431.00 rows=2 width=12)
Optimizer status: PQO version 3.59.0
(9 rows)
3.常用调试策略
- 调试master节点backend进程(QD进程)——双窗口对比:psql vs lldb/gdb
lldb工具的安装:yum安装即可
代码语言:javascript复制[root@gp-node2 ~]# yum search lldb
Loaded plugins: fastestmirror, langpacks
Loading mirror speeds from cached hostfile
============================================================= N/S matched: lldb =============================================================
lldb-devel.x86_64 : Header files for LLDB
lldb.x86_64 : Next generation high-performance debugger
Name and summary matches only, use "search all" for everything.
[root@gp-node2 ~]# yum install lldb-devel.x86_64 -y
Loaded plugins: fastestmirror, langpacks
epel | 5.4 kB 00:00:00
extras | 2.9 kB 00:00:00
os | 3.6 kB 00:00:00
updates | 2.9 kB 00:00:00
(1/2): epel/7/x86_64/updateinfo | 1.0 MB 00:00:00
(2/2): epel/7/x86_64/primary_db | 6.9 MB 00:00:00
Loading mirror speeds from cached hostfile
Resolving Dependencies
--> Running transaction check
---> Package lldb-devel.x86_64 0:3.4.2-9.el7 will be installed
--> Processing Dependency: lldb(x86-64) = 3.4.2-9.el7 for package: lldb-devel-3.4.2-9.el7.x86_64
代码语言:javascript复制postgres=# SELECT pg_backend_pid(); //查询QD进程
pg_backend_pid
----------------
25904
(1 row)
////////////////
[root@gp-master gpAdminLogs]# lldb -p 25904 //追踪QD进程
Attaching to process with:
process attach -p 25904
Process 25904 stopped
Executable module set to "/usr/local/greenplum-db-5.21.1/bin/postgres".
Architecture set to: x86_64--linux-gnu.
(lldb)
代码语言:javascript复制postgres=# SELECT count(1) FROM students; //psql查询
////////
(lldb) b exec_simple_query //断点追踪
(lldb) c
* thread #1: tid = 32362, 0x000000000083b324 postgres`exec_simple_query(query_string=0x00000000021f2920, seqServerHost=0x0000000000000000, seqServerPort=-1) 20 at postgres.c:1521, name = 'postgres', stop reason = breakpoint 1.1
frame #0: 0x000000000083b324 postgres`exec_simple_query(query_string=0x00000000021f2920, seqServerHost=0x0000000000000000, seqServerPort=-1) 20 at postgres.c:1521
- 调试 Segment 节点Backend进程 (QE)
Greenplum 为了提高效率,降低创建 Gang/QEs 的代价,通常会重用已经创建的Gang/QEs。利用这一特性,可以方便的找到每个 segment上 QE 的pid。
代码语言:javascript复制ps -ef|grep postgres| grep idle
503 38965 38387 0 9:35PM 0:00.46 postgres: 5432, yydzero test ::1(51161) con9 cmd65 idle
503 41210 38354 0 10:39PM 0:00.10 postgres: 40000, yydzero test **(51490) con9 seg0 idle
503 41211 38355 0 10:39PM 0:00.11 postgres: 40001, yydzero test **(51491) con9 seg1 idle
未完待续;