PostgreSQL13流复制主从同步配置,切换步骤

2022-06-08 18:07:04 浏览数 (1)

本文档详细记录如何一步一步在两台Linux服务器上,搭建PostgreSQL数据库主从模式的高可用数据库环境,以及如何手工切换主备库,及其注意事项。

为了实现数据库对于应用层的透明,配合浮动IP来实现,即使数据库执行主备切换,应用层依然通过浮动IP来访问数据库。即:后端数据库执行主备切换前后,应用系统无需做任何更改。

搭建环境选择在Rehat 7.9 PostgreSQL 13.2上。

基本思路和流程大致如下:

1 主库真实 IP为192.168.1.106,配置浮动IP为192.168.1.126,主机名为pgprimary,在其上安装并初始化PostgreSQL数据库

2 备库真实IP为192.168.1.116,配置浮动IP为192.168.1.126,主机名为pgstandby,在其上只安装PostgreSQL数据库软件;

3 通常情况下,浮动IP 192.168.1.126运行在主库上,当主库出故障时,手工执行数据库的主备切换,备库成为新主库,然后新主库上启用浮动IP;

4 修复并重配原主库,使其成为新主库的备库;

主库

代码语言:javascript复制
[root@Centos ~]# cp /etc/sysconfig/network-scripts/ifcfg-ens33 /etc/sysconfig/network-scripts/ifcfg-ens33:1
[root@Centos ~]# cat /etc/sysconfig/network-scripts/ifcfg-ens33:1 TYPE="Ethernet" PROXY_METHOD="none" BROWSER_ONLY="no" BOOTPROTO="none" DEFROUTE="yes" IPV4_FAILURE_FATAL="no" IPV6INIT="yes" IPV6_AUTOCONF="yes" IPV6_DEFROUTE="yes" IPV6_FAILURE_FATAL="no" IPV6_ADDR_GEN_MODE="stable-privacy" NAME="ens33" UUID="02bc077a-e6b9-492b-a5a3-91bbd808b4e9" DEVICE="ens33:1" ONBOOT="yes" IPADDR="192.168.1.116" PREFIX="24" NM_CONTROLLED=no DNS1="114.114.114.114" IPV6_PRIVACY="no"

一、执行stream主备配置流程

1.1 主库创建流复制的用户

代码语言:javascript复制
[postgres@pgprimary data]$ psql
​
postgres=# CREATE ROLE replica login replication encrypted password 'replica';
CREATE ROLE
postgres=#

1.2 主库修改pg_hba.conf文件,允许备库IP通过复制用户访问数据库

代码语言:javascript复制
[postgres@pgprimary data]$ vi pg_hba.conf
# "local" is for Unix domain socket connections only
local   all             all                                     trust
# IPv4 local connections:
host    all             all             127.0.0.1/32            trust
host    all             all             0.0.0.0/0               md5
# IPv6 local connections:
host    all             all             ::1/128                 trust
# Allow replication connections from localhost, by a user with the
# replication privilege.
local   replication     all                                     trust
host    replication     all             127.0.0.1/32            trust
host    replication     all             ::1/128                 trust
host    replication     replica         192.168.1.116/32        md5
​

最后一行,添加了replica用户可以从备库IP 192.168.1.116访问主库。

代码语言:javascript复制
[postgres@pgstandby 13.2]$ ll
total 16
drwxrwxr-x. 2 postgres postgres 4096 Jan  7 14:12 bin
drwxrwxr-x. 6 postgres postgres 4096 Jan  7 14:12 include
drwxrwxr-x. 4 postgres postgres 4096 Jan  7 14:12 lib
drwxrwxr-x. 8 postgres postgres 4096 Jan  7 14:12 share
​

1.3 备库上执行对于主库的基础备份

代码语言:javascript复制
[postgres@pgstandby data]$ pwd
/data/postgres/13.2/data
​
[postgres@pgstandby 13.2]$ pg_basebackup -h 192.168.1.106 -p 5432 -U replica --password -X stream -Fp --progress -D $PGDATA -R
Password: 
40128/40128 kB (100%), 2/2 tablespaces

注意,备份选项上带有-R选项。

1.4 备库就可以执行pg_ctl start启动了

这时,就可以看到备库服务器上自动生成了standby.signal文件。同时,也看到在$PGDATA路径下,数据库自动帮我们配置了关于流复制的主库的信息:

代码语言:javascript复制
[postgres@pgstandby 13.2]$ pg_ctl start
[postgres@pgstandby 13.2]$ cat data/postgresql.auto.conf
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
primary_conninfo = 'user=replica password=replica channel_binding=disable host=192.168.1.106 port=5432 sslmode=disable sslcompression=0 ssl_min_protocol_version=TLSv1.2 gssencmode=disable krbsrvname=postgres target_session_attrs=any'
​

当然了,如果我们没有使用-R来备份主库的话。我们完全可以在备库上手工创建standby.signal文件,然后手工编辑postgresql.auto.conf,并在其内容中配置主库的信息。

代码语言:javascript复制

[postgres@pgstandby 13.2]$ ll
total 20
drwxrwxr-x.  2 postgres postgres 4096 Jan  7 14:12 bin
drwx------. 19 postgres postgres 4096 Jan 10 21:04 data
drwxrwxr-x.  6 postgres postgres 4096 Jan  7 14:12 include
drwxrwxr-x.  4 postgres postgres 4096 Jan  7 14:12 lib
drwxrwxr-x.  8 postgres postgres 4096 Jan  7 14:12 share
[postgres@pgstandby 13.2]$ cd data
[postgres@pgstandby data]$ ll
total 272
-rw-------. 1 postgres postgres    224 Jan 10 21:04 backup_label
-rw-------. 1 postgres postgres 220496 Jan 10 21:04 backup_manifest
drwx------. 7 postgres postgres     67 Jan 10 21:04 base
drwx------. 2 postgres postgres   4096 Jan 10 21:04 global
drwx------. 2 postgres postgres      6 Jan 10 21:04 pg_commit_ts
drwx------. 2 postgres postgres      6 Jan 10 21:04 pg_dynshmem
-rw-------. 1 postgres postgres   4896 Jan 10 21:04 pg_hba.conf
-rw-------. 1 postgres postgres   1636 Jan 10 21:04 pg_ident.conf
drwx------. 4 postgres postgres     68 Jan 10 21:04 pg_logical
drwx------. 4 postgres postgres     36 Jan 10 21:04 pg_multixact
drwx------. 2 postgres postgres      6 Jan 10 21:04 pg_notify
drwx------. 2 postgres postgres      6 Jan 10 21:04 pg_replslot
drwx------. 2 postgres postgres      6 Jan 10 21:04 pg_serial
drwx------. 2 postgres postgres      6 Jan 10 21:04 pg_snapshots
drwx------. 2 postgres postgres      6 Jan 10 21:04 pg_stat
drwx------. 2 postgres postgres      6 Jan 10 21:04 pg_stat_tmp
drwx------. 2 postgres postgres      6 Jan 10 21:04 pg_subtrans
drwx------. 2 postgres postgres     19 Jan 10 21:04 pg_tblspc
drwx------. 2 postgres postgres      6 Jan 10 21:04 pg_twophase
-rw-------. 1 postgres postgres      3 Jan 10 21:04 PG_VERSION
drwx------. 3 postgres postgres     60 Jan 10 21:04 pg_wal
drwx------. 2 postgres postgres     18 Jan 10 21:04 pg_xact
-rw-------. 1 postgres postgres    322 Jan 10 21:04 postgresql.auto.conf
-rw-------. 1 postgres postgres  27981 Jan 10 21:04 postgresql.conf
-rw-------. 1 postgres postgres      0 Jan 10 21:04 standby.signal
​
​

1.5 备库数据库进程信息

代码语言:javascript复制
[postgres@pgstandby data]$ ps -ef|grep postgres
root      19760  19406  0 19:43 pts/1    00:00:00 su - postgres
postgres  19761  19760  0 19:43 pts/1    00:00:00 -bash
root      20509  20400  0 20:41 pts/2    00:00:00 su - postgres
postgres  20510  20509  0 20:41 pts/2    00:00:00 -bash
postgres  20924      1  0 21:18 ?        00:00:00 /data/postgres/13.2/bin/postgres
postgres  20925  20924  0 21:18 ?        00:00:00 postgres: startup recovering 000000010000000000000004
postgres  20926  20924  0 21:18 ?        00:00:00 postgres: checkpointer 
postgres  20927  20924  0 21:18 ?        00:00:00 postgres: background writer 
postgres  20928  20924  0 21:18 ?        00:00:00 postgres: stats collector 
postgres  20929  20924  0 21:18 ?        00:00:00 postgres: walreceiver streaming 0/4000148
postgres  20944  20510  0 21:20 pts/2    00:00:00 ps -ef
postgres  20945  20510  0 21:20 pts/2    00:00:00 grep --color=auto postgres
[postgres@pgstandby data]$ 
​

备库上,可以看到walreceiver进程,正在读取日志streaming 0/4000148,执行恢复recovering 000000010000000000000004。

1.6 主库数据库进程信息

代码语言:javascript复制
[postgres@pgprimary data]$ ps -ef|grep postgres
root      20334  19836  0 19:46 pts/2    00:00:00 su - postgres
postgres  20335  20334  0 19:46 pts/2    00:00:00 -bash
postgres  21221      1  0 20:57 ?        00:00:00 /data/postgres/13.2/bin/postgres
postgres  21223  21221  0 20:57 ?        00:00:00 postgres: checkpointer 
postgres  21224  21221  0 20:57 ?        00:00:00 postgres: background writer 
postgres  21225  21221  0 20:57 ?        00:00:00 postgres: walwriter 
postgres  21226  21221  0 20:57 ?        00:00:00 postgres: autovacuum launcher 
postgres  21227  21221  0 20:57 ?        00:00:00 postgres: stats collector 
postgres  21228  21221  0 20:57 ?        00:00:00 postgres: logical replication launcher
postgres  21487  21221  0 21:18 ?        00:00:00 postgres: walsender replica 192.168.1.116(43648) streaming 0/4000148
postgres  21537  20335  0 21:22 pts/2    00:00:00 ps -ef
postgres  21538  20335  0 21:22 pts/2    00:00:00 grep --color=auto postgres
​

主库上看到,后台进程walsender,正在向replica 192.168.1.116(43648) streaming 0/4000148推送日志信息

1.7 主库查看数据库复制信息

代码语言:javascript复制
[postgres@pgprimary data]$ psql -xc "select * from pg_stat_replication"
-[ RECORD 1 ]---- ------------------------------
pid              | 21487
usesysid         | 16404
usename          | replica
application_name | walreceiver
client_addr      | 192.168.1.116
client_hostname  | 
client_port      | 43648
backend_start    | 2022-01-10 21:18:57.112831 08
backend_xmin     | 
state            | streaming
sent_lsn         | 0/4000148
write_lsn        | 0/4000148
flush_lsn        | 0/4000148
replay_lsn       | 0/4000148
write_lag        | 
flush_lag        | 
replay_lag       | 
sync_priority    | 0
sync_state       | async
reply_time       | 2022-01-10 21:23:47.870841 08
​
​

二、主备切换及注意事项

如果因为意外或故障导致主库不可用的情况下,可以直接将备库提升为主库对外提供服务。然后视具体情况看原来的主库是否需要重建,或者是否待故障恢复之后,可以直接作为新的备库,然后从新的主库(原备库)同步数据。

下面是模拟切换步骤:

2.1 主库停止,模拟故障

代码语言:javascript复制
[postgres@pgprimary ~]$ ps -ef|grep postgres
root      18132  18071  0 11:54 pts/0    00:00:00 su - postgres
postgres  18133  18132  0 11:54 pts/0    00:00:00 -bash
postgres  20582      1  0 15:14 ?        00:00:00 /data/postgres/13.2/bin/postgres
postgres  20584  20582  0 15:14 ?        00:00:00 postgres: checkpointer
postgres  20585  20582  0 15:14 ?        00:00:00 postgres: background writer
postgres  20586  20582  0 15:14 ?        00:00:00 postgres: walwriter
postgres  20587  20582  0 15:14 ?        00:00:00 postgres: autovacuum launcher
postgres  20588  20582  0 15:14 ?        00:00:00 postgres: stats collector
postgres  20589  20582  0 15:14 ?        00:00:00 postgres: logical replication launcher
root      20930  20816  0 15:32 pts/0    00:00:00 su - postgres
postgres  20931  20930  0 15:32 pts/0    00:00:00 -bash
postgres  21179  20582  0 15:50 ?        00:00:00 postgres: walsender replica 192.168.1.116(41508) streaming 0/8000148
postgres  21330  20931  0 16:03 pts/0    00:00:00 ps -ef
postgres  21331  20931  0 16:03 pts/0    00:00:00 grep --color=auto postgres
​
[postgres@pgprimary ~]$ pg_ctl status
pg_ctl: server is running (PID: 20582)
/data/postgres/13.2/bin/postgres
​
[postgres@pgprimary ~]$ pg_ctl stop -m fast
waiting for server to shut down....2022-05-27 16:03:33.186 CST [20582] LOG:  received fast shutdown request
2022-05-27 16:03:33.198 CST [20582] LOG:  aborting any active transactions
2022-05-27 16:03:33.199 CST [20582] LOG:  background worker "logical replication launcher" (PID 20589) exited with exit code 1
2022-05-27 16:03:33.199 CST [20584] LOG:  shutting down
2022-05-27 16:03:33.220 CST [20582] LOG:  database system is shut down
 done
server stopped
​
[postgres@pgprimary ~]$ ps -ef|grep postgres
root      18132  18071  0 11:54 pts/0    00:00:00 su - postgres
postgres  18133  18132  0 11:54 pts/0    00:00:00 -bash
root      20930  20816  0 15:32 pts/0    00:00:00 su - postgres
postgres  20931  20930  0 15:32 pts/0    00:00:00 -bash
postgres  21343  20931  0 16:03 pts/0    00:00:00 ps -ef
postgres  21344  20931  0 16:03 pts/0    00:00:00 grep --color=auto postgres
[postgres@pgprimary ~]$
​

通过pg_ctl stop -m fast停止原来的主库之后,数据库后台进程都没有了。

2.2 备库提升为新主库,对外提供服务

代码语言:javascript复制
[postgres@pgstandby 13.2]$ ps -ef|grep postgres
root      18284  18060  0 14:44 pts/0    00:00:00 su - postgres
postgres  18285  18284  0 14:44 pts/0    00:00:00 -bash
postgres  19107      1  0 15:50 ?        00:00:00 /data/postgres/13.2/bin/postgres
postgres  19108  19107  0 15:50 ?        00:00:00 postgres: startup recovering 000000010000000000000008
postgres  19109  19107  0 15:50 ?        00:00:00 postgres: checkpointer
postgres  19110  19107  0 15:50 ?        00:00:00 postgres: background writer
postgres  19111  19107  0 15:50 ?        00:00:00 postgres: stats collector
postgres  19268  18285  0 16:05 pts/0    00:00:00 ps -ef
postgres  19269  18285  0 16:05 pts/0    00:00:00 grep --color=auto postgres
​
[postgres@pgstandby 13.2]$ pg_ctl status
pg_ctl: server is running (PID: 19107)
/data/postgres/13.2/bin/postgres
​
[postgres@pgstandby 13.2]$ pg_ctl promote
waiting for server to promote....2022-05-27 16:06:25.714 CST [19108] LOG:  received promote request
2022-05-27 16:06:25.715 CST [19108] LOG:  redo done at 0/8000148
2022-05-27 16:06:25.728 CST [19108] LOG:  selected new timeline ID: 2
2022-05-27 16:06:25.979 CST [19108] LOG:  archive recovery complete
2022-05-27 16:06:25.982 CST [19107] LOG:  database system is ready to accept connections
 done
server promoted
[postgres@pgstandby 13.2]$
​
[postgres@pgstandby 13.2]$ ps -ef|grep postgres
root      18284  18060  0 14:44 pts/0    00:00:00 su - postgres
postgres  18285  18284  0 14:44 pts/0    00:00:00 -bash
postgres  19107      1  0 15:50 ?        00:00:00 /data/postgres/13.2/bin/postgres
postgres  19109  19107  0 15:50 ?        00:00:00 postgres: checkpointer
postgres  19110  19107  0 15:50 ?        00:00:00 postgres: background writer
postgres  19111  19107  0 15:50 ?        00:00:00 postgres: stats collector
postgres  19347  19107  0 16:06 ?        00:00:00 postgres: walwriter
postgres  19348  19107  0 16:06 ?        00:00:00 postgres: autovacuum launcher
postgres  19349  19107  0 16:06 ?        00:00:00 postgres: logical replication launcher
postgres  19407  18285  0 16:07 pts/0    00:00:00 ps -ef
postgres  19408  18285  0 16:07 pts/0    00:00:00 grep --color=auto postgres
[postgres@pgstandby 13.2]$
​

重要1:启动备库为新主库的命令是pg_ctl promote。

提升备库为主库之后,可以看到,后台进程中不再有startup recovering,以及walreceiver streaming进程了。同时,多了postgres: walwriter 写进程。

重要2:$PGDATA/standby.signal文件自动消失了。这是告诉PostgreSQL,我现在不再是备库了,我的身份是主库了

2.3 新主库修改pg_hba.conf文件

修改新主库(原备库192.168.1.116)的$PGDATA/pg_hba.conf文件,在其中添加允许新备库(原主库192.168.1.106)可以通过replica用户访问数据库的条目信息。

代码语言:javascript复制
host    replication     all             192.168.1.106/32           md5

注意:这里的192.168.1.126是原主库上配置的1个浮动IP地址,绑定在eth0:1设备上。如果主从环境的数据库没有配置浮动IP的话,则这里的IP地址,应该直接填原主库的实际IP地址。

2.4 原主库新建$PGDATA/standby.signal文件

代码语言:javascript复制
[postgres@pgprimary ~]$ cd $PGDATA
[postgres@pgprimary data]$ touch standby.signal
​
[postgres@pgprimary data]$ pwd
/data/postgres/13.2/data
[postgres@pgprimary data]$ ll standby.signal
-rw-rw-r--. 1 postgres postgres 0 May 27 16:17 standby.signal
​

注意:这一步骤非常非常重要,如果不配置该文件的话,那么原来的主库一旦重新启动话,就将成为了1个新的独立主库,脱离了主从数据库环境。

2.5 原主库修改$PGDATA/postgresql.auto.conf文件

代码语言:javascript复制
[postgres@pgprimary data]$ cat postgresql.auto.conf
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
[postgres@pgprimary data]$ vim postgresql.auto.conf
[postgres@pgprimary data]$ cat postgresql.auto.conf
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
primary_conninfo='user=replica password=replica host=192.168.1.116 port=5432'
[postgres@pgprimary data]$
​

2.6 启动原主库,变为新备库

代码语言:javascript复制
[postgres@pgprimary data]$ pg_ctl start -l /home/postgres/startup.log
waiting for server to start.... done
server started
[postgres@pgprimary data]$
​
[postgres@pgprimary data]$ ps -ef|grep postgres
root      18132  18071  0 11:54 pts/0    00:00:00 su - postgres
postgres  18133  18132  0 11:54 pts/0    00:00:00 -bash
root      20930  20816  0 15:32 pts/0    00:00:00 su - postgres
postgres  20931  20930  0 15:32 pts/0    00:00:00 -bash
root      22329  22133  0 16:56 pts/0    00:00:00 su - postgres
postgres  22330  22329  0 16:56 pts/0    00:00:00 -bash
postgres  22391      1  0 16:58 ?        00:00:00 /data/postgres/13.2/bin/postgres
postgres  22392  22391  0 16:58 ?        00:00:00 postgres: startup recovering 000000020000000000000008
postgres  22393  22391  0 16:58 ?        00:00:00 postgres: checkpointer
postgres  22394  22391  0 16:58 ?        00:00:00 postgres: background writer
postgres  22395  22391  0 16:58 ?        00:00:00 postgres: stats collector
root      22918  22717  0 17:17 pts/0    00:00:00 su - postgres
postgres  22919  22918  0 17:17 pts/0    00:00:00 -bash
postgres  23002  22391  0 17:18 ?        00:00:00 postgres: walreceiver streaming 0/8000798
root      23142  23100  0 17:28 pts/1    00:00:00 su - postgres
postgres  23143  23142  0 17:28 pts/1    00:00:00 -bash
postgres  23193  23143  0 17:28 pts/1    00:00:00 psql
postgres  23194  22391  0 17:28 ?        00:00:00 postgres: postgres postgres [local] idle
postgres  23228  22919  0 17:31 pts/0    00:00:00 ps -ef
postgres  23229  22919  0 17:31 pts/0    00:00:00 grep --color=auto postgres
​

三、 测试同步状态

代码语言:javascript复制
[postgres@pgstandby ~]$ psql
psql (13.2)
Type "help" for help.
​
postgres=# select version();
                                                 version
---------------------------------------------------------------------------------------------------------
 PostgreSQL 13.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit
(1 row)
​
#查看所有数据库信息
postgres=# l 
​
​
postgres=#  du
                                   List of roles
 Role name |                         Attributes                         | Member of
----------- ------------------------------------------------------------ -----------
 postgres  | Superuser, Create role, Create DB, Replication, Bypass RLS | {}
 replica   | Replication                                                | {}
 t_user    |                                                            | {}
​
postgres=# c postgres
You are now connected to database "postgres" as user "postgres".
postgres=# c testdb
You are now connected to database "testdb" as user "postgres".
testdb=# d
          List of relations
 Schema |    Name    | Type  | Owner
-------- ------------ ------- --------
 public | test_table | table | t_user
(1 row)
​
​
​

3.1 新主库(原备库192.168.1.116)插入数据

代码语言:javascript复制
​
postgres=# select * from pg_test0524;
 id |  name
---- ---------
  1 | beijing
(1 row)
​
postgres=# insert into pg_test0524 values(2,'shanghai');
INSERT 0 1
​
postgres=# insert into pg_test0524 values(3,'tianjin');
INSERT 0 1
​
​
​

3.2 同步数据正常(新备库原主库)

代码语言:javascript复制
Last login: Fri May 27 17:11:44 2022
[root@pgprimary ~]# su - postgres
Last login: Fri May 27 17:17:10 CST 2022 on pts/0
[postgres@pgprimary ~]$ psql
psql (13.2)
Type "help" for help.
​
postgres=# select * from pg_test0524;
 id |   name
---- ----------
  1 | beijing
  2 | shanghai
(2 rows)
​
postgres=# select * from pg_test0524;
 id |   name
---- ----------
  1 | beijing
  2 | shanghai
  3 | tianjin
(3 rows)
​
postgres=#
​

四、手动切换回原主库

4.1 查看主库状态(192.168.1.116)

代码语言:javascript复制
[postgres@pgstandby data]$ pg_ctl status
pg_ctl: server is running (PID: 18137)
/data/postgres/13.2/bin/postgres "-D" "/data/postgres/13.2/data"
[postgres@pgstandby data]$ echo $PGDATA
/data/postgres/13.2/data
[postgres@pgstandby data]$ ps -ef|grep postgres
root      18006  17942  0 11:54 pts/0    00:00:00 su - postgres
postgres  18007  18006  0 11:54 pts/0    00:00:00 -bash
postgres  18137      1  0 12:02 ?        00:00:00 /data/postgres/13.2/bin/postgres -D /data/postgres/13.2/data
postgres  18139  18137  0 12:02 ?        00:00:00 postgres: checkpointer
postgres  18140  18137  0 12:02 ?        00:00:00 postgres: background writer
postgres  18141  18137  0 12:02 ?        00:00:00 postgres: walwriter
postgres  18142  18137  0 12:02 ?        00:00:00 postgres: autovacuum launcher
postgres  18143  18137  0 12:02 ?        00:00:00 postgres: stats collector
postgres  18144  18137  0 12:02 ?        00:00:00 postgres: logical replication launcher
postgres  19604  18137  0 14:07 ?        00:00:00 postgres: walsender replica 192.168.1.106(41370) streaming 0/9000D80
root      19778  19623  0 14:18 pts/1    00:00:00 su - postgres
postgres  19779  19778  0 14:18 pts/1    00:00:00 -bash
postgres  20222  19779  0 14:53 pts/1    00:00:00 ps -ef
postgres  20223  19779  0 14:53 pts/1    00:00:00 grep --color=auto postgres
[postgres@pgstandby data]$ pg_ctl status
pg_ctl: server is running (PID: 18137)
/data/postgres/13.2/bin/postgres "-D" "/data/postgres/13.2/data"

4.2 停止主库,模拟故障

代码语言:javascript复制
[postgres@pgstandby data]$ pg_ctl stop -m fast
waiting for server to shut down.... done
server stopped
[postgres@pgstandby data]$ ps -ef|grep postgres
root      18006  17942  0 11:54 pts/0    00:00:00 su - postgres
postgres  18007  18006  0 11:54 pts/0    00:00:00 -bash
root      19778  19623  0 14:18 pts/1    00:00:00 su - postgres
postgres  19779  19778  0 14:18 pts/1    00:00:00 -bash
postgres  20236  19779  0 14:54 pts/1    00:00:00 ps -ef
postgres  20237  19779  0 14:54 pts/1    00:00:00 grep --color=auto postgres

通过pg_ctl stop -m fast停止主库(192.168.30.116)之后,数据库后台进程都没有了。

代码语言:javascript复制
查看备库的状态
[postgres@pgprimary data]$ ps -ef|grep postgres
root      18287  18230  0 11:54 pts/0    00:00:00 su - postgres
postgres  18288  18287  0 11:54 pts/0    00:00:00 -bash
postgres  18417      1  0 12:02 ?        00:00:00 /data/postgres/13.2/bin/postgres -D /data/postgres/13.2/data
postgres  18418  18417  0 12:02 ?        00:00:00 postgres: startup recovering 000000020000000000000009
postgres  18419  18417  0 12:02 ?        00:00:00 postgres: checkpointer
postgres  18420  18417  0 12:02 ?        00:00:00 postgres: background writer
postgres  18421  18417  0 12:02 ?        00:00:00 postgres: stats collector
postgres  19326  18288  0 13:34 pts/0    00:00:00 psql
postgres  19336  18417  0 13:34 ?        00:00:00 postgres: postgres postgres [local] idle
root      19779  19654  0 14:18 pts/1    00:00:00 su - postgres
postgres  19780  19779  0 14:18 pts/1    00:00:00 -bash
postgres  20177  19780  0 14:54 pts/1    00:00:00 ps -ef
postgres  20178  19780  0 14:54 pts/1    00:00:00 grep --color=auto postgres
[postgres@pgprimary data]$ pg_ctl status
pg_ctl: server is running (PID: 18417)
/data/postgres/13.2/bin/postgres "-D" "/data/postgres/13.2/data"

4.3 备注提升为主库pg_ctl promote

代码语言:javascript复制
[postgres@pgprimary data]$ pg_ctl promote
waiting for server to promote.... done
server promoted
[postgres@pgprimary data]$ ps -ef|grep postgres
root      18287  18230  0 11:54 pts/0    00:00:00 su - postgres
postgres  18288  18287  0 11:54 pts/0    00:00:00 -bash
postgres  18417      1  0 12:02 ?        00:00:00 /data/postgres/13.2/bin/postgres -D /data/postgres/13.2/data
postgres  18419  18417  0 12:02 ?        00:00:00 postgres: checkpointer
postgres  18420  18417  0 12:02 ?        00:00:00 postgres: background writer
postgres  18421  18417  0 12:02 ?        00:00:00 postgres: stats collector
postgres  19326  18288  0 13:34 pts/0    00:00:00 psql
postgres  19336  18417  0 13:34 ?        00:00:00 postgres: postgres postgres [local] idle
root      19779  19654  0 14:18 pts/1    00:00:00 su - postgres
postgres  19780  19779  0 14:18 pts/1    00:00:00 -bash
postgres  20201  18417  0 14:55 ?        00:00:00 postgres: walwriter
postgres  20202  18417  0 14:55 ?        00:00:00 postgres: autovacuum launcher
postgres  20203  18417  0 14:55 ?        00:00:00 postgres: logical replication launcher
postgres  20204  19780  0 14:55 pts/1    00:00:00 ps -ef
postgres  20205  19780  0 14:55 pts/1    00:00:00 grep --color=auto postgres

4.4新主库修改pg_hba.conf文件,加入备库访问

代码语言:javascript复制
[postgres@pgprimary data]$ cat $PGDATA/pg_hba.conf
# PostgreSQL Client Authentication Configuration File

# TYPE  DATABASE        USER            ADDRESS                 METHOD

# "local" is for Unix domain socket connections only
local   all             all                                     trust
# IPv4 local connections:
host    all             all             127.0.0.1/32            trust
host    all             all             0.0.0.0/0               md5
# IPv6 local connections:
host    all             all             ::1/128                 trust
# Allow replication connections from localhost, by a user with the
# replication privilege.
local   replication     all                                     trust
host    replication     all             127.0.0.1/32            trust
host    replication     all             ::1/128                 trust
host   replication     replica         192.168.1.116/32         md5

4.5 新备库创建文件 standby.signal

代码语言:javascript复制
[postgres@pgstandby data]$ cd $PGDATA
[postgres@pgstandby data]$ pwd
/data/postgres/13.2/data
[postgres@pgstandby data]$ touch standby.signal
[postgres@pgstandby data]$ ll

查看连接主库的信息
[postgres@pgstandby data]$ vim postgresql.auto.conf
[postgres@pgstandby data]$ cat postgresql.auto.conf
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
primary_conninfo = 'user=replica password=replica channel_binding=disable host=192.168.1.106 port=5432 sslmode=disable sslcompression=0 ssl_min_protocol_version=TLSv1.2 gssencmode=disable krbsrvname=postgres target_session_attrs=any'

4.6 启动原主库(第一次切换后的主库),变为新备库

代码语言:javascript复制
[postgres@pgstandby data]$ pg_ctl start -l /home/postgres/startup.log
waiting for server to start.... done
server started
[postgres@pgstandby data]$ ps -ef|grep postgres
root      18006  17942  0 11:54 pts/0    00:00:00 su - postgres
postgres  18007  18006  0 11:54 pts/0    00:00:00 -bash
root      19778  19623  0 14:18 pts/1    00:00:00 su - postgres
postgres  19779  19778  0 14:18 pts/1    00:00:00 -bash
postgres  20330      1  0 15:02 ?        00:00:00 /data/postgres/13.2/bin/postgres
postgres  20331  20330  0 15:02 ?        00:00:00 postgres: startup recovering 000000030000000000000009
postgres  20332  20330  0 15:02 ?        00:00:00 postgres: checkpointer
postgres  20333  20330  0 15:02 ?        00:00:00 postgres: background writer
postgres  20334  20330  0 15:02 ?        00:00:00 postgres: stats collector
postgres  20335  20330  2 15:02 ?        00:00:00 postgres: walreceiver streaming 0/9000F10
postgres  20336  19779  0 15:02 pts/1    00:00:00 ps -ef
postgres  20337  19779  0 15:02 pts/1    00:00:00 grep --color=auto postgres

可以看到恢复进程开始工作,wal 进程 postgres: walreceiver streaming 0/9000F10

4.7 测试数据同步

代码语言:javascript复制
主库插入数据
[postgres@pgprimary data]$ psql
psql (13.2)
Type "help" for help.

postgres-# d
            List of relations
 Schema |    Name     | Type  |  Owner
-------- ------------- ------- ----------
 public | pg_test     | table | postgres
 public | pg_test0524 | table | postgres
 public | test_tbs    | table | postgres
(3 rows)

postgres-# c
You are now connected to database "postgres" as user "postgres".
postgres=# select * from pg_test0524;
 id |   name
---- -----------
  1 | beijing
  2 | shanghai
  3 | tianjin
  4 | xian
  5 | guangzhou
  6 | shenzheng
(6 rows)

postgres=# insert into pg_test0524 values(7,'langfang');
INSERT 0 1
postgres=#

备库查看同步情况
[postgres@pgstandby data]$ psql
psql (13.2)
Type "help" for help.

postgres=# d
            List of relations
 Schema |    Name     | Type  |  Owner
-------- ------------- ------- ----------
 public | pg_test     | table | postgres
 public | pg_test0524 | table | postgres
 public | test_tbs    | table | postgres
(3 rows)

postgres=# c
You are now connected to database "postgres" as user "postgres".
postgres=# select * from pg_test0524;
 id |   name
---- -----------
  1 | beijing
  2 | shanghai
  3 | tianjin
  4 | xian
  5 | guangzhou
  6 | shenzheng
(6 rows)

postgres=# select * from pg_test0524;
 id |   name
---- -----------
  1 | beijing
  2 | shanghai
  3 | tianjin
  4 | xian
  5 | guangzhou
  6 | shenzheng
  7 | langfang
(7 rows)

postgres=#

五 小结
  1. 随着新版本的发行,数据库的配置和使用也越来越简单顺手了。
  2. 备库提升为主库的命令:pg_ctl promote;
  3. 新主库(原备库)的pg_hba.conf文件,要开放允许流复制访问数据库的信息给原主库的IP地址;
  4. 原主库配置为新备库的时候,务必要创建$PGDATA/standby.signal文件;
  5. 原主库配置为新备库的时候,务必要修改$PGDATA/postgresql.auto.conf文件,添加主库primary_conninfo的信息;

0 人点赞