目录
一、架构介绍
1.1、流复制简介
PostgreSQL在9.x之后引入了主从的流复制机制,所谓流复制,就是备服务器通过tcp流从主服务器中同步相应的数据,主服务器在WAL记录产生时即将它们以流式传送给备服务器,而不必等到WAL文件被填充。
流复制属于物理层面的复制,可以从实例级复制出一个与主库一模一样的实例级的从库,流复制同步方式有同步、异步两种。
- 异步流复制模式中,主库提交的事务不会等待备库接收WAL日志流并返回确认信息,因此异步流复制模式下主库与备库的数据版本上会存在一定的处理延迟(毫秒级),当主库宕机,这个延迟就主要受到故障发现与切换时间的影响而拉长。该模式为默认模式。
- 同步流复制模式中,要求主库把WAL日志写入磁盘,同时等待WAL日志记录复制到备库、并且WAL日志记录在任意一个备库写入磁盘后,才能向应用返回Commit结果。一旦所有备库故障,在主库的应用操作则会被挂起,所以此方式建议起码是1主2备。
物理复制优点∶ √物理层面完全一致,是主要的复制方式,其类似于Oracle的DG。 √延迟低,事务执行过程中产生REDO record,实时的在备库apply,事务结束时,备库立马能见到数据。 √物理复制的一致性、可靠性高,不必担心数据逻辑层面不一致。
物理复制缺点︰ √无法满足不同的版本之间、不同库名之间的表同步。 √无法满足指定库或部分表的复制需求 √无法满足将多个数据库实例同步到一个库,将一个库的数据分发到多个不同的库。
物理复制场景: √适合于单向同步。 √适合于任意事务,任意密度写(重度写)的同步。√适合于HA、容灾、读写分离。 √适合于备库没有写,只有读的场景。
物理复制原理︰ √PG主备流复制的核心部分由walsender , walreceiver和startup三个进程组成。 √ walsender进程是用来发送WAL日志记录的,用于主库发送WAL日志记录至从库 √ walreceiver进程是用来接收WAL日志记录的,用于从库接收主库的WAL日志记录 √ startup进程用于从库apply日志
物理流复制的过程如下所示:
1.2、本文目标
1、先搭建1主1从异步模式,测试主从同步,再进行主从切换
2、再添加一个从库,变为1主2从
3、同步模式和异步模式相互切换
1主1从环境架构如下:
二、环境准备
代码语言:javascript复制-- 拉取镜像
docker pull postgres:12
-- 创建PG高可用环境专用网络
docker network create --subnet=172.72.6.0/24 pg-network
-- 创建宿主机相关映射路径
mkdir -p /docker_data/pg/lhrpg64302/data
mkdir -p /docker_data/pg/lhrpg64303/data
-- 主库
docker rm -f lhrpg64302
rm -rf /docker_data/pg/lhrpg64302/data
docker run -d --name lhrpg64302 -h lhrpg64302
-p 64302:5432 --net=pg-network --ip 172.72.6.2
-v /docker_data/pg/lhrpg64302/data:/var/lib/postgresql/data
-v /docker_data/pg/lhrpg64302/bk:/bk
-e POSTGRES_PASSWORD=lhr
-e TZ=Asia/Shanghai
postgres:12
-- 从库
docker rm -f lhrpg64303
rm -rf /docker_data/pg/lhrpg64303/data
rm -rf /docker_data/pg/lhrpg64303/bk
docker run -d --name lhrpg64303 -h lhrpg64303
-p 64303:5432 --net=pg-network --ip 172.72.6.3
-v /docker_data/pg/lhrpg64303/data:/var/lib/postgresql/data
-v /docker_data/pg/lhrpg64303/bk:/bk
-e POSTGRES_PASSWORD=lhr
-e TZ=Asia/Shanghai
postgres:12
-- 远程登录
psql -U postgres -h 192.168.66.35 -p 64302
psql -U postgres -h 192.168.66.35 -p 64303
三、主库操作
3.1、主库放开防火墙
代码语言:javascript复制cat << EOF > /docker_data/pg/lhrpg64302/data/pg_hba.conf
# TYPE DATABASE USER ADDRESS METHOD
local all all trust
host all all 127.0.0.1/32 trust
host all all 0.0.0.0/0 md5
host replication all 0.0.0.0/0 md5
EOF
? 注意添加replication
3.2、主库配置归档
代码语言:javascript复制-- 登陆主库环境
docker exec -it lhrpg64302 bash
-- 该路径也需要在从库创建
mkdir -p /postgresql/archive
chown -R postgres.postgres /postgresql/archive
-- 修改参数
cat >> /var/lib/postgresql/data/postgresql.conf <<"EOF"
wal_level='replica'
archive_mode='on'
archive_command='test ! -f /postgresql/archive/%f && cp %p /postgresql/archive/%f'
max_wal_senders=10
wal_keep_segments=256
wal_sender_timeout=60s
EOF
-- 重启主库
docker restart lhrpg64302
-- 或:
/usr/lib/postgresql/12/bin/pg_ctl restart -D /var/lib/postgresql/data/
-- 查询参数
psql -U postgres -h 192.168.66.35 -p 64302
select * from pg_settings where name in ('wal_level','archive_mode','archive_command');
-- 切换归档
select pg_switch_wal();
执行结果:
代码语言:javascript复制postgres=# select * from pg_settings where name in ('wal_level','archive_mode','archive_command');
name | setting | unit | category | short_desc | extra_desc | context | vartype | source | min_val | max_val | enumvals | boot_val | reset_val | sourcefile | sourceline | pending_restart
----------------- ------------------------------------------------------------------ ------ ----------------------------- ------------------------------------------------------------------- ------------ ------------ --------- -------------------- --------- --------- --------------------------- ---------- ------------------------------------------------------------------ ------------------------------------------ ------------ -----------------
archive_command | test ! -f /postgresql/archive/%f && cp %p /postgresql/archive/%f | | Write-Ahead Log / Archiving | Sets the shell command that will be called to archive a WAL file. | | sighup | string | configuration file | | | | | test ! -f /postgresql/archive/%f && cp %p /postgresql/archive/%f | /var/lib/postgresql/data/postgresql.conf | 753 | f
archive_mode | on | | Write-Ahead Log / Archiving | Allows archiving of WAL files using archive_command. | | postmaster | enum | configuration file | | | {always,on,off} | off | on | /var/lib/postgresql/data/postgresql.conf | 752 | f
wal_level | replica | | Write-Ahead Log / Settings | Set the level of information written to the WAL. | | postmaster | enum | configuration file | | | {minimal,replica,logical} | replica | replica | /var/lib/postgresql/data/postgresql.conf | 751 | f
(3 rows)
postgres=# select * from pg_stat_get_archiver();
-[ RECORD 1 ]------ -----------------------------------------
archived_count | 8
last_archived_wal | 000000010000000000000006.00000028.backup
last_archived_time | 2021-04-22 11:42:54.049649 00
failed_count | 0
last_failed_wal |
last_failed_time |
stats_reset | 2021-04-22 11:35:55.727069 00
postgres=# select pg_switch_wal();
-[ RECORD 1 ]- ----------
pg_switch_wal | 0/7015058
postgres=# select * from pg_stat_get_archiver();
-[ RECORD 1 ]------ ------------------------------
archived_count | 9
last_archived_wal | 000000010000000000000007
last_archived_time | 2021-04-23 01:00:30.076916 00
failed_count | 0
last_failed_wal |
last_failed_time |
stats_reset | 2021-04-22 11:35:55.727069 00
-- 切换归档前
root@lhrpg64302:/# ps -ef|grep post
postgres 1 0 0 01:28 ? 00:00:00 postgres
postgres 26 1 0 01:28 ? 00:00:00 postgres: checkpointer
postgres 27 1 0 01:28 ? 00:00:00 postgres: background writer
postgres 28 1 0 01:28 ? 00:00:00 postgres: walwriter
postgres 29 1 0 01:28 ? 00:00:00 postgres: autovacuum launcher
postgres 30 1 0 01:28 ? 00:00:00 postgres: archiver
postgres 31 1 0 01:28 ? 00:00:00 postgres: stats collector
postgres 32 1 0 01:28 ? 00:00:00 postgres: logical replication launcher
postgres 33 1 0 01:29 ? 00:00:00 postgres: postgres postgres 172.72.6.1(6884) idle
root 40 34 0 01:29 pts/0 00:00:00 grep post
root@lhrpg64302:/# cd /postgresql/archive/
root@lhrpg64302:/postgresql/archive# ls -l
total 0
-- 切换归档
postgres=# select pg_switch_wal();
pg_switch_wal
---------------
0/1645528
(1 row)
-- 切换归档后
root@lhrpg64302:/postgresql/archive# ls -l
total 16384
-rw------- 1 postgres postgres 16777216 Apr 23 01:30 000000010000000000000001
root@lhrpg64302:/postgresql/archive# ps -ef|grep post
postgres 1 0 0 01:28 ? 00:00:00 postgres
postgres 26 1 0 01:28 ? 00:00:00 postgres: checkpointer
postgres 27 1 0 01:28 ? 00:00:00 postgres: background writer
postgres 28 1 0 01:28 ? 00:00:00 postgres: walwriter
postgres 29 1 0 01:28 ? 00:00:00 postgres: autovacuum launcher
postgres 30 1 0 01:28 ? 00:00:00 postgres: archiver last was 000000010000000000000001
postgres 31 1 0 01:28 ? 00:00:00 postgres: stats collector
postgres 32 1 0 01:28 ? 00:00:00 postgres: logical replication launcher
postgres 33 1 0 01:29 ? 00:00:00 postgres: postgres postgres 172.72.6.1(6884) idle
root 47 34 0 01:30 pts/0 00:00:00 grep post
- 参数max_wal_senders介绍:
Specifies the maximum number of concurrent connections from standby servers or streaming base backup clients (i.e., the maximum number of simultaneously running WAL sender processes). The default is zero, meaning replication is disabled. WAL sender processes count towards the total number of connections, so the parameter cannot be set higher than max_connections. This parameter can only be set at server start. wal_level must be set to archive or hot_standby to allow connections from standby servers.
也就是说,这个参数是在主机上设置的,是从机连接到主机的并发连接数之总和,所以这个参数是个正整型。默认值是0,也即默认没有流复制功能。该并发连接数从进程上看,就是各个wal sender进程数之和,可以通过ps -ef|grep senders来查看,所以该值不能超过系统的最大连接数(max_connections,该BUG在9.1.5被修复),可以允许超过实际的流复制用户数。该参数更改需要重启DB,比如我只配了一个从机:
[postgres@ndb2 database]$ ps -ef|grep sender postgres 21257 21247 0 20:57 ? 00:00:00 postgres: wal sender process repuser 192.25.10.71(46161) streaming 0/4018ED8 postgres 22193 20949 0 23:02 pts/0 00:00:00 grep sender
- 参数wal_keep_segments=256介绍
表示保留多少个WAL文件。如果源库业务较繁忙,那么应该相应的增加这个值。
在PG13中,wal_keep_segments 已经取消,改用 wal_keep_size
- 参数wal_sender_timeout=60s介绍
中断那些停止活动超过指定毫秒数的复制连接。这对发送服务器检测一个后备机崩溃或网络中断有用。设置为0将禁用该超时机制。这个参数只能在postgresql.conf文件中或在服务器命令行上设置。默认值是 60 秒。
3.3、主库创建复制用户
代码语言:javascript复制create role replhr login encrypted password 'lhr' replication;
? 创建用户需要加上replication选项。
四、从库操作
4.1、在从库对主库进行备份
代码语言:javascript复制docker exec -it lhrpg64303 bash
mkdir -p /bk
chown postgres:postgres /bk
su - postgres
pg_basebackup -h 172.72.6.2 -p 5432 -U replhr -l bk20210422 -F p -P -R -D /bk
执行完成后,会产生文件standby.signal,如下:
代码语言:javascript复制root@lhrpg64303:/# mkdir -p /bk
root@lhrpg64303:/# chown postgres:postgres /bk
root@lhrpg64303:/#
root@lhrpg64303:/# su - postgres
postgres@lhrpg64303:~$
postgres@lhrpg64303:~$ pg_basebackup -h 172.72.6.2 -p 5432 -U replhr -l bk20210422 -F p -P -R -D /postgresql/pgdata
Password:
24560/24560 kB (100%), 1/1 tablespace
postgres@lhrpg64303:~$ cd /postgresql/pgdata/
postgres@lhrpg64303:/postgresql/pgdata$ ll
-bash: ll: command not found
postgres@lhrpg64303:/postgresql/pgdata$ ls -l
total 116
-rw------- 1 postgres postgres 3 Apr 22 10:52 PG_VERSION
-rw------- 1 postgres postgres 209 Apr 22 10:52 backup_label
drwx------ 5 postgres postgres 4096 Apr 22 10:52 base
drwx------ 2 postgres postgres 4096 Apr 22 10:52 global
drwx------ 2 postgres postgres 4096 Apr 22 10:52 pg_commit_ts
drwx------ 2 postgres postgres 4096 Apr 22 10:52 pg_dynshmem
-rw-r--r-- 1 postgres postgres 243 Apr 22 10:52 pg_hba.conf
-rw------- 1 postgres postgres 1636 Apr 22 10:52 pg_ident.conf
drwx------ 4 postgres postgres 4096 Apr 22 10:52 pg_logical
drwx------ 4 postgres postgres 4096 Apr 22 10:52 pg_multixact
drwx------ 2 postgres postgres 4096 Apr 22 10:52 pg_notify
drwx------ 2 postgres postgres 4096 Apr 22 10:52 pg_replslot
drwx------ 2 postgres postgres 4096 Apr 22 10:52 pg_serial
drwx------ 2 postgres postgres 4096 Apr 22 10:52 pg_snapshots
drwx------ 2 postgres postgres 4096 Apr 22 10:52 pg_stat
drwx------ 2 postgres postgres 4096 Apr 22 10:52 pg_stat_tmp
drwx------ 2 postgres postgres 4096 Apr 22 10:52 pg_subtrans
drwx------ 2 postgres postgres 4096 Apr 22 10:52 pg_tblspc
drwx------ 2 postgres postgres 4096 Apr 22 10:52 pg_twophase
drwx------ 3 postgres postgres 4096 Apr 22 10:52 pg_wal
drwx------ 2 postgres postgres 4096 Apr 22 10:52 pg_xact
-rw------- 1 postgres postgres 255 Apr 22 10:52 postgresql.auto.conf
-rw------- 1 postgres postgres 26756 Apr 22 10:52 postgresql.conf
-rw------- 1 postgres postgres 0 Apr 22 10:52 standby.signal
postgres@lhrpg64303:/postgresql/pgdata$
? 在PG12之前,-R备份结束之后会自动生成recovery.conf文件,用来做流复制判断主从同步的信息。但是从PG12开始,这个文件已经不需要了。只需要在参数文件postgresql.conf中配置primary_conninfo参数即可。
4.2、还原从库
代码语言:javascript复制-- 关闭从库,删除从库的数据文件,并且将备份文件覆盖从库的数据文件
docker stop lhrpg64303
rm -rf /docker_data/pg/lhrpg64303/data/*
cp -r /docker_data/pg/lhrpg64303/bk/* /docker_data/pg/lhrpg64303/data/
4.3、修改从库primary_conninfo参数
代码语言:javascript复制cat >> /docker_data/pg/lhrpg64303/data/postgresql.conf <<"EOF"
primary_conninfo = 'host=172.72.6.2 port=5432 user=replhr password=lhr'
EOF
4.4、启动从库
代码语言:javascript复制docker start lhrpg64303
主库进程:
代码语言:javascript复制root@lhrpg64302:/# ps -ef|grep post
postgres 1 0 0 11:35 ? 00:00:00 postgres
postgres 26 1 0 11:35 ? 00:00:00 postgres: checkpointer
postgres 27 1 0 11:35 ? 00:00:00 postgres: background writer
postgres 28 1 0 11:35 ? 00:00:00 postgres: walwriter
postgres 29 1 0 11:35 ? 00:00:00 postgres: autovacuum launcher
postgres 30 1 0 11:35 ? 00:00:00 postgres: archiver last was 000000010000000000000006.00000028.backup
postgres 31 1 0 11:35 ? 00:00:00 postgres: stats collector
postgres 32 1 0 11:35 ? 00:00:00 postgres: logical replication launcher
postgres 33 1 0 11:35 ? 00:00:00 postgres: postgres postgres 172.72.6.1(52776) idle
postgres 129 1 0 11:48 ? 00:00:00 postgres: walsender replhr 172.72.6.3(40056) streaming 0/7000148
从库进程:
代码语言:javascript复制root@lhrpg64303:/# ps -ef|grep post
postgres 1 0 0 11:48 ? 00:00:00 postgres
postgres 26 1 0 11:48 ? 00:00:00 postgres: startup recovering 000000010000000000000007
postgres 27 1 0 11:48 ? 00:00:00 postgres: checkpointer
postgres 28 1 0 11:48 ? 00:00:00 postgres: background writer
postgres 29 1 0 11:48 ? 00:00:00 postgres: stats collector
postgres 30 1 0 11:48 ? 00:00:00 postgres: walreceiver streaming 0/7000148
postgres 31 1 0 11:48 ? 00:00:00 postgres: postgres postgres 172.72.6.1(54413) idle
4.5、查询复制状态
代码语言:javascript复制-- 主库查看wal日志发送状态
select * from pg_stat_replication;
-- 从库查看wal日志接收状态
select * from pg_stat_wal_receiver;
-- 也可以通过该命令查看
pg_controldata | grep state
-- 也可以查看这个,主库是f代表false ;备库是t,代表true
select pg_is_in_recovery();
主库查询复制状态:
代码语言:javascript复制C:Userslhrxxt>psql -U postgres -h 192.168.66.35 -p 64302
psql (13.2, server 12.2 (Debian 12.2-2.pgdg100 1))
Type "help" for help.
postgres=# x
Expanded display is on.
postgres=# select * from pg_stat_replication;
-[ RECORD 1 ]---- ------------------------------
pid | 129
usesysid | 16384
usename | replhr
application_name | walreceiver
client_addr | 172.72.6.3
client_hostname |
client_port | 40056
backend_start | 2021-04-22 11:48:21.664801 00
backend_xmin |
state | streaming
sent_lsn | 0/7000148
write_lsn | 0/7000148
flush_lsn | 0/7000148
replay_lsn | 0/7000148
write_lag |
flush_lag |
replay_lag |
sync_priority | 0
sync_state | async
reply_time | 2021-04-22 11:50:51.998328 00
pg_stat_replication是一个视图,主要用于监控一个基于流的设置,在这个系统视图中每个记录只代表一个slave。因此,可以看到谁处于连接状态,在做什么任务。pg_stat_replication也是检查slave是否处于连接状态的一个好方法。
每个字段代码的含义:
• pid: 这代表负责流连接的wal_sender进程的进程ID。例如“postgres: walsender replhr 172.72.6.3(40056) streaming 0/7000148”。
• usesysid: 每个内部用户都有一个独一无二的编号。该系统的工作原理很像UNIX。usesysid 是 (PostgreSQL) 用户连接到系统的唯一标识符。
• usename: (不是用户名, 注意少了 r),它存储与用户相关的 usesysid 的名字。这是客户端放入到连接字符串中的东西。
• application_name:这是同步复制的通常设置。它可以通过连接字符串传递到master。
• client_addr: 它会告诉您流连接从何而来。它拥有客户端的IP地址。
• client_hostname: 除了客户端的IP,您还可以这样做,通过它的主机名来标识客户端。您可以通过master上的postgresql.conf中的log_hostname启用DNS反向查找。
• client_port: 这是客户端用来和WALsender进行通信使用的TPC端口号。如果不本地UNIX套接字被使用了将显示-1。
• backend_start: 它告诉我们slave什么时间创建了流连接。
• state: 此列告诉我们数据的连接状态。如果事情按计划进行,它应该包含流信息。
• sent_lsn:这代表发送到连接的最后的事务日志的位置。已经通过网络发送了多少WAL?
• write_lsn: 这是写到standby系统磁盘上最后的事务日志位置。已向操作系统发送了多少WAL?( 尚未 flushing)
• flush_lsn: 这是被刷新到standby系统的最后位置。(这里注意写和刷新之间的区别。写并不意味着刷新 。)已经有多少WAL已 flush 到磁盘?
• replay_lsn: 这是slave上重放的最后的事务日志位置。已重放了多少WAL,因此对查询可见?
• sync_priority: 这个字段是唯一和同步复制相关的。每次同步复制将会选择一个优先权 —sync_priority—会告诉您选择了那个优先权。
• sync_state: 最后您会看到slave在哪个状态。这个状态可以是async, sync, or potential。当有一个带有较高优先权的同步slave时,PostgreSQL会把slave 标记为 potential。
人们经常说 pg_stat_replication 视图是primary 端的,这是不对的。该视图的作用是揭示有关wal sender 进程的信息。换句话说:如果你正在运行级联复制,该视图意味着在 secondary 复制到其他slaves 的时候, secondary 端的 pg_stat_replication 上的也会显示entries ( 条目 )
从库查询wal日志接收状态:
代码语言:javascript复制postgres=# select * from pg_stat_wal_receiver;
-[ RECORD 1 ]--------- ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
pid | 30
status | streaming
receive_start_lsn | 0/4000000
receive_start_tli | 1
received_lsn | 0/4000060
received_tli | 1
last_msg_send_time | 2021-04-23 14:33:12.462989 08
last_msg_receipt_time | 2021-04-23 14:33:12.463126 08
latest_end_lsn | 0/4000060
latest_end_time | 2021-04-23 14:32:42.441224 08
slot_name |
sender_host | 172.72.6.2
sender_port | 5432
conninfo | user=replhr password=******** dbname=replication host=172.72.6.2 port=5432 fallback_application_name=walreceiver sslmode=prefer sslcompression=0 gssencmode=prefer krbsrvname=postgres target_session_attrs=any
也可以通过pg_controldata来查询主备库的状态:
代码语言:javascript复制root@lhrpg64302:/# pg_controldata
pg_control version number: 1201
Catalog version number: 201909212
Database system identifier: 6954163543229509670
Database cluster state: in production
pg_control last modified: Fri 23 Apr 2021 09:54:48 AM CST
Latest checkpoint location: 0/5000098
Latest checkpoint's REDO location: 0/5000060
Latest checkpoint's REDO WAL file: 000000010000000000000005
Latest checkpoint's TimeLineID: 1
Latest checkpoint's PrevTimeLineID: 1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 0:488
Latest checkpoint's NextOID: 24576
Latest checkpoint's NextMultiXactId: 1
Latest checkpoint's NextMultiOffset: 0
Latest checkpoint's oldestXID: 480
Latest checkpoint's oldestXID's DB: 1
Latest checkpoint's oldestActiveXID: 488
Latest checkpoint's oldestMultiXid: 1
Latest checkpoint's oldestMulti's DB: 1
Latest checkpoint's oldestCommitTsXid:0
Latest checkpoint's newestCommitTsXid:0
Time of latest checkpoint: Fri 23 Apr 2021 09:54:48 AM CST
Fake LSN counter for unlogged rels: 0/3E8
Minimum recovery ending location: 0/0
Min recovery ending loc's timeline: 0
Backup start location: 0/0
Backup end location: 0/0
End-of-backup record required: no
wal_level setting: replica
wal_log_hints setting: off
max_connections setting: 100
max_worker_processes setting: 8
max_wal_senders setting: 10
max_prepared_xacts setting: 0
max_locks_per_xact setting: 64
track_commit_timestamp setting: off
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Size of a large-object chunk: 2048
Date/time type storage: 64-bit integers
Float4 argument passing: by value
Float8 argument passing: by value
Data page checksum version: 0
Mock authentication nonce: 6006c7a6310b7d3904a3a1cf6b6ca96accce7a702df332c33e1c3fbc210e3623
root@lhrpg64302:/# pg_controldata | grep state
Database cluster state: in production
从库:
代码语言:javascript复制root@lhrpg64303:/var/lib/postgresql/data# pg_controldata
pg_control version number: 1201
Catalog version number: 201909212
Database system identifier: 6954163543229509670
Database cluster state: in archive recovery
pg_control last modified: Fri 23 Apr 2021 09:55:28 AM CST
Latest checkpoint location: 0/5000098
Latest checkpoint's REDO location: 0/5000060
Latest checkpoint's REDO WAL file: 000000010000000000000005
Latest checkpoint's TimeLineID: 1
Latest checkpoint's PrevTimeLineID: 1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 0:488
Latest checkpoint's NextOID: 24576
Latest checkpoint's NextMultiXactId: 1
Latest checkpoint's NextMultiOffset: 0
Latest checkpoint's oldestXID: 480
Latest checkpoint's oldestXID's DB: 1
Latest checkpoint's oldestActiveXID: 488
Latest checkpoint's oldestMultiXid: 1
Latest checkpoint's oldestMulti's DB: 1
Latest checkpoint's oldestCommitTsXid:0
Latest checkpoint's newestCommitTsXid:0
Time of latest checkpoint: Fri 23 Apr 2021 09:54:48 AM CST
Fake LSN counter for unlogged rels: 0/3E8
Minimum recovery ending location: 0/5000148
Min recovery ending loc's timeline: 1
Backup start location: 0/0
Backup end location: 0/0
End-of-backup record required: no
wal_level setting: replica
wal_log_hints setting: off
max_connections setting: 100
max_worker_processes setting: 8
max_wal_senders setting: 10
max_prepared_xacts setting: 0
max_locks_per_xact setting: 64
track_commit_timestamp setting: off
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Size of a large-object chunk: 2048
Date/time type storage: 64-bit integers
Float4 argument passing: by value
Float8 argument passing: by value
Data page checksum version: 0
Mock authentication nonce: 6006c7a6310b7d3904a3a1cf6b6ca96accce7a702df332c33e1c3fbc210e3623
root@lhrpg64303:/var/lib/postgresql/data# pg_controldata | grep state
Database cluster state: in archive recovery
五、测试主从复制
select pg_is_in_recovery();--主库是f代表false ;备库是t,代表true
代码语言:javascript复制-- 主库新增表
postgres=# c sbtest
psql (13.2, server 12.2 (Debian 12.2-2.pgdg100 1))
You are now connected to database "sbtest" as user "postgres".
sbtest=# select count(*) from sbtest1;
count | 10
sbtest=# select count(*) from sbtest11;
ERROR: relation "sbtest11" does not exist
LINE 1: select count(*) from sbtest11;
^
sbtest=# create table sbtest11 as select * from sbtest1;
SELECT 10
sbtest=# select count(*) from sbtest11;
count | 10
sbtest=# select pg_is_in_recovery();
pg_is_in_recovery
-------------------
f
(1 row)
-- 从库查询
sbtest=# select count(*) from sbtest11;
count
-------
10
(1 row)
sbtest=# select pg_is_in_recovery();
pg_is_in_recovery
-------------------
t
(1 row)
sbtest=# create database lhrdb;
ERROR: cannot execute CREATE DATABASE in a read-only transaction
-- 主库插入
sbtest=# insert into sbtest11 select * from sbtest11;
INSERT 0 10
sbtest=#
sbtest=# select count(*) from sbtest11;
count
-------
20
(1 row)
-- 从库查询
sbtest=# select count(*) from sbtest11;
count
-------
20
(1 row)
六、主从切换
6.1、主从Switchover切换
在保证主从数据一致的情况下:
1、在当前主库操作:
代码语言:javascript复制1、停止服务
docker stop lhrpg64302
2、以“postgres”用户创建创建standby.signal文件
touch /docker_data/pg/lhrpg64302/data/standby.signal
cat >> /docker_data/pg/lhrpg64302/data/postgresql.conf <<"EOF"
primary_conninfo = 'host=172.72.6.3 port=5432 user=replhr password=lhr'
EOF
3、启动服务
docker start lhrpg64302
2、在当前备库操作:
代码语言:javascript复制1、停止服务
docker stop lhrpg64303
2、删除“standby.signal”文件
rm -rf /docker_data/pg/lhrpg64303/data/standby.signal
3、修改postgresql.conf
sed -i 's/primary_conninfo/#primary_conninfo/g' /docker_data/pg/lhrpg64303/data/postgresql.conf
4、启动服务
docker start lhrpg64303
6.2、主从Failover切换
主数据库是读写的,备数据库是只读的。当主库出现故障时,我们需要将备库提升为主库进行读写操作。
1)切换后,原主库以从库的身份启动:修复过程类似于重建
2)切换后,原主库以主库的身份启动:
切换方法跟版本有关系:
- P12之前,有2种办法:
① pg_ctl promote:执行之后发现recovery.conf变为recovery.done文件,代表 切换成功
② 触发器方式,:备库配置 recovery.conf 文件的 trigger_file 参数,之后在备库主机上创建触发器文件
- 从P12开始 : 直接执行 select pg_promote(true,60)函数即可完成切换。
pg12开始新增了一个pg_promote()函数,让我们可以通过SQL命令激活备库。
pg_promote()语法:
pg_promote(wait boolean DEFAULT true, wait_seconds integer DEFAULT 60)
两个参数:
wait: 表示是否等待备库的 promotion 完成或者 wait_seconds 秒之后返回成功,默认值为 true。
wait_seconds: 等待时间,单位秒,默认 60
6.2.1、Failover切换过程
1)主库操作:关闭主库,模拟主库故障:
代码语言:javascript复制docker stop lhrpg64302
2)备库操作:激活备库为主库:
代码语言:javascript复制-- 不关闭主库,也可以执行该命令强制切换为主库角色,切换完成后,主从关系断开
select pg_promote(true,60);
-- 查询是否切换成功
select pg_is_in_recovery();
6.2.2、原主库修复后降为备库
过程类似于重建备库。
代码语言:javascript复制-- 对6.3进行备份
docker start lhrpg64302
docker exec -it lhrpg64302 bash
mkdir -p /bk
chown postgres:postgres /bk
su - postgres
pg_basebackup -h 172.72.6.3 -p 5432 -U replhr -l bk20210423 -F p -P -R -D /bk
-- 还原6.2
-- 关闭从库,并且将备份文件覆盖从库的数据文件
docker stop lhrpg64302
rm -rf /docker_data/pg/lhrpg64302/data/*
cp -r /docker_data/pg/lhrpg64302/bk/* /docker_data/pg/lhrpg64302/data/
sed -i 's/172.72.6.2/172.72.6.3/g' /docker_data/pg/lhrpg64302/data/postgresql.conf
-- 启动6.2
docker start lhrpg64302
七、新增节点(扩容节点)
新增节点后,架构如下所示:
7.1、申请一台从库
代码语言:javascript复制mkdir -p /docker_data/pg/lhrpg64304/data
-- 从库
docker rm -f lhrpg64304
rm -rf /docker_data/pg/lhrpg64304/data
rm -rf /docker_data/pg/lhrpg64304/bk
docker run -d --name lhrpg64304 -h lhrpg64304
-p 64304:5432 --net=pg-network --ip 172.72.6.4
-v /docker_data/pg/lhrpg64304/data:/var/lib/postgresql/data
-v /docker_data/pg/lhrpg64304/bk:/bk
-e POSTGRES_PASSWORD=lhr
-e TZ=Asia/Shanghai
postgres:12
7.2、在从库对主库进行备份
代码语言:javascript复制docker exec -it lhrpg64304 bash
mkdir -p /bk
chown postgres:postgres /bk
su - postgres
pg_basebackup -h 172.72.6.2 -p 5432 -U replhr -l bk20210422 -F p -P -R -D /bk
7.3、还原从库
代码语言:javascript复制-- 关闭从库,删除从库的数据文件,并且将备份文件覆盖从库的数据文件
docker stop lhrpg64304
rm -rf /docker_data/pg/lhrpg64304/data/*
cp -r /docker_data/pg/lhrpg64304/bk/* /docker_data/pg/lhrpg64304/data/
7.4、修改从库primary_conninfo参数
代码语言:javascript复制cat >> /docker_data/pg/lhrpg64304/data/postgresql.conf <<"EOF"
primary_conninfo = 'host=172.72.6.2 port=5432 user=replhr password=lhr'
EOF
7.5、启动从库
代码语言:javascript复制docker start lhrpg64304
7.6、查询复制状态
代码语言:javascript复制--主库查看wal日志发送状态
select * from pg_stat_replication;
--从库查看wal日志接收状态
select * from pg_stat_wal_receiver;
-- 也可以通过该名称查看
pg_controldata | grep state
-- 也可以查看这个,主库是f代表false ;备库是t,代表true
select pg_is_in_recovery();
主库查询复制状态:
代码语言:javascript复制postgres=# select * from pg_stat_replication;
pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backend_xmin | state | sent_lsn | write_lsn | flush_lsn | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state | reply_time
----- ---------- --------- ------------------ ------------- ----------------- ------------- ------------------------------- -------------- ----------- ----------- ----------- ----------- ------------ ----------- ----------- ------------ --------------- ------------ -------------------------------
33 | 16384 | replhr | walreceiver | 172.72.6.3 | | 33806 | 2021-04-23 14:51:18.044699 08 | | streaming | 0/6000060 | 0/6000060 | 0/6000060 | 0/6000060 | | | | 0 | async | 2021-04-23 15:23:31.27773 08
75 | 16384 | replhr | walreceiver | 172.72.6.4 | | 47540 | 2021-04-23 15:23:14.795969 08 | | streaming | 0/6000060 | 0/6000060 | 0/6000060 | 0/6000060 | | | | 0 | async | 2021-04-23 15:23:34.927623 08
(2 rows)
postgres=# select client_addr,state,sync_state from pg_stat_replication;
client_addr | state | sync_state
------------- ----------- ------------
172.72.6.3 | streaming | async
172.72.6.4 | streaming | async
(2 rows)
postgres=#
postgres=# select pg_is_in_recovery();
pg_is_in_recovery
-------------------
f
(1 row)
新增从库查询:
代码语言:javascript复制C:Userslhrxxt>psql -U postgres -h 192.168.66.35 -p 64304
Password for user postgres:
psql (13.2, server 12.2 (Debian 12.2-2.pgdg100 1))
Type "help" for help.
postgres=# select pg_is_in_recovery();
pg_is_in_recovery
-------------------
t
(1 row)
postgres=# x
Expanded display is on.
postgres=# select * from pg_stat_wal_receiver;
-[ RECORD 1 ]--------- ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
pid | 30
status | streaming
receive_start_lsn | 0/6000000
receive_start_tli | 1
received_lsn | 0/6000B40
received_tli | 1
last_msg_send_time | 2021-04-23 15:28:04.852893 08
last_msg_receipt_time | 2021-04-23 15:28:04.853655 08
latest_end_lsn | 0/6000B40
latest_end_time | 2021-04-23 15:27:34.782781 08
slot_name |
sender_host | 172.72.6.2
sender_port | 5432
conninfo | user=replhr password=******** dbname=replication host=172.72.6.2 port=5432 fallback_application_name=walreceiver sslmode=prefer sslcompression=0 gssencmode=prefer krbsrvname=postgres target_session_attrs=any
八、修改为同步模式
在3个节点都修改如下参数:
代码语言:javascript复制cat >> /docker_data/pg/lhrpg64302/data/postgresql.conf <<"EOF"
synchronous_commit='on'
synchronous_standby_names='*'
EOF
cat >> /docker_data/pg/lhrpg64303/data/postgresql.conf <<"EOF"
synchronous_commit='on'
synchronous_standby_names='*'
EOF
cat >> /docker_data/pg/lhrpg64304/data/postgresql.conf <<"EOF"
synchronous_commit='on'
synchronous_standby_names='*'
EOF
重启所有 节点:
代码语言:javascript复制docker stop lhrpg64302
docker stop lhrpg64303
docker stop lhrpg64304
docker start lhrpg64302
docker start lhrpg64303
docker start lhrpg64304
查询同步模式:
代码语言:javascript复制postgres=# select client_addr,state,sync_state from pg_stat_replication;
client_addr | state | sync_state
------------- ----------- ------------
172.72.6.3 | streaming | sync
172.72.6.4 | streaming | potential
(2 rows)
若要修改为异步模式,则注销以上参数即可:
代码语言:javascript复制sed -i 's|synchronous_commit|#synchronous_commit|g' /docker_data/pg/lhrpg64302/data/postgresql.conf
sed -i 's|synchronous_standby_names|#synchronous_standby_names|g' /docker_data/pg/lhrpg64302/data/postgresql.conf
sed -i 's|synchronous_commit|#synchronous_commit|g' /docker_data/pg/lhrpg64303/data/postgresql.conf
sed -i 's|synchronous_standby_names|#synchronous_standby_names|g' /docker_data/pg/lhrpg64303/data/postgresql.conf
sed -i 's|synchronous_commit|#synchronous_commit|g' /docker_data/pg/lhrpg64304/data/postgresql.conf
sed -i 's|synchronous_standby_names|#synchronous_standby_names|g' /docker_data/pg/lhrpg64304/data/postgresql.conf
本文结束。