PostgreSQL 的预写日志 (WAL) 示例
1)
代码语言:javascript复制SELECT datname, oid FROM pg_database WHERE datname = 'postgres';
datname | oid
---------- -------
postgres | 15709
Note the database OID, i.e. 15709
2)
代码语言:javascript复制SELECT oid,* from pg_tablespace;
oid | spcname | spcowner | spcacl | spcoptions
------ ------------ ---------- -------- ------------
1663 | pg_default | 10 | |
1664 | pg_global | 10 | |
Note the table space OID, i.e. 1663
3)
代码语言:javascript复制SELECT pg_current_wal_lsn();
pg_current_wal_lsn
--------------------
0/1C420B8
Note the LSN, i.e. 0/1C420B8
4)
CREATE TABLE abc(a VARCHAR(10));
5)
代码语言:javascript复制SELECT pg_relation_filepath('abc');
pg_relation_filepath
----------------------
base/15709/16384
Note the relation filename, base/15709/16384
6)通过pg_waldump --path=/tmp/sd/pg_wal -start=0/1C420B8看下日志文件里内容。使用的是步骤3中的起始LSN。注意WAL中包含创建物理文件的指令:
代码语言:javascript复制15709 → database postgres → noted in step 1
16384 → table abc → noted in step 5
rmgr | Len(rec/tot) | tx | lsn | prev | desc |
---|---|---|---|---|---|
XLOG | 30/ 30 | 0 | 0/01C420B8 | 0/01C42080 | NEXTOID 24576 |
Storage | 42/ 42 | 0 | 0/01C420D8 | 0/01C420B8 | CREATE base/15709/16384 |
Heap | 203/203 | 1216 | 0/01C42108 | 0/01C420D8 | INSERT off 2, blkref #0: rel 1663/15709/1247 blk 0 |
Btree | 64/ 64 | 1216 | 0/01C421D8 | 0/01C42108 | INSERT_LEAF off 298, blkref #0: rel 1663/15709/2703 blk 2 |
Btree | 64/ 64 | 1216 | 0/01C42218 | 0/01C421D8 | INSERT_LEAF off 7, blkref #0: rel 1663/15709/2704 blk 5 |
Heap | 80/ 80 | 1216 | 0/01C42258 | 0/01C42218 | INSERT off 30, blkref #0: rel 1663/15709/2608 blk 9 |
Btree | 72/ 72 | 1216 | 0/01C422A8 | 0/01C42258 | INSERT_LEAF off 243, blkref #0: rel 1663/15709/2673 blk 51 |
Btree | 72/ 72 | 1216 | 0/01C422F0 | 0/01C422A8 | INSERT_LEAF off 170, blkref #0: rel 1663/15709/2674 blk 61 |
Heap | 203/203 | 1216 | 0/01C42338 | 0/01C422F0 | INSERT off 6, blkref #0: rel 1663/15709/1247 blk 1 |
Btree | 64/64 | 1216 | 0/01C42408 | 0/01C42338 | INSERT_LEAF off 298, blkref #0: rel 1663/15709/2703 blk 2 |
Btree | 72/ 72 | 1216 | 0/01C42448 | 0/01C42408 | INSERT_LEAF off 3, blkref #0: rel 1663/15709/2704 blk 1 |
Heap | 80/ 80 | 1216 | 0/01C42490 | 0/01C42448 | INSERT off 36, blkref #0: rel 1663/15709/2608 blk 9 |
Btree | 72/ 72 | 1216 | 0/01C424E0 | 0/01C42490 | INSERT_LEAF off 243, blkref #0: rel 1663/15709/2673 blk 51 |
Btree | 72/ 72 | 1216 | 0/01C42528 | 0/01C424E0 | INSERT_LEAF off 97, blkref #0: rel 1663/15709/2674 blk 57 |
Heap | 199/199 | 1216 | 0/01C42570 | 0/01C42528 | INSERT off 2, blkref #0: rel 1663/15709/1259 blk 0 |
Btree | 64/ 64 | 1216 | 0/01C42638 | 0/01C42570 | INSERT_LEAF off 257, blkref #0: rel 1663/15709/2662 blk 2 |
Btree | 64/ 64 | 1216 | 0/01C42678 | 0/01C42638 | INSERT_LEAF off 8, blkref #0: rel 1663/15709/2663 blk 1 |
Btree | 64/ 64 | 1216 | 0/01C426B8 | 0/01C42678 | INSERT_LEAF off 217, blkref #0: rel 1663/15709/3455 blk 5 |
Heap | 171/171 | 1216 | 0/01C426F8 | 0/01C426B8 | INSERT off 53, blkref #0: rel 1663/15709/1249 blk 16 |
Btree | 64/ 64 | 1216 | 0/01C427A8 | 0/01C426F8 | INSERT_LEAF off 185, blkref #0: rel 1663/15709/2658 blk 25 |
Btree | 64/ 64 | 1216 | 0/01C427E8 | 0/01C427A8 | INSERT_LEAF off 194, blkref #0: rel 1663/15709/2659 blk 16 |
Heap | 171/171 | 1216 | 0/01C42828 | 0/01C427E8 | INSERT off 54, blkref #0: rel 1663/15709/1249 blk 16 |
Btree | 72/ 72 | 1216 | 0/01C428D8 | 0/01C42828 | INSERT_LEAF off 186, blkref #0: rel 1663/15709/2658 blk 25 |
Btree | 64/ 64 | 1216 | 0/01C42920 | 0/01C428D8 | INSERT_LEAF off 194, blkref #0: rel 1663/15709/2659 blk 16 |
Heap | 171/171 | 1216 | 0/01C42960 | 0/01C42920 | INSERT off 55, blkref #0: rel 1663/15709/1249 blk 16 |
Btree | 72/ 72 | 1216 | 0/01C42A10 | 0/01C42960 | INSERT_LEAF off 187, blkref #0: rel 1663/15709/2658 blk 25 |
Btree | 64/ 64 | 1216 | 0/01C42A58 | 0/01C42A10 | INSERT_LEAF off 194, blkref #0: rel 1663/15709/2659 blk 16 |
Heap | 171/171 | 1216 | 0/01C42A98 | 0/01C42A58 | INSERT off 1, blkref #0: rel 1663/15709/1249 blk 17 |
Btree | 72/ 72 | 1216 | 0/01C42B48 | 0/01C42A98 | INSERT_LEAF off 186, blkref #0: rel 1663/15709/2658 blk 25 |
Btree | 64/ 64 | 1216 | 0/01C42B90 | 0/01C42B48 | INSERT_LEAF off 194, blkref #0: rel 1663/15709/2659 blk 16 |
Heap | 171/171 | 1216 | 0/01C42BD0 | 0/01C42B90 | INSERT off 3, blkref #0: rel 1663/15709/1249 blk 17 |
Btree | 72/ 72 | 1216 | 0/01C42C80 | 0/01C42BD0 | INSERT_LEAF off 188, blkref #0: rel 1663/15709/2658 blk 25 |
Btree | 64/ 64 | 1216 | 0/01C42CC8 | 0/01C42C80 | INSERT_LEAF off 194, blkref #0: rel 1663/15709/2659 blk 16 |
Heap | 171/171 | 1216 | 0/01C42D08 | 0/01C42CC8 | INSERT off 5, blkref #0: rel 1663/15709/1249 blk 17 |
Btree | 72/ 72 | 1216 | 0/01C42DB8 | 0/01C42D08 | INSERT_LEAF off 186, blkref #0: rel 1663/15709/2658 blk 25 |
Btree | 64/ 64 | 1216 | 0/01C42E00 | 0/01C42DB8 | INSERT_LEAF off 194, blkref #0: rel 1663/15709/2659 blk 16 |
Heap | 171/171 | 1216 | 0/01C42E40 | 0/01C42E00 | INSERT off 30, blkref #0: rel 1663/15709/1249 blk 32 |
Btree | 72/ 72 | 1216 | 0/01C42EF0 | 0/01C42E40 | INSERT_LEAF off 189, blkref #0: rel 1663/15709/2658 blk 25 |
Btree | 64/ 64 | 1216 | 0/01C42F38 | 0/01C42EF0 | INSERT_LEAF off 194, blkref #0: rel 1663/15709/2659 blk 16 |
Heap | 80/ 80 | 1216 | 0/01C42F78 | 0/01C42F38 | INSERT off 25, blkref #0: rel 1663/15709/2608 blk 11 |
Btree | 72/ 72 | 1216 | 0/01C42FC8 | 0/01C42F78 | INSERT_LEAF off 131, blkref #0: rel 1663/15709/2673 blk 44 |
Btree | 72/ 72 | 1216 | 0/01C43010 | 0/01C42FC8 | INSERT_LEAF off 66, blkref #0: rel 1663/15709/2674 blk 46 |
Standby | 42/ 42 | 1216 | 0/01C43058 | 0/01C43010 | LOCK xid 1216 db 15709 rel 16384 |
Txn | 405/405 | 1216 | 0/01C43088 | 0/01C43058 | COMMIT 2019-03-04 07:42:23.165514 EST;... snapshot 2608 relcache 16384 |
Standby | 50/ 50 | 0 | 0/01C43220 | 0/01C43088 | RUNNING_XACTS nextXid 1217 latestCompletedXid 1216 oldestRunningXid 1217 |
7)
代码语言:javascript复制SELECT pg_current_wal_lsn();
pg_current_wal_lsn
--------------------
0/1C43258
(1 row)
8)
INSERT INTO abc VALUES('pkn');
9)
./pg_waldump --path=/tmp/sd/pg_wal --start=0/1C43258
使用步骤7)中的起始LSN
代码语言:javascript复制1663 → pg_default tablespace → noted in step 2
15709 → database postgres → noted in step 1
16384 → table abc → noted in step 5
rmgr | Len (rec/tot) | tx | lsn | prev | desc |
---|---|---|---|---|---|
Heap | 59/59 | 1217 | 0/01C43258 | 0/01C43220 | INSERT INIT off 1, blkref #0: rel 1663/15709/16384 blk 0 |
Transaction | 34/34 | 1217 | 0/01C43298 | 0/01C43258 | COMMIT 2019-03-04 07:43:45.887511 EST |
Standby | 54/54 | 0 | 0/01C432C0 | 0/01C43298 | RUNNING_XACTS nextXid 1218 latestCompletedXid 1216 oldestRunningXid 1217; 1 xacts: 1217 |
10)
代码语言:javascript复制SELECT pg_current_wal_lsn();
pg_current_wal_lsn
--------------------
0/1C432F8
(1 row)
11)
INSERT INTO abc VALUES('ujy');
12)./pg_waldump --path=/tmp/sd/pg_wal –start=0/1C432F8使用步骤10)中的起始LSN
rmgr | Len (rec/tot) | tx | lsn | prev | desc |
---|---|---|---|---|---|
Heap | 59/59 | 1218 | 0/01C432F8 | 0/01C432C0 | INSERT off 2, blkref #0: rel 1663/15709/16384 blk 0 |
Transaction | 34/34 | 1218 | 0/01C43338 | 0/01C432F8 | COMMIT 2019-03-04 07:44:25.449151 EST |
Standby | 50/50 | 0 | 0/01C43360 | 0/01C43338 | RUNNING_XACTS nextXid 1219 latestCompletedXid 1218 oldestRunningXid 1219 |
13)检查WAL段文件中的真实记录
代码语言:javascript复制--------- --------------------------------------------------- ----------------
Offset | Hex Bytes | ASCII chars |
--------- --------------------------------------------------- ----------------
00000060 | 3b 00 00 00 c3 04 00 00 28 00 40 02 00 00 00 00 |;.......(.@.....|
00000070 | 00 0a 00 00 ec 28 75 6e 00 20 0a 00 7f 06 00 00 |.....(un. ......|
00000080 | 5d 3d 00 00 00 40 00 00 00 00 00 00 ff 03 01 00 |]=...@..........|
00000090 | 02 08 18 00 09 70 6b 6e 03 00 00 00 00 00 00 00 |.....pkn........|
000000a0 | 22 00 00 00 c3 04 00 00 60 00 40 02 00 00 00 00 |".......`.@.....|
000000b0 | 00 01 00 00 dd 4c 87 04 ff 08 e4 73 44 e7 41 26 |.....L.....sD.A&|
000000c0 | 02 00 00 00 00 00 00 00 32 00 00 00 00 00 00 00 |........2.......|
000000d0 | a0 00 40 02 00 00 00 00 10 08 00 00 9e 01 36 88 |..@...........6.|
000000e0 | ff 18 00 00 00 00 00 00 00 00 00 03 00 00 c4 04 |................|
000000f0 | 00 00 c4 04 00 00 c3 04 00 00 00 00 00 00 00 00 |................|
00000100 | 3b 00 00 00 c4 04 00 00 c8 00 40 02 00 00 00 00 |;.........@.....|
00000110 | 00 0a 00 00 33 df b4 71 00 20 0a 00 7f 06 00 00 |....3..q. ......|
00000120 | 5d 3d 00 00 00 40 00 00 00 00 00 00 ff 03 01 00 |]=...@..........|
00000130 | 02 08 18 00 09 75 6a 79 04 00 00 00 00 00 00 00 |.....ujy........|
00000140 | 22 00 00 00 c4 04 00 00 00 01 40 02 00 00 00 00 |".........@.....|
00000150 | 00 01 00 00 96 2e 96 a6 ff 08 d8 f3 79 ed 41 26 |............y.A&|
00000160 | 02 00 00 00 00 00 00 00 32 00 00 00 00 00 00 00 |........2.......|
00000170 | 40 01 40 02 00 00 00 00 10 08 00 00 eb 6b 95 36 |@.@..........k.6|
00000180 | ff 18 00 00 00 00 00 00 00 00 00 03 00 00 c5 04 |................|
00000190 | 00 00 c5 04 00 00 c4 04 00 00 00 00 00 00 00 00 |................|
000001a0 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
PostgreSQL 中基于 WAL 的复制选项有哪些?
持续WAL归档
基于日志传送的复制 -- 文件级别
基于日志传送的复制 -- 块级
持续WAL归档
将生成的 WAL 文件复制到 pg_wal 子目录以外的任何位置以进行归档称为 WAL 归档。每次生成 WAL 文件时,PostgreSQL 都会调用用户提供的用于归档的脚本。该脚本可以使用 scp 命令将文件复制到一个或多个位置。该位置可以是 NFS 挂载。归档后,WAL 段文件可用于在任何指定时间点恢复数据库。
基于日志传送的复制 -- 文件级别
将日志文件复制到另一个 PostgreSQL 服务器以通过重放WAL文件来创建另一个备用服务器称为日志传送。此备用服务器配置为处于恢复模式,其唯一目的是在任何新 WAL 文件到达时回放它们。这第二台服务器(也称为备用服务器)然后成为主PostgreSQL服务器的热备份。备用数据库也可以配置为只读副本,它还可以提供只读查询。这称为热备份。
基于日志传送的复制 -- 块级
流式复制改进了日志传送过程。无需等待WAL切换,而是在生成记录时发送记录,从而减少复制延迟。另一个改进是备用服务器将使用复制协议通过网络连接到主服务器。然后,主服务器可以直接通过此连接发送WAL记录,而无需依赖用户提供的脚本。
主服务器应该保留WAL段文件多久?
如果没有流复制,一旦归档脚本报告归档成功,就可以丢弃/回收崩溃恢复不需要的WAL文件。
但是,存在备后就会产生一个问题:只要最慢的备需要它们,服务器就需要保留 WAL 文件。例如,如果备数据库被关闭了一段时间,然后重新联机并向主数据库请求主数据库不再拥有的 WAL 文件,则复制失败并出现类似于以下内容的错误:
ERROR: requested WAL segment 00000001000000010000002D has already been removed
因此,主服务器应该跟踪备用服务器的落后程度,而不是删除/回收任何备用服务器仍然需要的 WAL 文件。此功能是通过复制槽提供的。
每个复制槽都有一个用于标识槽的名称。每个slot:
1) 槽的消费者需要的最旧的 WAL 段文件。在检查点期间不会删除/回收比这更新的 WAL 段文件。
2) 槽的消费者需要保留的最早的事务 ID。任何比这更近的事务所需的行都不会被真空删除。
基于日志传送的复制
物理流复制
PostgreSQL 数据库中的物理流复制是基于WAL的数据复制。在流式复制中,备用服务器连接到主服务器并使用复制协议接收 WAL 记录。
物理流复制有哪些优势?
1) 备用服务器不需要等待 WAL 文件填满,这改善了复制延迟。
2) 删除了对用户提供的脚本和服务器之间的中间共享存储的依赖。
PostgreSQL 中的 WAL Sender 和 WAL Receiver 是什么?
WAL receiver进程运行在备机上,使用 recovery.conf 的primary_conninfo参数中提供的连接详细信息,并使用 TCP/IP 连接连接到主服务器。
WAL sender是运行在主服务器上的另一个进程,负责在生成 WAL 记录时将其发送到备用服务器。WAL receiver将 WAL 记录保存在 WAL 中,就好像它们是由本地连接的客户端的客户端活动生成的一样。一旦 WAL 记录到达 WAL 段文件,备用服务器会不断地回放 WAL,以便备用服务器和主服务器是最新的。
PostgreSQL 复制和故障转移设置
该设置由两台通过 LAN 连接的 CentOS 7 机器组成,其中安装了 PostgreSQL 版本 10.7。
配置流复制:
1)禁用并停止两台机器上的防火墙:
代码语言:javascript复制sudo firewall-cmd --state
sudo systemctl stop firewalld
sudo systemctl disable firewalld
sudo systemctl mask --now firewalld
2)在主服务器上,允许复制连接和来自同一网络的连接。修改 pg_hba.conf
代码语言:javascript复制Local all all md5
host all all 172.16.214.167/24 md5
host all all ::1/128 md5
local replication all md5
host replication all 172.16.214.167/24 md5
host replication all ::1/128 md5
3)在主服务器上,编辑 postgresql.conf 修改以下参数
代码语言:javascript复制max_wal_senders = 10
wal_level = replica
max_replication_slots = 10
synchronous_commit = on
synchronous_standby_names = '*'
listen_addresses = '*'
4)启动主服务器
./postgres -D ../pr_data -p 5432
5)通过基础备份构建备
代码语言:javascript复制./pg_basebackup
--pgdata=/tmp/sb_data/
--format=p
--write-recovery-conf
--checkpoint=fast
--label=mffb
--progress
--verbose
--host=172.16.214.167
--port=5432
--username=postgres
6)检查base backup label文件
代码语言:javascript复制START WAL LOCATION: 0/2000028 (file 000000010000000000000002)
CHECKPOINT LOCATION: 0/2000060
BACKUP METHOD: streamed
BACKUP FROM: master
START TIME: 2019-02-24 05:25:30 EST
LABEL: mffb
7)在基本备份中,在 recovery.conf 中添加以下行:
primary_slot_name = 'node_a_slot'
8)检查 /tmp/sb_data/recovery.conf 文件
代码语言:javascript复制standby_mode = 'on'
primary_conninfo = 'user=enterprisedb
password=abc123
host=172.16.214.167
port=5432
sslmode=prefer
sslcompression=1
krbsrvname=postgres
target_session_attrs=any'
primary_slot_name = 'node_a_slot'
9)连接到主服务器并发出以下命令:
代码语言:javascript复制edb=# SELECT * FROM pg_create_physical_replication_slot('node_a_slot');
slot_name | xlog_position
------------- ---------------
node_a_slot |
(1 row)
代码语言:javascript复制edb=# SELECT slot_name, slot_type, active FROM pg_replication_slots;
slot_name | slot_type | active
------------- ----------- --------
node_a_slot | physical | f
(1 row)
10)将基础备份传输到备用服务器:
scp /tmp/sb_data.tar.gz abbas@172.16.214.166:/tmp
sudo mv /tmp/sb_data /opt/PostgreSQL/10/
sudo chown postgres:postgres /opt/PostgreSQL/10/sb_data/
sudo chown -R postgres:postgres /opt/PostgreSQL/10/sb_data/
sudo chmod 700 /opt/PostgreSQL/10/sb_data/
11)启动备
./postgres -D ../sb_data/ -p 5432
主上显示:
LOG: standby "walreceiver" is now a synchronous standby with priority 1
备上显示:
LOG: database system was interrupted; last known up at 2018-10-24 15:49:55
LOG: entering standby mode
LOG: redo starts at 0/3000028
LOG: consistent recovery state reached at 0/30000F8
LOG: started streaming WAL from primary at 0/4000000 on timeline 1
12)连接到主服务器并发出一些简单的命令:
代码语言:javascript复制-bash-4.2$ ./edb-psql -p 5432 edb
Password:
psql.bin (10.7)
Type "help" for help.
create table abc(a int, b varchar(250));
insert into abc values(1,'One');
insert into abc values(2,'Two');
insert into abc values(3,'Three');
13)检查备上数据
代码语言:javascript复制./psql -p 5432 -U postgres postgres
Password for user postgres:
psql.bin (10.7)
Type "help" for help.
postgres=# select * from abc;
a | b
--- -------
1 | One
2 | Two
3 | Three
(3 rows)
PostgreSQL 手动故障转移步骤是什么?
1)使主崩溃
2)备上执行promote提升主
./pg_ctl promote -D ../sb_data/
server promoting
3)连接提升的备用服务器并插入一行:
代码语言:javascript复制-bash-4.2$ ./edb-psql -p 5432 edb
Password:
psql.bin (10.7)
Type "help" for help.
edb=# insert into abc values(4,'Four');
此插入工作正常的事实意味着备用服务器(否则为只读服务器)已被提升为新的主服务器
如何在 PostgreSQL 中自动进行故障转移和复制
使用 EDB Postgres Failover Manager (EFM) 可以轻松设置自动故障转移。在每个主节点和备用节点上下载并安装 EFM后,您可以创建一个EFM 集群,该集群由一个主节点、一个或多个备用节点和一个可选的见证节点组成,该节点在发生故障时确认断言。
EFM 持续监控系统运行状况并根据系统事件发送电子邮件警报。当发生故障时,它会自动切换到最新的备用服务器,并重新配置所有其他备用服务器以识别新的主服务器。它还重新配置负载平衡器(例如 pgPool)并防止“脑裂”(当两个节点都认为它们是主节点时)发生。
PostgreSQL 的 repmgr
另一个开源工具是 repmgr(复制管理器),它还管理 PostgreSQL 集群的复制和故障转移。EDB为安装和运行 repmgr for PostgreSQL提供了深入的教程。正确配置后,repmgr 可以检测主服务器何时发生故障并执行自动故障转移:
https://www.enterprisedb.com/postgres-tutorials/how-implement-repmgr-postgresql-automatic-failover
原文
https://www.enterprisedb.com/postgres-tutorials/postgresql-replication-and-automatic-failover-tutorial