Introduction
A highly available PostgreSQL cluster can be built from PostgreSQL, etcd, Patroni, HAProxy, and Keepalived. PostgreSQL is the database; Patroni monitors the local PostgreSQL instance and writes its information and state into etcd, so Patroni together with etcd handles cluster failover (automatic or manual). HAProxy provides read/write splitting and read load balancing through separate ports, and Keepalived floats a VIP between the HAProxy nodes, keeping HAProxy itself highly available in case one HAProxy host goes down.
etcd is used to share information among the Patroni nodes. Patroni monitors the local PostgreSQL instance; if the primary fails, Patroni promotes one of the standbys to become the new primary, and once a failed PostgreSQL node has been recovered it can rejoin the cluster automatically or manually.
Patroni is a template written in Python that, combined with a DCS (Distributed Configuration Store, such as ZooKeeper, etcd, or Consul), can be tailored into a custom PostgreSQL HA solution. Patroni takes over starting and stopping PostgreSQL, monitors the local database, and writes its information into the DCS. Primary and standby roles are decided by the leader key: the Patroni node that holds the leader key is the primary, and the rest are standbys.
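The leader key is visible directly in etcd once the cluster is running; a sketch, assuming the namespace (/service) and scope (pg_cluster) configured later in this article:

```shell
# The value of the leader key is the node name of the current primary.
etcdctl get /service/pg_cluster/leader
# Watch the key to observe a failover as it happens:
etcdctl watch /service/pg_cluster/leader
```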
Patroni is not only easy to use but also very powerful:
- Automatic failover and on-demand switchover
- One or more standby nodes
- Cascading replication
- Synchronous and asynchronous replication
- Automatic demotion from synchronous to asynchronous replication when the synchronous standby fails (similar in effect to MySQL semi-synchronous replication, but smarter)
- Per-node control over whether a node may be elected leader, participate in load balancing, or serve as a synchronous standby
- Automatic repair of the old primary via pg_rewind
- Multiple ways to initialize a cluster or rebuild a standby, including pg_basebackup and custom scripts for backup tools such as WAL-E, pgBackRest, and Barman
- Custom external callback scripts
- A REST API
- Watchdog support to prevent split-brain
- Deployment in containerized environments such as Kubernetes and Docker
- Several common DCS (Distributed Configuration Store) backends for metadata: etcd, ZooKeeper, Consul, Kubernetes
Architecture Planning
The deployment plan is as follows:
Host | IP | Components | Notes |
---|---|---|---|
lhrpgpatroni71 | 172.72.6.71 | PostgreSQL 13.3, Patroni 2.1.2 | primary; PG installed via yum, Patroni via pip3 |
lhrpgpatroni72 | 172.72.6.72 | PostgreSQL 13.3, Patroni 2.1.2 | standby 1; PG installed via yum, Patroni via pip3 |
lhrpgpatroni73 | 172.72.6.73 | PostgreSQL 13.3, Patroni 2.1.2 | standby 2; PG installed via yum, Patroni via pip3 |
lhretcd74 | 172.72.6.74 | etcd 3.3.11 | leader; etcd installed via yum |
lhretcd75 | 172.72.6.75 | etcd 3.3.11 | follower; etcd installed via yum |
lhretcd76 | 172.72.6.76 | etcd 3.3.11 | follower; etcd installed via yum |
lhrhaproxyka77 | 172.72.6.77 | HAProxy 1.5.18, Keepalived 1.3.5 | installed via yum; Keepalived primary |
lhrhaproxyka78 | 172.72.6.78 | HAProxy 1.5.18, Keepalived 1.3.5 | installed via yum; Keepalived backup |
lhrmonitor | 172.72.6.79 | Prometheus, Grafana, etcd-browser | Prometheus stack preinstalled; needs to be configured to monitor PG and HAProxy and to browse etcd |
- | 172.72.6.70 | VIP | virtual IP, floating between 172.72.6.77 and 172.72.6.78, attached to the eth0 NIC |
The architecture diagram:
The topology diagram:
This article covers the following parts:
1. Build a 1-primary, 2-standby PG streaming replication environment
2. Build a 3-node etcd cluster and set up the etcd-browser web UI
3. Install and configure Patroni and perform switchover and failover
4. Configure HAProxy and Keepalived for load balancing and read/write splitting
5. Configure Prometheus and Grafana to monitor PG, HAProxy, and the etcd cluster
6. High-availability testing
If a firewall is configured, the following ports must be opened:
Program | Port | Purpose |
---|---|---|
PostgreSQL | 5432 | database listen port |
Patroni | 8008 | REST API listen port |
etcd | 2379/2380 | client access port and peer listen port |
haproxy | 1080/5000/5001 | web stats port and database service ports |
grafana | 3000 | web login port |
prometheus | 9090 | web login port |
node_exporter | 9100 | metrics port |
postgres_exporter | 9187 | metrics port |
haproxy_exporter | 9101 | metrics port |
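On hosts running firewalld, the ports above can be opened per role; a sketch for the PostgreSQL/Patroni nodes (run the matching commands for the other roles on their hosts):

```shell
# Open PostgreSQL and the Patroni REST API on 172.72.6.71-73.
firewall-cmd --permanent --add-port=5432/tcp
firewall-cmd --permanent --add-port=8008/tcp
firewall-cmd --reload
firewall-cmd --list-ports
```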
Environment Preparation
# network
docker network create --subnet=172.72.6.0/24 pg-network

# PG + Patroni
docker rm -f lhrpgpatroni71
docker run -d --name lhrpgpatroni71 -h lhrpgpatroni71 \
  --net=pg-network --ip 172.72.6.71 \
  -p 64371:5432 -p 18008:8008 \
  -v /sys/fs/cgroup:/sys/fs/cgroup \
  --privileged=true lhrbest/lhrpgall:2.0 \
  /usr/sbin/init

docker rm -f lhrpgpatroni72
docker run -d --name lhrpgpatroni72 -h lhrpgpatroni72 \
  --net=pg-network --ip 172.72.6.72 \
  -p 64372:5432 -p 28008:8008 \
  -v /sys/fs/cgroup:/sys/fs/cgroup \
  --privileged=true lhrbest/lhrpgall:2.0 \
  /usr/sbin/init

docker rm -f lhrpgpatroni73
docker run -d --name lhrpgpatroni73 -h lhrpgpatroni73 \
  --net=pg-network --ip 172.72.6.73 \
  -p 64373:5432 -p 38008:8008 \
  -v /sys/fs/cgroup:/sys/fs/cgroup \
  --privileged=true lhrbest/lhrpgall:2.0 \
  /usr/sbin/init

# etcd
docker rm -f lhretcd74
docker run -d --name lhretcd74 -h lhretcd74 \
  --net=pg-network --ip 172.72.6.74 \
  -v /sys/fs/cgroup:/sys/fs/cgroup \
  --privileged=true lhrbest/lhrcentos76:8.5 \
  /usr/sbin/init

docker rm -f lhretcd75
docker run -d --name lhretcd75 -h lhretcd75 \
  --net=pg-network --ip 172.72.6.75 \
  -v /sys/fs/cgroup:/sys/fs/cgroup \
  --privileged=true lhrbest/lhrcentos76:8.5 \
  /usr/sbin/init

docker rm -f lhretcd76
docker run -d --name lhretcd76 -h lhretcd76 \
  --net=pg-network --ip 172.72.6.76 \
  -v /sys/fs/cgroup:/sys/fs/cgroup \
  --privileged=true lhrbest/lhrcentos76:8.5 \
  /usr/sbin/init

# HAProxy + Keepalived
docker rm -f lhrhaproxyka77
docker run -d --name lhrhaproxyka77 -h lhrhaproxyka77 \
  --net=pg-network --ip 172.72.6.77 \
  -p 11080:1080 -p 15000-15001:5000-5001 \
  -v /sys/fs/cgroup:/sys/fs/cgroup \
  --privileged=true lhrbest/lhrcentos76:8.5 \
  /usr/sbin/init

docker rm -f lhrhaproxyka78
docker run -d --name lhrhaproxyka78 -h lhrhaproxyka78 \
  --net=pg-network --ip 172.72.6.78 \
  -p 21080:1080 -p 25000-25001:5000-5001 \
  -v /sys/fs/cgroup:/sys/fs/cgroup \
  --privileged=true lhrbest/lhrcentos76:8.5 \
  /usr/sbin/init

# monitoring
docker rm -f lhrmonitor
docker run -d --name lhrmonitor -h lhrmonitor \
  --net=pg-network --ip 172.72.6.79 \
  -p 23389:3389 -p 29090:9090 -p 29093:9093 -p 23000:3000 -p 28000:8000 \
  -v /sys/fs/cgroup:/sys/fs/cgroup \
  --privileged=true lhrbest/lhrprometheus:1.0 \
  /usr/sbin/init
Build the 1-Primary, 2-Standby Streaming Replication Environment
For the detailed procedure, see "PG12 HA: building a 1-primary, 2-standby streaming replication environment and testing switchover": https://www.xmmup.com/dbbao60pg12gaokeyongzhi1zhu2congliufuzhihuanjingdajianjiqiehuanceshi.html
Primary operations
Configure pg_hba.conf on the primary
cat << EOF > /var/lib/pgsql/13/data/pg_hba.conf
# TYPE DATABASE USER ADDRESS METHOD
local all all trust
host all all 127.0.0.1/32 trust
host all all 0.0.0.0/0 md5
host replication all 0.0.0.0/0 md5
EOF
Configure primary parameters
# enter the primary container
docker exec -it lhrpgpatroni71 bash

# this directory must also be created on the standbys
mkdir -p /postgresql/archive
chown -R postgres.postgres /postgresql/archive

# set the parameters
cat >> /var/lib/pgsql/13/data/postgresql.conf <<"EOF"
wal_level='replica'
archive_mode='on'
archive_command='test ! -f /postgresql/archive/%f && cp %p /postgresql/archive/%f'
max_wal_senders=10
EOF

# restart
systemctl restart postgresql-13.service

# verify the parameters (in psql)
select * from pg_settings where name in ('wal_level','archive_mode','archive_command');
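After the restart, WAL archiving can be verified by forcing a segment switch and checking the archive directory; a sketch:

```shell
# Force the current WAL segment to be archived, then confirm
# that a file has appeared under /postgresql/archive.
su - postgres -c "psql -c 'select pg_switch_wal();'"
ls -l /postgresql/archive/
```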
Create the replication user on the primary
create role replhr login encrypted password 'lhr' replication;
Standby operations
mkdir -p /bk
chown postgres:postgres /bk
su - postgres
pg_basebackup -h 172.72.6.71 -p 5432 -U replhr -l bk20220120 -F p -P -R -D /bk
cat >> /bk/postgresql.conf <<"EOF"
primary_conninfo = 'host=172.72.6.71 port=5432 user=replhr password=lhr'
EOF
# stop the standby, remove its data files, and overwrite them with the backup
systemctl stop postgresql-13.service
rm -rf /var/lib/pgsql/13/data/*
cp -r /bk/* /var/lib/pgsql/13/data/
systemctl start postgresql-13.service
Verify replication
-- on the primary: WAL sender status
select * from pg_stat_replication;
-- on the standby: WAL receiver status
select * from pg_stat_wal_receiver;
-- this can also be checked with
pg_controldata | grep state
-- or with the following: f (false) on the primary, t (true) on a standby
select pg_is_in_recovery();
postgres=# select * from pg_stat_replication;
 pid  | usesysid | usename | application_name | client_addr | client_hostname | client_port |         backend_start         | backend_xmin |   state   | sent_lsn  | write_lsn | flush_lsn | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state |          reply_time
------+----------+---------+------------------+-------------+-----------------+-------------+-------------------------------+--------------+-----------+-----------+-----------+-----------+------------+-----------+-----------+------------+---------------+------------+-------------------------------
 3170 |    16430 | replhr  | walreceiver      | 172.72.6.73 |                 |       45000 | 2022-01-20 14:58:13.769779+08 |              | streaming | 0/7000060 | 0/7000060 | 0/7000060 | 0/7000060  |           |           |            |             0 | async      | 2022-01-20 15:13:14.383888+08
 3980 |    16430 | replhr  | walreceiver      | 172.72.6.72 |                 |       41278 | 2022-01-20 15:12:09.147278+08 |              | streaming | 0/7000060 | 0/7000060 | 0/7000060 | 0/7000060  |           |           |            |             0 | async      | 2022-01-20 15:13:09.266891+08
(2 rows)
Configure the etcd Cluster
On 6.74, 6.75, and 6.76:
# install
yum install -y etcd

# lhretcd74
cat > /etc/etcd/etcd.conf <<"EOF"
ETCD_DATA_DIR="/var/lib/etcd/default.etcd"
ETCD_LISTEN_PEER_URLS="http://172.72.6.74:2380"
ETCD_LISTEN_CLIENT_URLS="http://172.72.6.74:2379,http://127.0.0.1:2379"
ETCD_NAME="etcd74"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://172.72.6.74:2380"
ETCD_ADVERTISE_CLIENT_URLS="http://172.72.6.74:2379"
ETCD_INITIAL_CLUSTER="etcd74=http://172.72.6.74:2380,etcd75=http://172.72.6.75:2380,etcd76=http://172.72.6.76:2380"
ETCD_INITIAL_CLUSTER_TOKEN="lhretcd-cluster"
ETCD_INITIAL_CLUSTER_STATE="new"
EOF
# lhretcd75
cat > /etc/etcd/etcd.conf <<"EOF"
ETCD_DATA_DIR="/var/lib/etcd/default.etcd"
ETCD_LISTEN_PEER_URLS="http://172.72.6.75:2380"
ETCD_LISTEN_CLIENT_URLS="http://172.72.6.75:2379,http://127.0.0.1:2379"
ETCD_NAME="etcd75"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://172.72.6.75:2380"
ETCD_ADVERTISE_CLIENT_URLS="http://172.72.6.75:2379"
ETCD_INITIAL_CLUSTER="etcd74=http://172.72.6.74:2380,etcd75=http://172.72.6.75:2380,etcd76=http://172.72.6.76:2380"
ETCD_INITIAL_CLUSTER_TOKEN="lhretcd-cluster"
ETCD_INITIAL_CLUSTER_STATE="new"
EOF
# lhretcd76
cat > /etc/etcd/etcd.conf <<"EOF"
ETCD_DATA_DIR="/var/lib/etcd/default.etcd"
ETCD_LISTEN_PEER_URLS="http://172.72.6.76:2380"
ETCD_LISTEN_CLIENT_URLS="http://172.72.6.76:2379,http://127.0.0.1:2379"
ETCD_NAME="etcd76"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://172.72.6.76:2380"
ETCD_ADVERTISE_CLIENT_URLS="http://172.72.6.76:2379"
ETCD_INITIAL_CLUSTER="etcd74=http://172.72.6.74:2380,etcd75=http://172.72.6.75:2380,etcd76=http://172.72.6.76:2380"
ETCD_INITIAL_CLUSTER_TOKEN="lhretcd-cluster"
ETCD_INITIAL_CLUSTER_STATE="new"
EOF
systemctl start etcd
systemctl enable etcd
systemctl status etcd
# after a successful start, change the state in every config file:
sed -i 's/ETCD_INITIAL_CLUSTER_STATE="new"/ETCD_INITIAL_CLUSTER_STATE="existing"/g' /etc/etcd/etcd.conf

# check the status
etcdctl member list
etcdctl cluster-health
Port 2379 serves client connections, while 2380 is used for peer communication.
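A quick read/write sanity check across members (a sketch using the v2 etcdctl commands that the etcd 3.3 RPM defaults to):

```shell
# Write a key on lhretcd74, read it back through another member,
# then clean up; the value should replicate across the cluster.
etcdctl set /lhr/test "hello"
etcdctl --endpoints http://172.72.6.75:2379 get /lhr/test
etcdctl rm /lhr/test
```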
etcd cluster status:
[root@lhretcd74 /]# etcdctl member list
78e1f33282f17b1d: name=etcd76 peerURLs=http://172.72.6.76:2380 clientURLs=http://172.72.6.76:2379 isLeader=false
87e0134f0e4088f5: name=etcd75 peerURLs=http://172.72.6.75:2380 clientURLs=http://172.72.6.75:2379 isLeader=false
e65c83646e82fd9c: name=etcd74 peerURLs=http://172.72.6.74:2380 clientURLs=http://172.72.6.74:2379 isLeader=true
[root@lhretcd74 /]# etcdctl cluster-health
member 78e1f33282f17b1d is healthy: got healthy result from http://172.72.6.76:2379
member 87e0134f0e4088f5 is healthy: got healthy result from http://172.72.6.75:2379
member e65c83646e82fd9c is healthy: got healthy result from http://172.72.6.74:2379
cluster is healthy
etcd-browser Web UI
Download: https://github.com/henszey/etcd-browser. Running etcd-browser also requires Node.js: https://nodejs.org/zh-cn/download/
docker rm -f lhrmonitor
docker run -d --name lhrmonitor -h lhrmonitor \
  --net=pg-network --ip 172.72.6.79 \
  -p 23389:3389 -p 29090:9090 -p 29093:9093 -p 23000:3000 -p 28000:8000 -p 28800:8800 \
  -v /sys/fs/cgroup:/sys/fs/cgroup \
  --privileged=true lhrbest/lhrprometheus:1.0 \
  /usr/sbin/init
# download and install
wget https://codeload.github.com/henszey/etcd-browser/zip/refs/heads/master -O etcd-browser-master.zip
unzip etcd-browser-master.zip -d /usr/local/etcd
wget https://nodejs.org/dist/v16.13.2/node-v16.13.2-linux-x64.tar.xz
tar -xvf node-v16.13.2-linux-x64.tar.xz -C /usr/local/etcd
ln -s /usr/local/etcd/node-v16.13.2-linux-x64 /usr/local/etcd/nodejs
# edit the etcd-browser configuration file
vi /usr/local/etcd/etcd-browser-master/server.js
var etcdHost = process.env.ETCD_HOST || '172.72.6.74';
var etcdPort = process.env.ETCD_PORT || 2379;
var serverPort = process.env.SERVER_PORT || 8000;
# start etcd-browser
cd /usr/local/etcd/etcd-browser-master
nohup /usr/local/etcd/nodejs/bin/node /usr/local/etcd/etcd-browser-master/server.js &
[root@lhrmonitor ~]# netstat -tulnp | grep 8000
tcp6 0 0 :::8000 :::* LISTEN 665/node
Open http://192.168.66.35:8000/ in a browser.
Patroni Configuration
Install the required Python modules
On 6.71, 6.72, and 6.73:
# optional pip mirrors:
#   -i https://mirrors.aliyun.com/pypi/simple/
#   -i http://pypi.douban.com/simple/
python3 -m pip install --upgrade pip
python3 -m pip install psycopg2_binary
python3 -m pip install patroni[etcd]
# verify the installation
which patroni
patronictl --help
Create the Patroni YAML configuration file
mkdir -p /app/patroni/etc/
mkdir -p /app/patroni/log/
chown postgres.postgres -R /app/patroni/
Node lhrpgpatroni71
Note that YAML has strict indentation rules: indent with spaces only, never with the Tab key. If the indentation is off, the configuration will fail in all sorts of ways.
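Before handing the file to Patroni, it can be screened for tab characters, the most common cause of YAML indentation errors; a minimal sketch (the helper `check_tabs` is ours, not part of Patroni):

```shell
# YAML forbids tabs in indentation; flag any file containing one.
check_tabs() {
  [ -f "$1" ] || { echo "NO SUCH FILE: $1"; return; }
  if grep -qn "$(printf '\t')" "$1"; then
    echo "TABS FOUND: $1"
  else
    echo "OK: $1"
  fi
}

check_tabs /app/patroni/etc/patroni_config.yml
```

Since Patroni pulls in PyYAML as a dependency, `python3 -c "import yaml; yaml.safe_load(open('/app/patroni/etc/patroni_config.yml'))"` gives a fuller syntax check.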
cat > /app/patroni/etc/patroni_config.yml <<"EOF"
scope: pg_cluster
namespace: /service
name: lhrpgpatroni71

log:
  level: INFO
  traceback_level: ERROR
  dir: /app/patroni/log
  file_num: 10
  file_size: 104857600

restapi:
  listen: 0.0.0.0:8008
  connect_address: 172.72.6.71:8008

etcd:
  host: 172.72.6.74:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    max_timelines_history: 0
    master_start_timeout: 300
    master_stop_timeout: 0
    synchronous_mode: false
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        max_connections: 3000
        superuser_reserved_connections: 100
        max_locks_per_transaction: 64
        max_worker_processes: 2
        max_prepared_transactions: 0
        wal_level: logical
        wal_log_hints: on
        track_commit_timestamp: off
        max_wal_senders: 10
        max_replication_slots: 10
        listen_addresses: "*"
        port: 5432
        cluster_name: "pg_cluster"
        archive_mode: on
        archive_command: "test ! -f /postgresql/archive/%f && cp %p /postgresql/archive/%f"

postgresql:
  listen: 0.0.0.0:5432
  connect_address: 172.72.6.71:5432
  data_dir: /var/lib/pgsql/13/data
  pgpass: /home/postgres/.pgpass
  pg_ctl_timeout: 60
  use_pg_rewind: true
  remove_data_directory_on_rewind_failure: false
  remove_data_directory_on_diverged_timelines: true
  authentication:
    replication:
      username: replhr
      password: lhr
    superuser:
      username: postgres
      password: lhr

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false
EOF
Node lhrpgpatroni72
cat > /app/patroni/etc/patroni_config.yml <<"EOF"
scope: pg_cluster
namespace: /service
name: lhrpgpatroni72

log:
  level: INFO
  traceback_level: ERROR
  dir: /app/patroni/log
  file_num: 10
  file_size: 104857600

restapi:
  listen: 0.0.0.0:8008
  connect_address: 172.72.6.72:8008

etcd:
  host: 172.72.6.75:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    max_timelines_history: 0
    master_start_timeout: 300
    master_stop_timeout: 0
    synchronous_mode: false
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        max_connections: 3000
        superuser_reserved_connections: 100
        max_locks_per_transaction: 64
        max_worker_processes: 2
        max_prepared_transactions: 0
        wal_level: logical
        wal_log_hints: on
        track_commit_timestamp: off
        max_wal_senders: 10
        max_replication_slots: 10
        listen_addresses: "*"
        port: 5432
        cluster_name: "pg_cluster"
        archive_mode: on
        archive_command: "test ! -f /postgresql/archive/%f && cp %p /postgresql/archive/%f"

postgresql:
  listen: 0.0.0.0:5432
  connect_address: 172.72.6.72:5432
  data_dir: /var/lib/pgsql/13/data
  pgpass: /home/postgres/.pgpass
  pg_ctl_timeout: 60
  use_pg_rewind: true
  remove_data_directory_on_rewind_failure: false
  remove_data_directory_on_diverged_timelines: true
  authentication:
    replication:
      username: replhr
      password: lhr
    superuser:
      username: postgres
      password: lhr

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false
EOF
Node lhrpgpatroni73
cat > /app/patroni/etc/patroni_config.yml <<"EOF"
scope: pg_cluster
namespace: /service
name: lhrpgpatroni73

log:
  level: INFO
  traceback_level: ERROR
  dir: /app/patroni/log
  file_num: 10
  file_size: 104857600

restapi:
  listen: 0.0.0.0:8008
  connect_address: 172.72.6.73:8008

etcd:
  host: 172.72.6.76:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    max_timelines_history: 0
    master_start_timeout: 300
    master_stop_timeout: 0
    synchronous_mode: false
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        max_connections: 3000
        superuser_reserved_connections: 100
        max_locks_per_transaction: 64
        max_worker_processes: 2
        max_prepared_transactions: 0
        wal_level: logical
        wal_log_hints: on
        track_commit_timestamp: off
        max_wal_senders: 10
        max_replication_slots: 10
        listen_addresses: "*"
        port: 5432
        cluster_name: "pg_cluster"
        archive_mode: on
        archive_command: "test ! -f /postgresql/archive/%f && cp %p /postgresql/archive/%f"

postgresql:
  listen: 0.0.0.0:5432
  connect_address: 172.72.6.73:5432
  data_dir: /var/lib/pgsql/13/data
  pgpass: /home/postgres/.pgpass
  pg_ctl_timeout: 60
  use_pg_rewind: true
  remove_data_directory_on_rewind_failure: false
  remove_data_directory_on_diverged_timelines: true
  authentication:
    replication:
      username: replhr
      password: lhr
    superuser:
      username: postgres
      password: lhr

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false
EOF
Parameter Reference
See the official documentation: https://patroni.readthedocs.io/en/latest/SETTINGS.html
[root@pgtest1 patroni]# cat patroni_config.yml
scope: pg_cluster        # cluster name
namespace: /service      # path under which Patroni keeps cluster information in etcd; default: "/service"
name: pgtest1            # node name; must be unique within the cluster

log:
  level: INFO                 # log level
  traceback_level: ERROR
  dir: /enmo/app/patroni/     # directory the logs are written to
  file_num: 10                # number of log files to keep
  file_size: 104857600        # size of patroni.log, in bytes, that triggers log rotation

restapi:
  listen: 192.168.58.10:8008
  connect_address: 192.168.58.10:8008

etcd:
  # Provide host to do the initial discovery of the cluster topology.
  # One of host, hosts, url, proxy, or srv must be specified.
  host: 192.168.58.10:2379    # host:port of an etcd endpoint

bootstrap:
  # This section is written into etcd:/<namespace>/<scope>/config after a new cluster
  # is initialized, and all other cluster members use it as the global configuration.
  dcs:  # dynamic configuration, stored in the DCS (Distributed Configuration Store) and applied to all cluster nodes
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576  # ensures no failover happens to a standby lagging the primary by more than this many bytes
    maximum_lag_on_syncnode: -1
    max_timelines_history: 0    # maximum number of timeline history items kept in the DCS; default 0, which keeps the full history
    master_start_timeout: 300   # seconds the primary is allowed to recover from a failure before failover is triggered
    # worst-case failover time on primary failure is: loop_wait + master_start_timeout + loop_wait
    master_stop_timeout: 0      # seconds Patroni waits when stopping Postgres, effective only with synchronous_mode enabled; when exceeded, Patroni sends SIGKILL to the postmaster
    synchronous_mode: false     # enables synchronous replication mode: one standby is chosen as the synchronous standby, and only the latest leader and the synchronous standby may take part in leader elections
    # synchronous_mode_strict   # prevents disabling synchronous replication when no synchronous replica is available, blocking all client writes to the primary
    # https://patroni.readthedocs.io/en/latest/replication_modes.html
    postgresql:
      use_pg_rewind: true       # whether to use pg_rewind
      use_slots: true           # whether to use replication slots
      parameters:
        max_connections: 3000
        superuser_reserved_connections: 100  # Patroni needs superuser access to the database to work properly
        max_locks_per_transaction: 64
        max_worker_processes: 2
        max_prepared_transactions: 0
        wal_level: logical
        wal_log_hints: on
        track_commit_timestamp: off
        max_wal_senders: 10
        max_replication_slots: 10
        wal_keep_size: 4096MB   # wal_keep_size = wal_keep_segments * wal_segment_size = 128 * 32MB
        hot_standby: "on"
        listen_addresses: "*"
        port: 5432
        cluster_name: "pg_cluster"
        archive_mode: on
        archive_command: "cp %p /enmo/pgarch/%f"
  # standby_cluster:            # define this section to bootstrap a standby cluster
  #   host: 127.0.0.1           # address of the remote host
  #   port: 1111                # port of the remote host
  #   primary_slot_name: patroni  # replication slot on the remote primary; optional, defaults to the instance name
  #   create_replica_methods:
  #   restore_command:
  #   archive_cleanup_command:
  #   recovery_min_apply_delay:
  initdb:   # options to pass to initdb
    - data-checksums            # must be enabled when pg_rewind is needed on 9.3
    - encoding: UTF8            # default encoding for new databases
    - locale: UTF8              # default locale for new databases
    - wal-segsize: 32
    - allow-group-access
    - pgdata: /enmo/pgdata
    - waldir: /enmo/pgwal
  pg_hba:   # lines that should be added to pg_hba.conf
    - host all all 0.0.0.0/0 md5
    - host replication replicator 127.0.0.1/32 md5
  users:    # additional users to create after the new cluster is initialized
    admin:  # the user name is admin
      password: zalando
      options:  # list of options for the CREATE USER statement
        - createrole
        - createdb

postgresql:
  listen: 0.0.0.0:5432          # IP address and port Postgres listens on
  connect_address: 192.168.58.10:5432
  data_dir: /enmo/pgdata        # location of the Postgres data directory
  # bin_dir: /software/pgsql/bin      # path to the PostgreSQL binaries; defaults to an empty string, in which case the PATH environment variable is used to find the executables
  # config_dir: /software/pgsql/data  # location of the Postgres configuration directory; defaults to data_dir
  pgpass: /home/postgres/.pgpass  # path to the password file
  pg_ctl_timeout: 60            # how long pg_ctl waits on start, stop, or restart; default 60 seconds
  use_pg_rewind: true           # try pg_rewind on the former leader when it rejoins the cluster as a replica
  remove_data_directory_on_rewind_failure: false  # if enabled, Patroni removes the PostgreSQL data directory and recreates the replica; otherwise it tries to follow the new leader; default false
  remove_data_directory_on_diverged_timelines: false  # if Patroni notices that the timelines have diverged and the former primary cannot start streaming from the new one, it removes the data directory and recreates the replica; useful when pg_rewind cannot be used; default false
  authentication:  # user authentication
    replication:   # replication user
      username: replica
      password: replica
    superuser:     # superuser
      username: postgres
      password: postgres
    # rewind:      # pg_rewind user
    #   username:
    #   password:

tags:
  nofailover: false    # controls whether this node may take part in the leader race and become leader
  noloadbalance: false
  clonefrom: false
  nosync: false
Create and start the patroni service on all nodes
When creating the service, be sure to set the environment variables, otherwise startup fails with: FATAL: Patroni requires psycopg2>=2.5.4 or psycopg2-binary. A Requires= line forcing a hard dependency on etcd.service starting successfully can be added as appropriate; note that if etcd is not running, Patroni will come up but will not bring the database up automatically.
cat > /usr/lib/systemd/system/patroni.service <<"EOF"
[Unit]
Description=patroni
After=network.target remote-fs.target nss-lookup.target
[Service]
Type=forking
User=postgres
Group=postgres
Environment="PGHOME=/usr/pgsql-13"
Environment="PGDATA=/var/lib/pgsql/13/data"
Environment="PGPORT=5432"
Environment="LD_LIBRARY_PATH=/usr/pgsql-13/lib"
Environment="PATH=/usr/pgsql-13/bin:/usr/local/bin:/usr/bin"
ExecStart=/bin/bash -c "patroni /app/patroni/etc/patroni_config.yml >> /app/patroni/log/patroni.log 2>&1 &"
ExecReload=/bin/kill -s HUP $MAINPID
ExecStop=/usr/bin/killall patroni
KillMode=process
TimeoutSec=30
Restart=no
[Install]
WantedBy=multi-user.target
EOF
# start the service
systemctl daemon-reload
systemctl restart patroni.service
systemctl enable patroni.service
systemctl status patroni.service
# Patroni checks whether PostgreSQL is running and starts it automatically if it is not, so consider disabling the postgresql service and letting Patroni manage PG start/stop
systemctl disable postgresql-13.service
Set a patronictl alias on all nodes for easier administration
echo "alias patronictl='patronictl -c /app/patroni/etc/patroni_config.yml'" >> /etc/profile
echo "alias patronictl='patronictl -c /app/patroni/etc/patroni_config.yml'" >> /root/.bashrc
source /etc/profile
source /root/.bashrc
After starting Patroni, it is recommended to first disable auto failover with patronictl, and only re-enable it selectively once everything is up and verified. With auto failover enabled, running killall patroni also takes down the database on that node; on a production primary the consequences could be disastrous.
# disable auto failover (patronictl pause fails if patroni is not running)
patronictl pause

# resume auto failover
patronictl resume
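Patroni's REST API (the same endpoints HAProxy will later probe) can be used to check node roles and cluster state; a sketch:

```shell
# Full state of one node:
curl -s http://172.72.6.71:8008/patroni
# /master returns HTTP 200 only on the leader, /replica only on a standby:
curl -s -o /dev/null -w "%{http_code}\n" http://172.72.6.71:8008/master
curl -s -o /dev/null -w "%{http_code}\n" http://172.72.6.71:8008/replica
# Cluster overview:
patronictl -c /app/patroni/etc/patroni_config.yml list
```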
Configure HAProxy and Keepalived
- HAProxy is an open-source TCP/HTTP load balancer written in C, built for high concurrency (over ten thousand connections) and high performance; it supports cookie-based persistence, automatic failover, regular expressions, and a web status page.
- HAProxy is used here to separate connections by port: the primary on 5000 and the read-only standbys on 5001, with load balancing across the two standbys on the read port.
- HAProxy website: https://www.haproxy.org/
- Keepalived is free and open source, written in C, and provides failover between machines via the VRRP protocol.
- Keepalived documentation: https://www.keepalived.org/manpage.html

# install
yum install -y haproxy keepalived
Configure HAProxy
The default configuration file installed by the distribution RPM is /etc/haproxy/haproxy.cfg.
cat > /etc/haproxy/haproxy.cfg <<"EOF"
global
log 127.0.0.1 local2
chroot /var/lib/haproxy
pidfile /var/run/haproxy.pid
maxconn 5000
user haproxy
group haproxy
daemon
nbproc 2
defaults
mode tcp
log 127.0.0.1 local2 err
option tcplog
option dontlognull
option redispatch
retries 3
maxconn 5000
timeout queue 1m
timeout connect 10s
timeout client 1m
timeout server 1m
timeout check 5s
listen status
bind *:1080
mode http
log global
stats enable
stats refresh 30s
stats uri /
stats realm Private lands
stats auth admin:admin
listen master
bind *:5000
mode tcp
option tcplog
balance roundrobin
option httpchk OPTIONS /master
http-check expect status 200
default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
server pg1 172.72.6.71:5432 maxconn 1000 check port 8008 inter 5000 rise 2 fall 2
server pg2 172.72.6.72:5432 maxconn 1000 check port 8008 inter 5000 rise 2 fall 2
server pg3 172.72.6.73:5432 maxconn 1000 check port 8008 inter 5000 rise 2 fall 2
listen replicas
bind *:5001
mode tcp
option tcplog
balance roundrobin
option httpchk OPTIONS /replica
http-check expect status 200
default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
server pg1 172.72.6.71:5432 maxconn 1000 check port 8008 inter 5000 rise 2 fall 2
server pg2 172.72.6.72:5432 maxconn 1000 check port 8008 inter 5000 rise 2 fall 2
server pg3 172.72.6.73:5432 maxconn 1000 check port 8008 inter 5000 rise 2 fall 2
EOF
Parameter notes:
# global settings
global
    # log 127.0.0.1 local0 info  # global log configuration: use the local0 syslog facility on 127.0.0.1, recording logs at level info
    log 127.0.0.1 local2         # log output: everything is logged locally via local2; rsyslog must be configured accordingly
    chroot /var/lib/haproxy      # change the working directory
    pidfile /var/run/haproxy.pid # PID file
    maxconn 3000                 # maximum number of connections
    user haproxy                 # run as this user
    group haproxy                # run as this group
    daemon                       # run haproxy in the background
    nbproc 1                     # number of haproxy processes, at most the number of CPU cores; more processes shorten each per-process queue, but too many can lead to crashes
    stats socket /var/lib/haproxy/stats

# defaults
defaults
    mode tcp                 # mode {tcp|http|health}: tcp is layer 4, http is layer 7, health only answers OK
    log 127.0.0.1 local2 err # record error messages via the local2 syslog facility on 127.0.0.1
    option tcplog            # if mode is set to http, tcplog must be changed to httplog
    option dontlognull       # do not log null connections, i.e. probes from an upstream load balancer or monitoring system that periodically connect to check liveness or scan ports; the docs recommend against this option when there is no upstream load balancer, since malicious scans would then go unlogged
    option redispatch        # when the server bound to a serverId goes down, force redirection to another healthy server
    option abortonclose      # under high load, drop queued connections that have already waited too long
    retries 3                # number of failed connection retries to a backend before it is marked unavailable
    maxconn 3000             # default maximum connections
    timeout queue 1m         # maximum wait once a server's maxconn is reached
    timeout connect 10s      # connect timeout
    timeout client 1m        # client inactivity timeout
    timeout server 1m        # server timeout
    timeout check 5s         # health-check timeout

# haproxy web stats page
listen status
    bind *:1080              # stats page port
    mode http
    log global
    stats enable             # enable the web status page
    stats refresh 30s        # stats page auto-refresh interval
    maxconn 10               # maximum connections
    stats uri /              # stats page URL: http://ip:1080/
    stats realm Private lands    # authentication prompt shown for the stats page
    stats auth admin:Admin2021   # stats page user and password; add more on separate lines if needed
    stats hide-version       # hide the HAProxy version on the stats page

listen master
    bind *:5000              # frontend listen port
    mode tcp
    option tcplog
    balance roundrobin       # load-balancing algorithm: round robin
    option httpchk OPTIONS /master
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions  # inter: check every 3s; fall: down after 3 failures; rise: up after 2 successes
    server pg1 172.72.6.71:5432 maxconn 1000 check port 8008 inter 5000 rise 2 fall 2
    server pg2 172.72.6.72:5432 maxconn 1000 check port 8008 inter 5000 rise 2 fall 2
    server pg3 172.72.6.73:5432 maxconn 1000 check port 8008 inter 5000 rise 2 fall 2
Start HAProxy:
systemctl start haproxy
systemctl enable haproxy
systemctl status haproxy
Access HAProxy in a Browser
Log in at http://192.168.66.35:1080 with user name admin and password admin.
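Once HAProxy is running, the read/write split can be verified with psql; a sketch, assuming the postgres superuser password lhr from the Patroni configuration (port 5000 always lands on the primary, port 5001 round-robins across the standbys):

```shell
# pg_is_in_recovery() is false (f) on the primary and true (t) on a standby.
psql "host=172.72.6.77 port=5000 user=postgres password=lhr" -c "select pg_is_in_recovery();"   # f
psql "host=172.72.6.77 port=5001 user=postgres password=lhr" -c "select pg_is_in_recovery();"   # t
```

Running the second command repeatedly should alternate between the two standby backends.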
Configure Keepalived
The configuration file installed by the distribution RPM is /etc/keepalived/keepalived.conf.
# keepalived primary-node configuration
cat > /etc/keepalived/keepalived.conf << "EOF"
global_defs {
router_id pg1
script_user root
enable_script_security
}
vrrp_script chk_haproxy {
script "/usr/bin/killall -0 haproxy"
interval 2
weight 5
fall 3
rise 5
timeout 2
}
vrrp_instance VI_1 {
state MASTER
interface eth0
virtual_router_id 88
priority 100
advert_int 5
authentication {
auth_type PASS
auth_pass postgres
}
virtual_ipaddress {
172.72.6.70/24 dev eth0 label eth0:1
}
track_script {
chk_haproxy
}
}
EOF
# keepalived backup-node configuration
cat > /etc/keepalived/keepalived.conf << "EOF"
global_defs {
router_id pg2
script_user root
enable_script_security
}
vrrp_script chk_haproxy {
script "/usr/bin/killall -0 haproxy"
interval 2
weight 5
fall 3
rise 5
timeout 2
}
vrrp_instance VI_1 {
state BACKUP
interface eth0
virtual_router_id 88
priority 99
advert_int 5
authentication {
auth_type PASS
auth_pass postgres
}
virtual_ipaddress {
172.72.6.70/24 dev eth0 label eth0:1
}
track_script {
chk_haproxy
}
}
EOF
Parameter notes:
! Configuration File for keepalived

# global definitions
global_defs {
    # string identifying this server; should be unique within the LAN; need not be the hostname (defaults to it)
    router_id pgtest1
    # default user and group for running scripts
    script_user root
    # refuse to run a script as root if any part of its path is writable by a non-root user
    enable_script_security
}

# periodic check script
vrrp_script chk_haproxy {
    # path of the script or command to execute;
    # sends signal 0 to the haproxy process and uses the return value to decide whether haproxy is alive
    script "/usr/bin/killall -0 haproxy"
    # seconds between script invocations (default: 1s)
    interval 2
    # seconds after which the script is considered to have failed
    timeout 2
    # weight adjustment for the priority (default: 2);
    # keepalived computes priority + weight at startup, and the VIP starts on whichever side has the larger result
    weight 5
    # number of failures before the check is considered failed
    fall 3
    # number of successes before the check is considered successful
    rise 5
}

# VRRP instance definition
vrrp_instance VI_1 {
    # initial state of this keepalived node
    state MASTER
    # interface the VRRP instance binds to for sending VRRP packets
    interface ens33
    # VRRP instance ID, 0-255; must match on primary and backup
    virtual_router_id 88
    # priority; the higher value becomes MASTER, so use a smaller value on the backup
    priority 100
    # interval between VRRP advertisements, in seconds
    advert_int 5
    # authentication method
    authentication {
        auth_type PASS      # PASS: simple password authentication (recommended); AH: IPSEC authentication (not recommended)
        auth_pass postgres  # password used for authentication, at most 8 characters
    }
    # VIP addresses; must match on primary and backup
    virtual_ipaddress {
        192.168.58.20/24 dev ens33 label ens33:1
    }
    # attach the tracking script defined in vrrp_script
    track_script {
        chk_haproxy
    }
}
Start Keepalived:
systemctl start keepalived
systemctl enable keepalived
systemctl status keepalived
# if keepalived fails to start with: IPVS: Can't initialize ipvs: Protocol not available
lsmod | grep ip_vs
modprobe ip_vs
modprobe ip_vs_wrr
lsmod | grep ip_vs
# if running in containers, the host must also load the ip_vs modules
After startup, only node 172.72.6.77 holds the virtual IP:
[root@lhrhaproxyka77 /]# ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 172.72.6.77 netmask 255.255.255.0 broadcast 172.72.6.255
ether 02:42:ac:48:06:4d txqueuelen 0 (Ethernet)
RX packets 72015 bytes 5904300 (5.6 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 117213 bytes 22467390 (21.4 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
eth0:1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 172.72.6.70 netmask 255.255.255.0 broadcast 0.0.0.0
ether 02:42:ac:48:06:4d txqueuelen 0 (Ethernet)
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
loop txqueuelen 1000 (Local Loopback)
RX packets 1234 bytes 223309 (218.0 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 1234 bytes 223309 (218.0 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
Both HAProxy and Keepalived log to /var/log/messages.
Configure Prometheus Monitoring
For the setup procedure, see:
- Build a great-looking prometheus + grafana monitoring stack with Docker in minutes: https://www.xmmup.com/dbbao36shiyongdockerfenfenzhongdajianpiaoliangdeprometheusgrafanajiankong.html
- Using the monitoring tool PMM to monitor MySQL, PG, MongoDB, ProxySQL, and more: https://www.xmmup.com/dbbao41jiankongliqipmmdeshiyong-jiankongmysqlpgmongodbproxysqldeng.html
- https://www.modb.pro/db/154937
Access Grafana at http://172.72.6.79:3000/login; the user name and password are both admin.
Exporter downloads: https://prometheus.io/docs/instrumenting/exporters/
Monitor the PG Cluster
https://github.com/prometheus-community/postgres_exporter
tar -zxvf postgres_exporter-0.10.1.linux-amd64.tar.gz
mv postgres_exporter-0.10.1.linux-amd64/postgres_exporter /usr/local/bin/postgres_exporter
useradd pg71
useradd pg72
useradd pg73
su - pg71
export DATA_SOURCE_NAME="postgresql://postgres:lhr@172.72.6.71:5432/postgres?sslmode=disable"
nohup postgres_exporter --web.listen-address=":9187" >/dev/null 2>&1 &
su - pg72
export DATA_SOURCE_NAME="postgresql://postgres:lhr@172.72.6.72:5432/postgres?sslmode=disable"
nohup postgres_exporter --web.listen-address=":9188" >/dev/null 2>&1 &
su - pg73
export DATA_SOURCE_NAME="postgresql://postgres:lhr@172.72.6.73:5432/postgres?sslmode=disable"
nohup postgres_exporter --web.listen-address=":9189" >/dev/null 2>&1 &
[root@lhrmonitor /]# netstat -tulnp | grep 918
tcp6 0 0 :::9187 :::* LISTEN 10829/postgres_expo
tcp6 0 0 :::9188 :::* LISTEN 10929/postgres_expo
tcp6 0 0 :::9189 :::* LISTEN 11027/postgres_expo
[root@lhrmonitor /]#
Test:
curl http://localhost:9187/metrics
curl http://localhost:9188/metrics
curl http://localhost:9189/metrics
Import Grafana dashboard 9628.
Monitor HAProxy
https://github.com/prometheus/haproxy_exporter
tar -zxvf haproxy_exporter-0.13.0.linux-amd64.tar.gz
mv haproxy_exporter-0.13.0.linux-amd64/haproxy_exporter /usr/local/bin/haproxy_exporter
nohup haproxy_exporter --web.listen-address=":9101" --haproxy.scrape-uri="http://admin:admin@172.72.6.77:1080/haproxy?stats;csv" > /dev/null 2>&1 &
nohup haproxy_exporter --web.listen-address=":9102" --haproxy.scrape-uri="http://admin:admin@172.72.6.78:1080/haproxy?stats;csv" > /dev/null 2>&1 &
[root@lhrmonitor /]# netstat -tulnp | grep 91
tcp 0 0 127.0.0.1:5910 0.0.0.0:* LISTEN 1346/Xvnc
tcp6 0 0 :::9187 :::* LISTEN 10829/postgres_expo
tcp6 0 0 :::9188 :::* LISTEN 10929/postgres_expo
tcp6 0 0 :::9189 :::* LISTEN 11027/postgres_expo
tcp6 0 0 :::9100 :::* LISTEN 211/node_exporter
tcp6 0 0 :::9101 :::* LISTEN 11462/haproxy_expor
tcp6 0 0 :::9102 :::* LISTEN 11497/haproxy_expor
[root@lhrmonitor /]#
Test:
curl http://localhost:9101/metrics
curl http://localhost:9102/metrics
Import Grafana dashboard 12865.
Monitor the etcd Cluster
Prometheus scrapes etcd's built-in metrics endpoint directly, so no separate etcd exporter needs to be installed.
Import Grafana dashboards 3070 and 10323.
Modifying the Prometheus server configuration file prometheus.yml
Add the scrape targets to prometheus.yml on the Prometheus server; pay strict attention to the YAML indentation.
Here the file is /usr/local/prometheus/prometheus.yml:
# vi /usr/local/prometheus/prometheus.yml
# Append to the end of the file; do not remove the leading spaces on any line
- job_name: "pg_monitor"
static_configs:
- targets:
- "127.0.0.1:9187"
- "127.0.0.1:9188"
- "127.0.0.1:9189"
- job_name: "haproxy_monitor"
static_configs:
- targets:
- "127.0.0.1:9101"
- "127.0.0.1:9102"
- job_name: "etcd_monitor"
static_configs:
- targets:
- "172.72.6.74:2379"
- "172.72.6.75:2379"
- "172.72.6.76:2379"
# Reload the configuration file (or restart Prometheus)
curl -X POST http://127.0.0.1:9090/-/reload
http://172.72.6.79:9090/targets
Collected data
PG:
HAProxy:
ETCD:
With that, the PostgreSQL HA deployment with Patroni, etcd, HAProxy, Keepalived, Prometheus, and Grafana is complete.
High Availability Tests
1. HA test: Keepalived
Note: to avoid automatic switch-back (which disrupts clients), both nodes can be configured with state BACKUP and an equal priority of 100.
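A minimal sketch of that non-preempting setup as a keepalived.conf fragment; virtual_router_id and advert_int are illustrative values, the point is the state/priority/nopreempt combination:

```
vrrp_instance VI_1 {
    state BACKUP            # both nodes declared BACKUP: role decided at runtime
    nopreempt               # whoever holds the VIP keeps it when the peer returns
    interface eth0
    virtual_router_id 51    # illustrative; must match on both nodes
    priority 100            # identical on both nodes
    advert_int 1
    virtual_ipaddress {
        172.72.6.70
    }
}
```

Note that keepalived only honors nopreempt when the instance state is BACKUP, which is exactly why both nodes are declared BACKUP here.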
Test type | Method | Command | Result |
---|---|---|---|
Process failure | 1. killall on the master | killall keepalived | VIP automatically moves from the master to the backup; ports 5000 and 5001 connect normally |
- | - | systemctl start keepalived | VIP automatically moves back from the backup to the master; ports 5000 and 5001 connect normally |
- | 2. killall on the backup | systemctl stop keepalived | VIP keeps running on the master; ports 5000 and 5001 connect normally |
- | - | systemctl start keepalived | VIP keeps running on the master; ports 5000 and 5001 connect normally |
- | 3. kill all processes on the master at once | - | The master does not release the VIP while the backup also brings it up, so both nodes hold the VIP (abnormal); ports 5000 and 5001 connect normally |
- | - | systemctl start keepalived | VIP moves back to the master and runs normally; no VIP on the backup; ports 5000 and 5001 connect normally |
- | 4. kill only the parent process on the master | kill -9 | VIP automatically moves from the master to the backup and is active only on the backup; ports 5000 and 5001 connect normally |
- | - | systemctl start keepalived | VIP moves back to the master and runs normally; no VIP on the backup; ports 5000 and 5001 connect normally |
- | 5. kill only a child process on the master | - | VIP moves to the backup; once the parent respawns the child, the VIP automatically moves back to the master; ports 5000 and 5001 connect normally |
- | 6. kill processes on the backup | - | VIP keeps running on the master; ports 5000 and 5001 connect normally |
NIC failure | 1. down the NIC on the master | ifdown eth0 | VIP moves to the backup; PostgreSQL fails over to one of the standbys; ports 5000 and 5001 connect normally; patroni and etcd no longer list the failed node |
- | 2. up the NIC on the master | ifup eth0 | VIP moves back to the master; the failed node rejoins the cluster as a standby; patroni and etcd show all nodes healthy; ports 5000 and 5001 connect normally |
- | 3. down the NIC on the backup | ifdown eth0 | VIP keeps running on the master; ports 5000 and 5001 connect normally; patroni and etcd no longer list the failed node; processes on the failed node keep running |
- | 4. up the NIC on the backup | ifup eth0 | patroni and etcd show all nodes healthy |
2. HA test: HAProxy
Test type | Method | Command | Result |
---|---|---|---|
Process failure | 1. killall on the master | killall haproxy | keepalived stops detecting the haproxy process and automatically moves the VIP from the master to the backup; ports 5000 and 5001 connect normally |
- | - | systemctl start haproxy | keepalived detects the haproxy process and automatically moves the VIP back from the backup to the master; ports 5000 and 5001 connect normally |
- | 2. killall on the backup | killall haproxy | VIP keeps running on the master; ports 5000 and 5001 connect normally |
- | - | systemctl start haproxy | VIP keeps running on the master; ports 5000 and 5001 connect normally |
- | 3. kill all processes on the master at once | - | keepalived stops detecting haproxy and automatically moves the VIP to the backup; ports 5000 and 5001 connect normally |
- | - | systemctl start haproxy | keepalived detects haproxy and automatically moves the VIP back to the master; ports 5000 and 5001 connect normally |
- | 4. kill only the parent process on the master | - | keepalived stops detecting haproxy and automatically moves the VIP to the backup; ports 5000 and 5001 connect normally |
- | - | systemctl start haproxy | keepalived detects haproxy and automatically moves the VIP back to the master; ports 5000 and 5001 connect normally |
- | 5. kill only a child process on the master | - | all haproxy processes die; keepalived stops detecting haproxy and automatically moves the VIP to the backup; ports 5000 and 5001 connect normally |
- | - | systemctl start haproxy | keepalived detects haproxy and automatically moves the VIP back to the master; ports 5000 and 5001 connect normally |
3. HA test: Patroni
The following tests were performed with auto failover enabled in Patroni:
[root@pgtest3 ~]# patronictl resume
Test type | Method | Command | Result |
---|---|---|---|
Process failure | 1. killall on the primary | killall patroni | 1. A failover to one of the standbys is triggered and the other standby re-syncs with the new primary, within 30 seconds 2. PostgreSQL on the old primary (pgtest1) is shut down 3. etcd, haproxy, and keepalived keep running on the old primary and the VIP stays there 4. VIP port 5000 connects to the new primary; VIP port 5001 connects to the other standby |
- | - | systemctl start patroni | The old primary (pgtest1) becomes a standby of the new primary (pgtest2) |
- | 2. kill the patroni process on the primary | kill -9 | 1. A failover to one of the standbys is triggered and the other standby re-syncs with the new primary, within 30 seconds 2. PostgreSQL on the old primary (pgtest1) keeps running, in read-write mode 3. etcd, haproxy, and keepalived keep running on the old primary and the VIP stays there 4. VIP port 5000 connects to the new primary; VIP port 5001 connects to the other standby |
- | - | systemctl start patroni | The old primary (pgtest1) is turned into a standby of the new primary (pgtest2) via pg_rewind |
- | 3. kill patroni on one standby | - | 1. With killall the standby's PostgreSQL is shut down as well; with kill, this standby's PostgreSQL keeps running read-only and stays in sync with the primary 2. VIP port 5000 connects to the primary normally; VIP port 5001 cannot reach this standby but can reach the other one 3. The primary and the other standby are unaffected 4. etcd, haproxy, and keepalived on this standby keep running |
- | - | systemctl start patroni | Automatically returns to normal and stays in sync with the primary |
- | 4. kill patroni on both standbys | - | 1. With killall the standbys are shut down as well; with kill, both standbys' PostgreSQL keeps running read-only and stays in sync with the primary 2. VIP port 5000 connects only to the primary; VIP port 5001 connections fail 3. The primary is unaffected 4. etcd, haproxy, and keepalived on the standbys keep running |
- | - | systemctl start patroni | Automatically returns to normal and stays in sync with the primary |
4. HA test: etcd
Test type | Method | Command | Result |
---|---|---|---|
Process failure | 1. kill the etcd process used by the primary | - | Primary and standbys are unaffected; patroni connects to etcd on the other nodes; VIP ports 5000/5001 connect normally |
- | 2. stop the etcd process used by one standby | - | Primary and standbys are unaffected; patroni connects to etcd on the other nodes; VIP ports 5000/5001 connect normally |
- | 3. stop the etcd processes used by both standbys | - | This exceeds etcd's maximum tolerated node failures; all three PostgreSQL nodes run in read-only mode; VIP port 5000 connections fail; VIP port 5001 round-robins across all three nodes |
- | - | start the first standby's etcd process | The primary switches from read-only back to read-write and replication recovers; VIP ports 5000/5001 connect normally |
- | - | start the second standby's etcd process | Automatically returns to normal and stays in sync with the primary |
5. HA test: PostgreSQL
Test type | Method | Command | Result |
---|---|---|---|
- | Stop the primary PostgreSQL instance | - | The primary is automatically restarted by Patroni; VIP ports 5000/5001 connect normally |
- | Stop a standby PostgreSQL instance | - | The standby is automatically restarted by Patroni; VIP ports 5000/5001 connect normally |
6. HA test: operating system
Test type | Method | Command | Result |
---|---|---|---|
- | Stop the PostgreSQL primary's host (also the haproxy/keepalived master) | reboot | 1. A failover to one of the standbys is triggered and the other standby re-syncs with the new primary, within 30 seconds 2. The VIP floats to the backup 3. VIP port 5000 connects to the new primary; VIP port 5001 connects to the other standby |
- | - | boot the host | The old primary (pgtest1) becomes a standby of the new primary (pgtest2); the VIP automatically moves back from keepalived's backup to its master; ports 5000 and 5001 connect normally |
- | Stopping a standby's host is not tested | - | - |
7. Read/write splitting and load-balancing test
Port 5000 provides the read-write service; port 5001 provides the read-only service.
[root@lhrpgpatroni71 /]# patronictl list
+----------------+-------------+---------+---------+----+-----------+
| Member         | Host        | Role    | State   | TL | Lag in MB |
+ Cluster: pg_cluster (6981731393302868828) ------+----+-----------+
| lhrpgpatroni71 | 172.72.6.71 | Leader  | running |  7 |           |
| lhrpgpatroni72 | 172.72.6.72 | Replica | running |  7 |         0 |
| lhrpgpatroni73 | 172.72.6.73 | Replica | running |  7 |         0 |
+----------------+-------------+---------+---------+----+-----------+
psql -U postgres -h 172.72.6.77 -p5000
psql -U postgres -h 172.72.6.78 -p5000
psql -U postgres -h 172.72.6.77 -p5001
psql -U postgres -h 172.72.6.78 -p5001
[root@db-rac ~]# for i in $(seq 1 10); do psql 'host=172.72.6.70 port=5000 user=postgres dbname=postgres password=lhr' -c 'select inet_server_addr(),pg_is_in_recovery()'; done | egrep '172'
172.72.6.71 | f
172.72.6.71 | f
172.72.6.71 | f
172.72.6.71 | f
172.72.6.71 | f
172.72.6.71 | f
172.72.6.71 | f
172.72.6.71 | f
172.72.6.71 | f
172.72.6.71 | f
[root@db-rac ~]# for i in $(seq 1 10); do psql 'host=172.72.6.70 port=5001 user=postgres dbname=postgres password=lhr' -c 'select inet_server_addr(),pg_is_in_recovery()'; done | egrep '172'
172.72.6.72 | t
172.72.6.73 | t
172.72.6.72 | t
172.72.6.72 | t
172.72.6.73 | t
172.72.6.73 | t
172.72.6.72 | t
172.72.6.72 | t
172.72.6.73 | t
172.72.6.72 | t
As shown, port 5000 is read-write, while port 5001 is read-only with load balancing across the standbys.
8. switchover and failover
https://www.modb.pro/db/152388
https://www.modb.pro/db/155433
- The failover endpoint allows a manual failover to be performed even when there are no healthy nodes, but it does not allow scheduling.
- The switchover endpoint is the opposite: it only works when the cluster is healthy (there is a leader), and it allows the switchover to be scheduled for a specified time.
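These endpoints can also be exercised directly with curl. A sketch using this cluster's member names and the 8008 API port; the request-body fields follow the Patroni REST API documentation, and the timestamp value is only an example:

```shell
PATRONI_API="http://172.72.6.71:8008"
# Scheduled switchover: "leader" is required; "candidate" and "scheduled_at" are optional.
SWITCHOVER_BODY='{"leader": "lhrpgpatroni71", "candidate": "lhrpgpatroni72", "scheduled_at": "2022-01-22T20:19:00+08:00"}'
curl -s --connect-timeout 3 -XPOST -H 'Content-Type: application/json' \
     -d "$SWITCHOVER_BODY" "$PATRONI_API/switchover" || true
# Manual failover: only a candidate is needed; works even without a healthy leader.
FAILOVER_BODY='{"candidate": "lhrpgpatroni73"}'
curl -s --connect-timeout 3 -XPOST -H 'Content-Type: application/json' \
     -d "$FAILOVER_BODY" "$PATRONI_API/failover" || true
```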
Switchover
[root@lhrpgpatroni71 /]# patronictl list
+----------------+-------------+---------+---------+----+-----------+
| Member         | Host        | Role    | State   | TL | Lag in MB |
+ Cluster: pg_cluster (6981731393302868828) ------+----+-----------+
| lhrpgpatroni71 | 172.72.6.71 | Leader  | running |  7 |           |
| lhrpgpatroni72 | 172.72.6.72 | Replica | running |  7 |         0 |
| lhrpgpatroni73 | 172.72.6.73 | Replica | running |  7 |         0 |
+----------------+-------------+---------+---------+----+-----------+
[root@lhrpgpatroni71 /]# patronictl switchover
Master [lhrpgpatroni71]:
Candidate ['lhrpgpatroni72', 'lhrpgpatroni73'] []: lhrpgpatroni72
When should the switchover take place (e.g. 2022-01-22T20:19 ) [now]:
Current cluster topology
+----------------+-------------+---------+---------+----+-----------+
| Member         | Host        | Role    | State   | TL | Lag in MB |
+ Cluster: pg_cluster (6981731393302868828) ------+----+-----------+
| lhrpgpatroni71 | 172.72.6.71 | Leader  | running |  7 |           |
| lhrpgpatroni72 | 172.72.6.72 | Replica | running |  7 |         0 |
| lhrpgpatroni73 | 172.72.6.73 | Replica | running |  7 |         0 |
+----------------+-------------+---------+---------+----+-----------+
Are you sure you want to switchover cluster pg_cluster, demoting current master lhrpgpatroni71? [y/N]: y
2022-01-22 19:19:49.35893 Successfully switched over to "lhrpgpatroni72"
[root@lhrpgpatroni71 /]# patronictl list
+----------------+-------------+---------+---------+----+-----------+
| Member         | Host        | Role    | State   | TL | Lag in MB |
+ Cluster: pg_cluster (6981731393302868828) ------+----+-----------+
| lhrpgpatroni71 | 172.72.6.71 | Replica | running |  8 |         0 |
| lhrpgpatroni72 | 172.72.6.72 | Leader  | running |  8 |           |
| lhrpgpatroni73 | 172.72.6.73 | Replica | running |    |   unknown |
+----------------+-------------+---------+---------+----+-----------+
[root@lhrpgpatroni71 /]# patronictl list
+----------------+-------------+---------+---------+----+-----------+
| Member         | Host        | Role    | State   | TL | Lag in MB |
+ Cluster: pg_cluster (6981731393302868828) ------+----+-----------+
| lhrpgpatroni71 | 172.72.6.71 | Replica | running |  8 |         0 |
| lhrpgpatroni72 | 172.72.6.72 | Leader  | running |  8 |           |
| lhrpgpatroni73 | 172.72.6.73 | Replica | stopped |    |   unknown |
+----------------+-------------+---------+---------+----+-----------+
[root@lhrpgpatroni71 /]# patronictl list
+----------------+-------------+---------+---------+----+-----------+
| Member         | Host        | Role    | State   | TL | Lag in MB |
+ Cluster: pg_cluster (6981731393302868828) ------+----+-----------+
| lhrpgpatroni71 | 172.72.6.71 | Replica | running |  8 |         0 |
| lhrpgpatroni72 | 172.72.6.72 | Leader  | running |  8 |           |
| lhrpgpatroni73 | 172.72.6.73 | Replica | running |  8 |         0 |
+----------------+-------------+---------+---------+----+-----------+
Failover
[root@lhrpgpatroni71 /]# patronictl list
+----------------+-------------+---------+---------+----+-----------+
| Member         | Host        | Role    | State   | TL | Lag in MB |
+ Cluster: pg_cluster (6981731393302868828) ------+----+-----------+
| lhrpgpatroni71 | 172.72.6.71 | Replica | running |  8 |         0 |
| lhrpgpatroni72 | 172.72.6.72 | Leader  | running |  8 |           |
| lhrpgpatroni73 | 172.72.6.73 | Replica | running |  8 |         0 |
+----------------+-------------+---------+---------+----+-----------+
[root@lhrpgpatroni71 /]#
[root@lhrpgpatroni71 /]# patronictl failover
Candidate ['lhrpgpatroni71', 'lhrpgpatroni73'] []: lhrpgpatroni73
Current cluster topology
+----------------+-------------+---------+---------+----+-----------+
| Member         | Host        | Role    | State   | TL | Lag in MB |
+ Cluster: pg_cluster (6981731393302868828) ------+----+-----------+
| lhrpgpatroni71 | 172.72.6.71 | Replica | running |  8 |         0 |
| lhrpgpatroni72 | 172.72.6.72 | Leader  | running |  8 |           |
| lhrpgpatroni73 | 172.72.6.73 | Replica | running |  8 |         0 |
+----------------+-------------+---------+---------+----+-----------+
Are you sure you want to failover cluster pg_cluster, demoting current master lhrpgpatroni72? [y/N]: y
2022-01-22 19:24:19.78911 Successfully failed over to "lhrpgpatroni73"
+----------------+-------------+---------+---------+----+-----------+
| Member         | Host        | Role    | State   | TL | Lag in MB |
+ Cluster: pg_cluster (6981731393302868828) ------+----+-----------+
| lhrpgpatroni71 | 172.72.6.71 | Replica | running |  8 |         0 |
| lhrpgpatroni72 | 172.72.6.72 | Replica | stopped |    |   unknown |
| lhrpgpatroni73 | 172.72.6.73 | Leader  | running |  8 |           |
+----------------+-------------+---------+---------+----+-----------+
[root@lhrpgpatroni71 /]# patronictl list
+----------------+-------------+---------+---------+----+-----------+
| Member         | Host        | Role    | State   | TL | Lag in MB |
+ Cluster: pg_cluster (6981731393302868828) ------+----+-----------+
| lhrpgpatroni71 | 172.72.6.71 | Replica | stopped |    |   unknown |
| lhrpgpatroni72 | 172.72.6.72 | Replica | running |  9 |         0 |
| lhrpgpatroni73 | 172.72.6.73 | Leader  | running |  9 |           |
+----------------+-------------+---------+---------+----+-----------+
[root@lhrpgpatroni71 /]# patronictl list
+----------------+-------------+---------+---------+----+-----------+
| Member         | Host        | Role    | State   | TL | Lag in MB |
+ Cluster: pg_cluster (6981731393302868828) ------+----+-----------+
| lhrpgpatroni71 | 172.72.6.71 | Replica | running |  9 |         0 |
| lhrpgpatroni72 | 172.72.6.72 | Replica | running |  9 |         0 |
| lhrpgpatroni73 | 172.72.6.73 | Leader  | running |  9 |           |
+----------------+-------------+---------+---------+----+-----------+
POST /switchover and POST /failover correspond to the patronictl switchover and patronictl failover commands, respectively. DELETE /switchover corresponds to the patronictl flush switchover command.
Routine Maintenance
Check the patroni cluster status
[root@lhrpgpatroni71 /]# patronictl list
+----------------+-------------+---------+---------+----+-----------+
| Member         | Host        | Role    | State   | TL | Lag in MB |
+ Cluster: pg_cluster (6981731393302868828) ------+----+-----------+
| lhrpgpatroni71 | 172.72.6.71 | Leader  | running |  1 |           |
| lhrpgpatroni72 | 172.72.6.72 | Replica | running |  1 |         0 |
| lhrpgpatroni73 | 172.72.6.73 | Replica | running |  1 |         0 |
+----------------+-------------+---------+---------+----+-----------+
http://172.72.6.71:8008/
http://172.72.6.71:8008/patroni
http://172.72.6.71:8008/cluster
http://172.72.6.71:8008/config
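Patroni also exposes role-check endpoints whose HTTP status code (200 vs 503) is what HAProxy's health checks rely on. A quick probe across the three members, using this cluster's addresses (on older Patroni releases the endpoint is /master rather than /primary):

```shell
HOSTS="172.72.6.71 172.72.6.72 172.72.6.73"
for h in $HOSTS; do
  # /primary returns 200 only on the leader; /replica returns 200 only on standbys
  code=$(curl -s --connect-timeout 3 -o /dev/null -w '%{http_code}' "http://$h:8008/primary" || true)
  echo "$h /primary -> HTTP $code"
done
```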
Use etcdctl to view the Patroni dynamic configuration stored in etcd
[root@lhretcd74 /]# etcdctl get /service/pg_cluster/config | jq
{
"ttl": 30,
"loop_wait": 10,
"retry_timeout": 10,
"maximum_lag_on_failover": 1048576,
"max_timelines_history": 0,
"master_start_timeout": 300,
"master_stop_timeout": 0,
"synchronous_mode": false,
"postgresql": {
"use_pg_rewind": true,
"use_slots": true,
"parameters": {
"max_connections": 3000,
"superuser_reserved_connections": 100,
"max_locks_per_transaction": 64,
"max_worker_processes": 2,
"max_prepared_transactions": 0,
"wal_level": "logical",
"wal_log_hints": true,
"track_commit_timestamp": false,
"max_wal_senders": 10,
"max_replication_slots": 10,
"listen_addresses": "*",
"port": 5432,
"cluster_name": "pg_cluster",
"archive_mode": true,
"archive_command": "test ! -f /postgresql/archive/%f && cp %p /postgresql/archive/%f"
}
}
}
[root@lhrpgpatroni71 /]# curl -s http://172.72.6.71:8008/config | jq
{
"ttl": 30,
"loop_wait": 10,
"retry_timeout": 10,
"maximum_lag_on_failover": 1048576,
"max_timelines_history": 0,
"master_start_timeout": 300,
"master_stop_timeout": 0,
"synchronous_mode": false,
"postgresql": {
"use_pg_rewind": true,
"use_slots": true,
"parameters": {
"max_connections": 3000,
"superuser_reserved_connections": 100,
"max_locks_per_transaction": 64,
"max_worker_processes": 2,
"max_prepared_transactions": 0,
"wal_level": "logical",
"wal_log_hints": true,
"track_commit_timestamp": false,
"max_wal_senders": 10,
"max_replication_slots": 10,
"listen_addresses": "*",
"port": 5432,
"cluster_name": "pg_cluster",
"archive_mode": true,
"archive_command": "test ! -f /postgresql/archive/%f && cp %p /postgresql/archive/%f"
}
}
}
View the Patroni dynamic configuration with patronictl
[root@lhrpgpatroni71 ~]# patronictl show-config
loop_wait: 10
master_start_timeout: 300
master_stop_timeout: 0
max_timelines_history: 0
maximum_lag_on_failover: 1048576
postgresql:
parameters:
archive_command: test ! -f /postgresql/archive/%f && cp %p /postgresql/archive/%f
archive_mode: true
cluster_name: pg_cluster
listen_addresses: '*'
max_connections: 3000
max_locks_per_transaction: 64
max_prepared_transactions: 0
max_replication_slots: 10
max_wal_senders: 10
max_worker_processes: 2
port: 5432
superuser_reserved_connections: 100
track_commit_timestamp: false
wal_level: logical
wal_log_hints: true
use_pg_rewind: true
use_slots: true
retry_timeout: 10
synchronous_mode: false
ttl: 30
Adjust configuration parameters through patronictl: change them on any one node, the other nodes pick up the change automatically, and Patroni performs the reload by itself.
[postgres@lhrpgpatroni71 ~]$ patronictl edit-config
# edit the text in the editor that opens
Apply these changes? [y/N]: y
Configuration changed
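For scripted changes there is also a non-interactive form. A sketch: the -s/--set and --force flags appear in patronictl's help, but verify them on your version, and work_mem=16MB is only an example value:

```shell
# Non-interactive alternative to the editor session.
SETTING='postgresql.parameters.work_mem=16MB'
if command -v patronictl >/dev/null 2>&1; then
  # --force skips the "Apply these changes?" confirmation
  patronictl edit-config pg_cluster -s "$SETTING" --force
else
  echo "patronictl not found; would run: patronictl edit-config pg_cluster -s $SETTING --force"
fi
```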
For parameters that only take effect after a database restart, restart the nodes one at a time to limit the production impact, or restart the whole cluster during a maintenance window with patronictl restart:
# restart a single member only (--any)
[root@lhrpgpatroni71 ~]# patronictl restart pg_cluster --any
+----------------+-------------+---------+---------+----+-----------+
| Member         | Host        | Role    | State   | TL | Lag in MB |
+ Cluster: pg_cluster (6981731393302868828) ------+----+-----------+
| lhrpgpatroni71 | 172.72.6.71 | Leader  | running |  1 |           |
| lhrpgpatroni72 | 172.72.6.72 | Replica | running |  1 |         0 |
| lhrpgpatroni73 | 172.72.6.73 | Replica | running |  1 |         0 |
+----------------+-------------+---------+---------+----+-----------+
When should the restart take place (e.g. 2022-01-22T11:14) [now]:
Are you sure you want to restart members lhrpgpatroni71, lhrpgpatroni72, lhrpgpatroni73? [y/N]: y
Restart if the PostgreSQL version is less than provided (e.g. 9.5.2) []:
Success: restart on member lhrpgpatroni73
[root@lhrpgpatroni71 ~]# patronictl list
+----------------+-------------+---------+---------+----+-----------+-----------------+
| Member         | Host        | Role    | State   | TL | Lag in MB | Pending restart |
+ Cluster: pg_cluster (6981731393302868828) ------+----+-----------+-----------------+
| lhrpgpatroni71 | 172.72.6.71 | Leader  | running |  1 |           |                 |
| lhrpgpatroni72 | 172.72.6.72 | Replica | running |  1 |         0 |                 |
| lhrpgpatroni73 | 172.72.6.73 | Replica | running |  1 |         0 | *               |
+----------------+-------------+---------+---------+----+-----------+-----------------+
# with --pending, only members flagged as pending restart are actually restarted
[root@lhrpgpatroni71 /]# patronictl restart pg_cluster --any --pending
+----------------+-------------+---------+---------+----+-----------+-----------------+
| Member         | Host        | Role    | State   | TL | Lag in MB | Pending restart |
+ Cluster: pg_cluster (6981731393302868828) ------+----+-----------+-----------------+
| lhrpgpatroni71 | 172.72.6.71 | Leader  | running |  1 |           |                 |
| lhrpgpatroni72 | 172.72.6.72 | Replica | running |  1 |         0 |                 |
| lhrpgpatroni73 | 172.72.6.73 | Replica | running |  1 |         0 | *               |
+----------------+-------------+---------+---------+----+-----------+-----------------+
When should the restart take place (e.g. 2022-01-22T11:23) [now]:
Are you sure you want to restart members lhrpgpatroni71, lhrpgpatroni72, lhrpgpatroni73? [y/N]: y
Restart if the PostgreSQL version is less than provided (e.g. 9.5.2) []:
Success: restart on member lhrpgpatroni73
# restart all members
[root@lhrpgpatroni71 /]# patronictl restart pg_cluster
+----------------+-------------+---------+---------+----+-----------+
| Member         | Host        | Role    | State   | TL | Lag in MB |
+ Cluster: pg_cluster (6981731393302868828) ------+----+-----------+
| lhrpgpatroni71 | 172.72.6.71 | Leader  | running |  1 |           |
| lhrpgpatroni72 | 172.72.6.72 | Replica | running |  1 |         0 |
| lhrpgpatroni73 | 172.72.6.73 | Replica | running |  1 |         0 |
+----------------+-------------+---------+---------+----+-----------+
When should the restart take place (e.g. 2022-01-22T11:24) [now]:
Are you sure you want to restart members lhrpgpatroni73, lhrpgpatroni71, lhrpgpatroni72? [y/N]: y
Restart if the PostgreSQL version is less than provided (e.g. 9.5.2) []:
Success: restart on member lhrpgpatroni73
Success: restart on member lhrpgpatroni71
Success: restart on member lhrpgpatroni72
[root@lhrpgpatroni71 /]# patronictl list
+----------------+-------------+---------+---------+----+-----------+
| Member         | Host        | Role    | State   | TL | Lag in MB |
+ Cluster: pg_cluster (6981731393302868828) ------+----+-----------+
| lhrpgpatroni71 | 172.72.6.71 | Leader  | running |  1 |           |
| lhrpgpatroni72 | 172.72.6.72 | Replica | running |  1 |         0 |
| lhrpgpatroni73 | 172.72.6.73 | Replica | running |  1 |         0 |
+----------------+-------------+---------+---------+----+-----------+
Reinitialize endpoint
It is only allowed on replica nodes. Once invoked, it removes the data directory and starts pg_basebackup to reinitialize the PostgreSQL data directory on the specified node. The call might fail if Patroni is in a loop trying to recover (restart) a failed Postgres; to overcome this problem, specify {"force": true} in the request body.
curl -s http://172.72.6.73:8008/reinitialize -XPOST
patronictl reinit pg_cluster
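When the node is stuck in a recovery loop, the force flag mentioned above goes into the request body (endpoint and body format as described in the Patroni REST docs; host 172.72.6.73 as in the example above):

```shell
REINIT_BODY='{"force": true}'
curl -s --connect-timeout 3 -XPOST -H 'Content-Type: application/json' \
     -d "$REINIT_BODY" http://172.72.6.73:8008/reinitialize || true
```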
Reload endpoint
POST /reload makes Patroni re-read and apply its configuration file. This is equivalent to sending a SIGHUP signal to the Patroni process. If you changed parameters that require a Postgres restart (such as shared_buffers), you still have to restart Postgres explicitly by calling POST /restart or using patronictl restart.
curl -s http://172.72.6.73:8008/reload -XPOST
patronictl reload pg_cluster
Run database queries with patronictl
[root@lhrpgpatroni71 /]# cat a.sql
select name from pg_settings limit 2;
[root@lhrpgpatroni71 /]# patronictl query -f a.sql --username=postgres --password
Password:
name
allow_system_table_mods
application_name
References
https://www.modb.pro/topic/152353
https://www.cnblogs.com/VicLiu/p/14333778.html
Patroni official documentation: https://patroni.readthedocs.io/en/latest/index.html
Convert a standalone instance to a Patroni cluster: https://patroni.readthedocs.io/en/latest/existing_data.html
Major PostgreSQL version upgrade: https://patroni.readthedocs.io/en/latest/existing_data.html