How do you recover a Postgres database running in a container after its data is corrupted?

2023-09-30 14:17:20

Preface

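In "Deploying a complete self-hosted RSS solution on K8S: RssHub and Tiny Tiny Rss"[1], I described how to deploy RssHub and Tiny Tiny RSS to a K8s cluster. TTRSS stores its data in Postgres, so Postgres was also deployed as a container in the same K8s cluster.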

Recently, however, a mistaken operation corrupted the Postgres WAL (write-ahead log), and the Postgres Pod kept falling into CrashLoopBackOff. The errors were as follows:

Postgres shutdown exit code 1:

2023-09-27 02:32:17.127 UTC [1] LOG:  received fast shutdown request
2023-09-27 02:32:17.181 UTC [1] LOG:  aborting any active transactions
2023-09-27 02:32:17.434 UTC [1] LOG:  background worker "logical replication launcher" (PID 26) exited with exit code 1
2023-09-27 02:32:17.481 UTC [21] LOG:  shutting down
2023-09-27 02:32:17.880 UTC [1] LOG:  database system is shut down

Postgres "invalid resource manager ID in primary checkpoint record" and "could not locate a valid checkpoint record":

2023-09-27 02:33:23.189 UTC [1] LOG:  starting PostgreSQL 13.5 on x86_64-pc-linux-musl, compiled by gcc (Alpine 10.3.1_git20211027) 10.3.1 20211027, 64-bit
2023-09-27 02:33:23.190 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2023-09-27 02:33:23.190 UTC [1] LOG:  listening on IPv6 address "::", port 5432
2023-09-27 02:33:23.199 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2023-09-27 02:33:23.210 UTC [21] LOG:  database system was shut down at 2023-09-27 02:32:22 UTC
2023-09-27 02:33:23.210 UTC [21] LOG:  invalid resource manager ID in primary checkpoint record
2023-09-27 02:33:23.210 UTC [21] PANIC:  could not locate a valid checkpoint record
2023-09-27 02:33:24.657 UTC [1] LOG:  startup process (PID 21) was terminated by signal 6: Aborted
2023-09-27 02:33:24.657 UTC [1] LOG:  aborting startup due to startup process failure
2023-09-27 02:33:24.659 UTC [1] LOG:  database system is shut down

As the logs above show, the WAL files are corrupted. How can the database be recovered?
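Before attempting any recovery, it can help to confirm that the crash loop really is caused by a bad checkpoint record and not by something else (a full disk, wrong permissions, and so on). The following is a minimal sketch in Python, not part of the original setup: it scans Postgres log text for the messages quoted above. The function names and the `SIGNATURES` table are hypothetical helpers introduced here for illustration, and the log text is assumed to have been saved beforehand, for example from the output of `kubectl logs` with `--previous` for the crashed container.

```python
import re
from typing import Dict, List

# Message patterns copied from the log excerpts above.
# The dictionary keys are illustrative names, not Postgres terminology.
SIGNATURES = {
    "invalid_checkpoint_record": re.compile(
        r"LOG:\s+invalid resource manager ID in primary checkpoint record"
    ),
    "no_valid_checkpoint": re.compile(
        r"PANIC:\s+could not locate a valid checkpoint record"
    ),
    "startup_aborted": re.compile(
        r"LOG:\s+aborting startup due to startup process failure"
    ),
}


def find_wal_corruption(log_text: str) -> Dict[str, List[str]]:
    """Return the matching log lines, grouped by signature name."""
    hits: Dict[str, List[str]] = {name: [] for name in SIGNATURES}
    for line in log_text.splitlines():
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits[name].append(line.strip())
    return hits


def looks_like_wal_corruption(log_text: str) -> bool:
    """True if the PANIC about a missing valid checkpoint record is present."""
    return bool(find_wal_corruption(log_text)["no_valid_checkpoint"])


# Three lines taken from the startup log shown earlier.
SAMPLE = """\
2023-09-27 02:33:23.210 UTC [21] LOG:  invalid resource manager ID in primary checkpoint record
2023-09-27 02:33:23.210 UTC [21] PANIC:  could not locate a valid checkpoint record
2023-09-27 02:33:24.657 UTC [1] LOG:  aborting startup due to startup process failure
"""

if __name__ == "__main__":
    for name, lines in find_wal_corruption(SAMPLE).items():
        for line in lines:
            print(f"{name}: {line}")
    print("WAL corruption suspected:", looks_like_wal_corruption(SAMPLE))
```

The check keys on the PANIC line only, because the normal "database system is shut down" message also appears in a clean shutdown and would produce false positives.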

Recovery steps
