Invalid resource manager ID in primary checkpoint record
I've updated my Airbyte image from 0.35.2-alpha to 0.35.37-alpha. [running in kubernetes]
When the system rolled out, the db pod wouldn't terminate, and I [a terrible mistake] deleted the pod. When it came back up, I got this error -
PostgreSQL Database directory appears to contain a database; Skipping initialization
2022-02-24 20:19:44.065 UTC [1] LOG: starting PostgreSQL 13.6 on x86_64-pc-linux-musl, compiled by gcc (Alpine 10.3.1_git20211027) 10.3.1 20211027, 64-bit
2022-02-24 20:19:44.065 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
2022-02-24 20:19:44.065 UTC [1] LOG: listening on IPv6 address "::", port 5432
2022-02-24 20:19:44.071 UTC [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2022-02-24 20:19:44.079 UTC [21] LOG: database system was shut down at 2022-02-24 20:12:55 UTC
2022-02-24 20:19:44.079 UTC [21] LOG: invalid resource manager ID in primary checkpoint record
2022-02-24 20:19:44.079 UTC [21] PANIC: could not locate a valid checkpoint record
2022-02-24 20:19:44.530 UTC [1] LOG: startup process (PID 21) was terminated by signal 6: Aborted
2022-02-24 20:19:44.530 UTC [1] LOG: aborting startup due to startup process failure
2022-02-24 20:19:44.566 UTC [1] LOG: database system is shut down
Pretty sure the WAL file is corrupted, but I'm not sure how to fix this.
Comments (4)
Warning - there is a potential for data loss.
This is a test system, so I wasn't concerned with keeping the latest transactions, and had no backup.
First I overrode the container command to keep the container running without trying to start postgres, and spawned a shell on the pod.
Then I ran pg_resetwal. Success!
Finally I removed the temporary command override, and postgres started up correctly!
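The steps above can be sketched with kubectl. The deployment name airbyte-db is an assumption - adjust it to your manifests - and note that pg_resetwal discards unreplayed WAL, so recent transactions can be lost:

```shell
# 1. Override the container command so postgres does not start,
#    keeping the container alive with a no-op process
kubectl patch deployment airbyte-db --type json -p '[
  {"op": "add", "path": "/spec/template/spec/containers/0/command",
   "value": ["tail", "-f", "/dev/null"]}
]'

# 2. Once the pod has rolled, get a shell in it and reset the WAL
#    as the postgres user (gosu avoids su mangling PATH)
kubectl exec -it deploy/airbyte-db -- \
  gosu postgres pg_resetwal /var/lib/postgresql/data

# 3. Drop the temporary override so postgres starts normally again
kubectl patch deployment airbyte-db --type json -p '[
  {"op": "remove", "path": "/spec/template/spec/containers/0/command"}
]'
```

Each patch triggers a rollout of the deployment, which replaces the pod; that is fine here since the data lives on a persistent volume.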
The su command messes with PATH, so the easiest solution is to just use gosu to drop from root to postgres: gosu postgres pg_resetwal /var/lib/postgresql/data. Hopefully that works for you!
Unfortunately this morning my system also had the same error.
The error was resolved successfully and the database is operating stably again, with no data loss detected.
Some suggestions to fix this error:
Back up the data folder to a separate area to avoid loss.
Use this trick to stop postgres's automatic restart loop:

```yaml
services:
  db:
    image: "postgres:13.4-buster"
    entrypoint: ["tail", "-f", "/dev/null"]
    ...
```

Access the container and run the following commands:
Good luck!
Another thing to consider is checking the PostgreSQL configuration for potential misalignments, like incorrect wal_level or checkpoint_timeout settings. Misconfigurations here can sometimes cause issues during recovery if checkpoints or WAL segments don't align properly. It's also worth verifying that the storage layer (e.g., file system or RAID) isn't introducing corruption. Silent disk errors can occasionally lead to problems like this.
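Once the server is back up, those settings are easy to inspect with psql (the connection parameters here are assumptions):

```shell
# Show the WAL-related settings mentioned above
psql -U postgres -c "SHOW wal_level;"
psql -U postgres -c "SHOW checkpoint_timeout;"

# Check whether data-page checksums were enabled at initdb time;
# if "on", corruption from the storage layer is detected on read
psql -U postgres -c "SHOW data_checksums;"
```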