PostgreSQL: invalid page header in block
I'm getting an error
ERROR: invalid page header in block 411 of relation "t_value_time"
in my PostgreSQL database. This keeps happening on different machines. Is there a way to prevent it from happening, or at least to tell PSQL to ignore the data in the invalid block and move on?
I'd rather lose the data from that block and have it skip over it, reading the rest of the data. Is there a way to tell PSQL to skip this block?
Comments (5)
WARNING: You will lose some data!
We managed to get over it (crashed DEV VM) by issuing:
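A sketch of the usual sequence (zero_damaged_pages is the PostgreSQL setting involved; substitute your damaged table for t_value_time):

-- Zero out (i.e. destroy) any page whose header fails validation,
-- instead of raising an error. The rows on those pages are lost.
SET zero_damaged_pages = on;
-- Rewriting the table forces every page to be read, zeroing the bad ones.
VACUUM FULL t_value_time;
-- Rebuild indexes that may still reference the destroyed tuples.
REINDEX TABLE t_value_time;
SET zero_damaged_pages = off;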
Fix via pwkg.ork.
Same block every time?
From what I've read, the most common cause of invalid blocks is hardware. Red Hat has a utility, pg_filedump, that formats "PostgreSQL heap, index, and control files into a human-readable form". I don't think they support any PostgreSQL version greater than 8.4.0, but I could be wrong.
You'll want to prove your hardware is good by running thorough disk, RAM, and NIC diagnostics.
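If you do try pg_filedump, here is a sketch of inspecting just the suspect block; the data-file path is a placeholder, and SELECT pg_relation_filepath('t_value_time'); in psql gives you the real one:

# -R takes a start and end block number, so this dumps only block 411.
pg_filedump -R 411 411 /var/lib/postgresql/data/base/16384/16385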
There's no simple way to do it, but it's reasonably easy to do just by editing the data file directly (relfilenode of the pg_class entry gives the filename).
Just copy a block from elsewhere in the file over the bad block. Ideally, synthesise an empty block or update the one you're overwriting to have no valid tuples in it.
Once you've got something that doesn't produce that error, dump the table and reload it for safety.
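A sketch of that procedure, assuming the default 8 kB block size; the file path is a placeholder, and the server must be stopped first (take a file-level backup before touching anything):

-- In psql: locate the relation's data file, relative to the data directory.
SELECT pg_relation_filepath('t_value_time');

# In a shell: overwrite block 411 with zeros. PostgreSQL treats an
# all-zero page as empty, so the error disappears (and its tuples with it).
dd if=/dev/zero of=/var/lib/postgresql/data/base/16384/16385 \
   bs=8192 seek=411 count=1 conv=notrunc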
These are almost always hardware problems, BTW. Verify and test RAM, disk, and CPU. Make sure your environment is good (bad power input can cause problems, as can overheating). That's the best way to prevent it. The best way to address it is point-in-time recovery from a base backup.
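For reference, a minimal sketch of that point-in-time recovery on PostgreSQL 12 or later, assuming a restored base backup and a WAL archive (paths and the target timestamp are placeholders):

# In postgresql.conf on the restored copy:
restore_command = 'cp /mnt/wal_archive/%f "%p"'
recovery_target_time = '2024-01-01 12:00:00'

# Create the signal file, then start the server to begin recovery:
touch /var/lib/postgresql/data/recovery.signal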
If you have a slave, set hot_standby_feedback to 'on' on it if not already.
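A sketch of enabling it on the standby; the parameter is reloadable, so no restart is needed:

ALTER SYSTEM SET hot_standby_feedback = on;
SELECT pg_reload_conf();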
Do a pg_dump and write it to /dev/null so that you don't consume any space.
nohup pg_dump db_name -v -Fc -f /dev/null &
If the dump succeeds, then your slave is fine. Do a failover. There will be no data loss.
Another way to validate your slave is to run explain select count(*) from table_name;
If it succeeds and the plan uses a sequential scan, then your slave is good.
If it is using an index scan, the query won't read the whole table, so this check doesn't tell you much.
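A sketch of that check (table_name is a placeholder):

EXPLAIN SELECT count(*) FROM table_name;
-- If the plan shows a Seq Scan, running the count itself reads every
-- heap page of the table, so a clean result means no broken headers:
SELECT count(*) FROM table_name;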
Note: This works only if your master is affected by storage-level corruption.
I happened to face the same issue just today and I was able to fix it.