调试 postgres 9.0.1 表损坏
我有一个 9.0.1 数据库,其中有两个损坏的表(一个表有 20 行,另一个表有 140 行)。这些表似乎适合读取/选择操作,但更新表中的某些行会产生错误消息:
update media set updated_at = now() at time zone 'UTC';
错误:无法读取文件“base/16384/16485”中的块2:只读 0 of 8192字节
update media_status set updated_at = now() at time zone 'UTC';
2011-04-14 00:15:15 UTC 错误:无法读取文件“base/16384/16543”中的块 3:只读取 8192 字节中的 0
2011-04-14 00:15:15 UTC 声明:在时区“UTC”更新 media_status 设置 Updated_at = now();
检查文件系统(linux)中损坏的文件,它们不是零字节: ll 底座/16384/16485 -rwx------ 1 postgres postgres 16384 2011-04-07 09:43 base/16384/16485
我运行了“vacuum(FULL, VERBOSE) ”命令,并且损坏(或至少是更新时的错误)消失了。是否期望“vacuum(FULL)”命令能够修复表损坏?这是否为可能发生的事情提供了任何线索?
有没有办法确定这种损坏是如何/何时发生的?
我怀疑它可能是在文件系统级备份(即 pg_start_backup()、tar -czf...、pg_stop_backup())期间发生的,因为我执行了备份并将数据库移动到不同的系统。恢复文件并启动 postgres 后,我开始收到这些错误。我尝试使用相同的 tar 存档进行多次恢复,但结果相同(在不同的系统上)。
谢谢, 担
I have a 9.0.1 database with two corrupted tables (one table has 20 rows, the other 140). The tables seem fine for read/select operations, but updating certain rows in the table produce error messages:
update media set updated_at = now() at time zone 'UTC';
ERROR: could not read block 2 in file "base/16384/16485": read only 0 of 8192 bytes
update media_status set updated_at = now() at time zone 'UTC';
2011-04-14 00:15:15 UTC ERROR: could not read block 3 in file "base/16384/16543": read only 0 of 8192 bytes
2011-04-14 00:15:15 UTC STATEMENT: update media_status set updated_at = now() at time zone 'UTC';
Examining the corrupted files in the filesystem (linux), they aren't zero bytes:
ll base/16384/16485
-rwx------ 1 postgres postgres 16384 2011-04-07 09:43 base/16384/16485
I ran a "vacuum(FULL, VERBOSE) " command and the corruption (or at least the errors on update) has disappeared. Is it expected that a "vacuum(FULL)" command would/could fix table corruption? Does that providing any clues as to what may have happened?
Is there any way to determine how/when this corruption may have occurred?
I suspect that it may have occurred during a filesystem level backup (ie pg_start_backup(), tar -czf..., pg_stop_backup()), as I performed a backup and moved the database to a different system. After restoring the files and starting postgres I began getting these errors. I have tried restoring multiple times with the same tar archive with the same results (on different systems).
Thanks,
Dan
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
对于这种情况很难做出明确的说法,但我会调查存储层。像错误消息这样的短读取通常表明在本地通常连接的存储上“不可能发生”。因此,如果您有 SAN、NAS、NFS、一些重要的 RAID 配置或其他感兴趣的东西,请检查那里的日志或错误计数器。
如果文件在那里,则意味着 PostgreSQL 内部没有损坏。
尝试手动读取文件并看看会发生什么是一件很有趣的事情,但现在可能已经太晚了。
VACUUM FULL 修复它的事实可能只是因为它将表完全重写到新文件中,因此导致旧文件出现问题的任何原因都消失了。
It's hard to make a definite statement about this situation, but I would investigate the storage layer. Short reads like the error message indicates usually "can't happen" on local, normally attached storage. So if you have a SAN, NAS, NFS, some nontrivial RAID configuration, or something else of interest, check the logs or error counters there.
If the files are there, then that means that it's not a corruption inside PostgreSQL.
One thing that would have been interesting to try, but it's probably too late now, is to try to read the files manually and see what happens.
The fact that VACUUM FULL fixed it was probably just because that rewrites the table completely into new files, so whatever was causing hickups with the old files is gone.