How do I alter a set of partitioned tables in Postgres?

Posted on 2024-09-06 19:40:25

I created a set of partitioned tables in Postgres and started inserting a lot of rows via the master table. When the load process blew up on me, I realized I should have declared the id column as BIGSERIAL (BIGINT with a sequence, behind the scenes), but I had inadvertently set it as SERIAL (INTEGER). Now that I have a couple of billion rows loaded, I am trying to ALTER the column to BIGINT. The process seems to be working, but it is taking a long time, so I can't really tell whether it is working or hung. I'd rather not restart the entire load process again.

Any suggestions?
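
For context, the arithmetic behind the failure can be sketched in a few lines of Python. SERIAL is backed by a 4-byte INTEGER and BIGSERIAL by an 8-byte BIGINT; the helper name `fits_in_serial` and the sample id are illustrative assumptions, not taken from the original load.

```python
# Upper bounds of the two PostgreSQL serial types.
INT4_MAX = 2**31 - 1   # SERIAL / INTEGER
INT8_MAX = 2**63 - 1   # BIGSERIAL / BIGINT


def fits_in_serial(next_id: int) -> bool:
    """Return True if next_id can still be stored in a SERIAL column."""
    return 1 <= next_id <= INT4_MAX


print(INT4_MAX)                       # 2147483647
print(INT8_MAX)                       # 9223372036854775807
print(fits_in_serial(2_500_000_000))  # False: hypothetical id past the limit
```

A load of "a couple of billion" rows sits right at the INTEGER ceiling, which is presumably why it failed partway through.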

Comments (2)

ぇ气 2024-09-13 19:40:25

When you update a row to alter it in PostgreSQL, that writes out a new copy of the row and then does some cleanup later to remove the original. This means that trying to fix the problem by doing updates can take longer than just loading all the data in from scratch again: it's more disk I/O than loading a new copy, and some extra processing time too. The only situation where you'd want to do an update instead of a reload is when the original load was very inefficient, for example if a slow client program is inserting the data and it's the bottleneck in the process.

To figure out if the process is still working, see if it's using CPU when you run top (UNIX-ish systems) or the Task Manager (Windows). On Linux, "top -c" will even show you what the PostgreSQL client processes are doing. Most likely you expected it to take less time than the original load, which it won't, and it is still running rather than hung.
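
As a rough, Linux-only alternative to watching top by hand, the same check can be sketched in Python by sampling the CPU counters in `/proc/<pid>/stat` twice. The function names and the 5-second interval are assumptions for illustration; the pid of the backend running the ALTER has to be looked up separately (for example from the `top -c` output).

```python
import time


def parse_cpu_ticks(stat_line: str) -> int:
    """Return utime + stime (in clock ticks) from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may contain spaces,
    # so split on the last ')' before splitting the remaining fields.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(rest[11]) + int(rest[12])


def read_cpu_ticks(pid: int) -> int:
    """Read the current CPU tick total for a process (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_cpu_ticks(f.read())


def is_using_cpu(pid: int, interval: float = 5.0) -> bool:
    """Return True if the process consumed CPU time during the interval."""
    before = read_cpu_ticks(pid)
    time.sleep(interval)
    return read_cpu_ticks(pid) > before
```

A process that is blocked on disk I/O can show little CPU growth while still making progress, so a `False` result here is inconclusive rather than proof of a hang.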

飘逸的'云 2024-09-13 19:40:25

Restart it (clarifying edit: restart the entire load process again).

Altering a column value requires a new row version, and every index entry pointing to the old version has to be updated to point to the new one.

Additionally, see how much of the advice on populating databases you can follow.


Correction from @archnid:

Altering the type of the column will trigger a table rewrite, so the row versioning isn't a big problem, but it will still take lots of disk space temporarily. You can usually monitor progress by looking at which files in the database directory are being appended to...
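
The file-watching idea can be sketched in Python by taking two size snapshots of a directory and reporting what grew in between. The function names are hypothetical, and the directory to watch is an assumption you supply yourself (typically the database's subdirectory under `base/` in the PostgreSQL data directory, which usually requires running as the postgres OS user).

```python
import os
from typing import Dict


def snapshot_sizes(directory: str) -> Dict[str, int]:
    """Map every file under directory to its current size in bytes."""
    sizes = {}
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            try:
                sizes[path] = os.path.getsize(path)
            except OSError:
                # The file vanished between listing and stat; skip it.
                continue
    return sizes


def growing_files(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Return files that grew or newly appeared, with bytes added."""
    grown = {}
    for path, size in after.items():
        delta = size - before.get(path, 0)
        if delta > 0:
            grown[path] = delta
    return grown
```

Take one snapshot, wait a minute or so, take another, and pass both to `growing_files`. During a table rewrite you would expect new files to appear and grow; with the default build, each table file is split into 1 GB segments, so growth moves on to a new segment once one fills up.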
