Adding a serial column to an existing table with 100,000,000 rows
I have a table with roughly 100,000,000 rows. We need to delete around 80,000 of them for a remediation.
In order to prevent downtime, I have a job set up to grab the records that need to be deleted and then process the deletes in chunks of 100. However, even processing the first 100 is taking forever.
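A minimal sketch of what one batch of such a job might look like, assuming the table is named big_table and the tx values slated for deletion have been staged in a helper table tx_to_delete (both names are hypothetical):

    -- Consume one batch of 100 tx values from the staging table and
    -- delete the matching rows from the big table; the job repeats this
    -- until the staging table is empty.
    WITH batch AS (
        DELETE FROM tx_to_delete
        WHERE tx IN (SELECT tx FROM tx_to_delete ORDER BY tx LIMIT 100)
        RETURNING tx
    )
    DELETE FROM big_table b
    USING  batch
    WHERE  b.tx = batch.tx;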
There is no primary ID on this table, and the only way I can reliably reference each row is with a unique column called tx, which is a varchar(250) (though the field is never longer than 18-20 characters). I created an index on this column, but it still takes roughly 4-6 seconds to select a single row.
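For context, the index would have been created along these lines; on a table this size that is still taking writes, CREATE INDEX CONCURRENTLY builds it without blocking them (the table and index names are assumptions):

    -- Build the index without holding a write-blocking lock
    CREATE INDEX CONCURRENTLY big_table_tx_idx ON big_table (tx);

    -- With the index in place, a single-row lookup should use an index scan
    SELECT * FROM big_table WHERE tx = 'abc123def456ghi789';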
It seemed likely that the varchar was causing the problem, so I wanted to add a new bigserial id column, but I was trying to figure out whether doing so would lock the table until it is able to populate all of the IDs.
I know alter table add column is non-blocking as long as there is no default value. But does serial count as a default value?
I couldn't find an answer to this in the documentation. We're on Postgres 12.
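For reference, serial is not a distinct type but shorthand. A sketch of what the ADD COLUMN in question expands to on Postgres (the sequence name follows the usual tablename_colname_seq pattern; big_table is hypothetical):

    -- "ALTER TABLE big_table ADD COLUMN id bigserial;" is equivalent to:
    CREATE SEQUENCE big_table_id_seq;
    ALTER TABLE big_table
        ADD COLUMN id bigint NOT NULL DEFAULT nextval('big_table_id_seq');
    ALTER SEQUENCE big_table_id_seq OWNED BY big_table.id;

Because nextval() is volatile, the Postgres 11+ optimization that stores a constant default in the catalog without touching existing rows does not apply, so this does count as a default that forces a table rewrite.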
Comments (1)
Adding a new column with a sequence-generated value will rewrite the table, which will cause downtime. With some care, it could be done without downtime, but that is complicated and not worth the effort if you already have a varchar column with a unique index on it that does not contain NULL values.

Searching for rows with the existing index should be a matter of milliseconds. If it isn't, that's the problem you have to solve. Can you add EXPLAIN (ANALYZE, BUFFERS) output for the query to the question?
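For example, something along these lines, run against one of the slow lookups (the table name and literal tx value are placeholders):

    -- Show the actual plan, timings, and buffer usage for a single-row lookup
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT * FROM big_table WHERE tx = 'abc123def456ghi789';

A healthy plan would show an Index Scan on the tx index; a Seq Scan over 100,000,000 rows would explain multi-second lookups.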