How to avoid database storage fragmentation caused by frequent updates?
When I have the following table:
CREATE TABLE test
(
    "id" integer NOT NULL,
    "myval" text NOT NULL,
    CONSTRAINT "test-id-pkey" PRIMARY KEY ("id")
);
When doing a lot of queries like the following:
UPDATE "test" SET "myval" = "myval" || 'foobar' WHERE "id" = 12345;
Then the myval value of that row will get larger and larger over time.
What will PostgreSQL do? Where will it get the space from?
Can I avoid PostgreSQL needing more than one seek to read a particular myval value?
Will PostgreSQL do this automatically?
I know that normally I should try to normalize the data much more, but I need to read the value with one seek. Myval will grow by about 20 bytes with each update (each update appends data). Some rows will get 1-2 updates, some 1000 updates.
Normally I would just insert a new row instead of updating, but then selecting gets slow.
So I came to the idea of denormalizing.
Comments (2)
This is related to this question about TEXT in PostgreSQL, or at least the answer is similar. PostgreSQL stores large columns away from the main table storage:
So you can expect a TEXT (or BYTEA or large VARCHAR) column to always be stored away from the main table, and something like SELECT id, myval FROM test WHERE id = 12345 will take two seeks to pull both columns off the disk (and more seeks to resolve their locations).
If your UPDATEs really are causing your SELECTs to slow down then perhaps you need to review your vacuuming strategy.
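To see how this plays out for the table in the question, the catalog and statistics views can be queried directly. The following is a minimal sketch, assuming the table is named test as in the question and that the statistics collector is enabled. Note that, as far as I know, PostgreSQL only moves a value out of line (TOAST) once the row grows past roughly 2 kB by default, so short myval values may still sit in the main table.

```sql
-- Which TOAST table, if any, backs "test"?
SELECT c.relname, t.relname AS toast_table
FROM pg_class c
LEFT JOIN pg_class t ON t.oid = c.reltoastrelid
WHERE c.relname = 'test';

-- Storage strategy per column: p = plain, m = main, e = external, x = extended
SELECT attname, attstorage
FROM pg_attribute
WHERE attrelid = 'test'::regclass
  AND attnum > 0;

-- On-disk size (possibly compressed) versus raw length of one value
SELECT id,
       pg_column_size(myval) AS stored_bytes,
       octet_length(myval)   AS raw_bytes
FROM test
WHERE id = 12345;

-- Dead row versions left behind by updates, and when vacuum last ran
SELECT n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'test';
```

A large n_dead_tup relative to n_live_tup, or a last_autovacuum that is far in the past, would suggest that vacuuming is not keeping up with the updates.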
Change the FILLFACTOR of the table to leave free space for future updates. Because the text field has no index, these can also be HOT updates, which makes the updates faster and lowers autovacuum overhead, since HOT updates use a microvacuum. The CREATE TABLE documentation has some information about FILLFACTOR.
The value 70 is not the perfect setting; it depends on your unique situation. Maybe you're fine with 90; it could also be 40 or something else.
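The statement this answer refers to is not preserved here, so the following is only a sketch of what such a change could look like, assuming the table name test from the question and the value 70 mentioned in this answer. A new fillfactor applies to pages written after the change; existing pages are only repacked when the table is rewritten.

```sql
-- Leave 30% of each heap page free for updated row versions
ALTER TABLE test SET (fillfactor = 70);

-- Optional: rewrite the table so existing pages honour the new setting.
-- This takes an exclusive lock on the table while it runs.
VACUUM FULL test;

-- Check how many updates were HOT (no index maintenance needed)
SELECT n_tup_upd, n_tup_hot_upd
FROM pg_stat_user_tables
WHERE relname = 'test';
```

If n_tup_hot_upd stays well below n_tup_upd, the pages are probably still too full for the new row version to fit next to the old one, and a lower fillfactor may be worth trying.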