Configuration parameter work_mem in PostgreSQL on Linux
I have to optimize queries by tuning basic PostgreSQL server configuration parameters. In the documentation I came across the work_mem parameter. I then checked how changing this parameter would influence the performance of my query (which uses a sort). I measured query execution time with various work_mem settings and was very disappointed.
The table on which I perform my query contains 10,000,000 rows, and there are 430 MB of data to sort (Sort Method: external merge Disk: 430112kB).
With work_mem = 1MB, the EXPLAIN output is:
Total runtime: 29950.571 ms (sort takes about 19300 ms).
Sort (cost=4032588.78..4082588.66 rows=19999954 width=8)
(actual time=22577.149..26424.951 rows=20000000 loops=1)
Sort Key: "*SELECT* 1".n
Sort Method: external merge Disk: 430104kB
With work_mem = 5MB:
Total runtime: 36282.729 ms (sort: 25400 ms).
Sort (cost=3485713.78..3535713.66 rows=19999954 width=8)
(actual time=25062.383..33246.561 rows=20000000 loops=1)
Sort Key: "*SELECT* 1".n
Sort Method: external merge Disk: 430104kB
With work_mem = 64MB:
Total runtime: 42566.538 ms (sort: 31000 ms).
Sort (cost=3212276.28..3262276.16 rows=19999954 width=8)
(actual time=28599.611..39454.279 rows=20000000 loops=1)
Sort Key: "*SELECT* 1".n
Sort Method: external merge Disk: 430104kB
Can anyone explain why performance gets worse? Or suggest any other methods to make query execution faster by changing server parameters?
My query (I know it's not optimal, but I have to benchmark this kind of query):
SELECT n
FROM (
SELECT n + 1 AS n FROM table_name
EXCEPT
SELECT n FROM table_name) AS q1
ORDER BY n DESC;
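For reference, each timing above was collected by setting work_mem for the session and running the query under EXPLAIN ANALYZE, roughly along these lines (a sketch of the procedure, not the exact commands used):

SET work_mem = '1MB';      -- then '5MB' and '64MB' for the later runs
EXPLAIN ANALYZE
SELECT n
FROM (
    SELECT n + 1 AS n FROM table_name
    EXCEPT
    SELECT n FROM table_name) AS q1
ORDER BY n DESC;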
Full execution plan:
Sort (cost=5805421.81..5830421.75 rows=9999977 width=8) (actual time=30405.682..30405.682 rows=1 loops=1)
Sort Key: q1.n
Sort Method: quicksort Memory: 25kB
-> Subquery Scan q1 (cost=4032588.78..4232588.32 rows=9999977 width=8) (actual time=30405.636..30405.637 rows=1 loops=1)
-> SetOp Except (cost=4032588.78..4132588.55 rows=9999977 width=8) (actual time=30405.634..30405.634 rows=1 loops=1)
-> Sort (cost=4032588.78..4082588.66 rows=19999954 width=8) (actual time=23046.478..27733.020 rows=20000000 loops=1)
Sort Key: "*SELECT* 1".n
Sort Method: external merge Disk: 430104kB
-> Append (cost=0.00..513495.02 rows=19999954 width=8) (actual time=0.040..8191.185 rows=20000000 loops=1)
-> Subquery Scan "*SELECT* 1" (cost=0.00..269247.48 rows=9999977 width=8) (actual time=0.039..3651.506 rows=10000000 loops=1)
-> Seq Scan on table_name (cost=0.00..169247.71 rows=9999977 width=8) (actual time=0.038..2258.323 rows=10000000 loops=1)
-> Subquery Scan "*SELECT* 2" (cost=0.00..244247.54 rows=9999977 width=8) (actual time=0.008..2697.546 rows=10000000 loops=1)
-> Seq Scan on table_name (cost=0.00..144247.77 rows=9999977 width=8) (actual time=0.006..1079.561 rows=10000000 loops=1)
Total runtime: 30496.100 ms
2 Answers
The EXCEPT query is equivalent to the following NOT EXISTS form, which generates a different query plan here (but the same results) (9.0.1beta something):
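A sketch of such a rewrite (the exact form is an assumption; DISTINCT mimics EXCEPT's duplicate removal, and n is assumed NOT NULL, since EXCEPT and NOT EXISTS treat NULLs differently):

SELECT DISTINCT t1.n + 1 AS n
FROM table_name t1
WHERE NOT EXISTS (
    SELECT 1 FROM table_name t2
    WHERE t2.n = t1.n + 1      -- drop values whose successor already exists
)
ORDER BY n DESC;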
(a version with a recursive CTE might also be possible :-)
EDIT: the query plans, all for 100K records with 0.2% deleted.
Original query:
NOT EXISTS-version with CTE:
NOT EXISTS-version without CTE:
My conclusion is that the "NOT EXISTS" versions cause Postgres to produce better plans.
仅针对当前事务开始进行设置。I posted your query plan on explain.depesz.com, have a look.
The query planner's estimates are terribly wrong in some places.
Have you run ANALYZE recently? Read the chapters in the manual on Statistics Used by the Planner and Planner Cost Constants (https://www.postgresql.org/docs/current/runtime-config-query.html#RUNTIME-CONFIG-QUERY-CONSTANTS). Pay special attention to the chapters on random_page_cost and default_statistics_target. You might try:
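For example, raise the statistics target for the sorted column and re-analyze (a sketch; the value 1000 and the column name n are assumptions):

ALTER TABLE table_name ALTER COLUMN n SET STATISTICS 1000;
ANALYZE table_name;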
Or go even higher for a table with 10M rows. It depends on data distribution and actual queries. Experiment. The default is 100, the maximum is 10000.
For a database of that size, only 1 or 5 MB of work_mem is generally not enough. Read the Postgres Wiki page on Tuning Postgres that @aleroot linked to. As your query needs 430104kB of memory on disk according to the EXPLAIN output, you have to set work_mem to something like 500MB or more to allow in-memory sorting. The in-memory representation of data needs somewhat more space than the on-disk representation. You may be interested in what Tom Lane posted on that matter recently.

Increasing work_mem by just a little, like you tried, won't help much and can even slow things down. Setting it too high globally can even hurt, especially with concurrent access: multiple sessions might starve one another for resources, and allocating more for one purpose takes memory away from another if the resource is limited. The best setup depends on the complete situation.

To avoid side effects, set it high enough only locally in your session, and temporarily for the query:
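For instance (500MB follows the estimate above; adjust it to the RAM you can spare):

SET work_mem = '500MB';    -- affects only the current session
-- ... then re-run the query; EXPLAIN ANALYZE should now show
-- "Sort Method: quicksort Memory: ..." instead of "external merge Disk: ..."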
Reset it to your default afterwards:
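For example:

RESET work_mem;    -- back to the value from postgresql.conf (or the role/database setting)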
Or use SET LOCAL to set it just for the current transaction to begin with.
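A sketch of that variant:

BEGIN;
SET LOCAL work_mem = '500MB';   -- reverts automatically at COMMIT or ROLLBACK
-- ... run the query here ...
COMMIT;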