Postgresql - 使用enable_nestloop=false 查询运行速度更快。为什么规划者没有做正确的事情?
当我使用默认的enable_nestloop = true和enable_nestloop = false(〜10秒)运行它时,我有一个查询运行速度慢得多(〜5分钟)。
解释两种情况的分析结果:
机器 A Nestloop=true - http://explain.depesz.com/s/nkj0 (约 5 分钟) 机器 A Nestloop=false - http://explain.depesz.com/s/wBM(约 10 秒
)不同的稍慢的机器,复制数据库并保留默认的enable_nestloop=true需要大约20秒。
机器 B Nestloop=true - (~ 20secs)
对于上述所有情况,我确保在运行查询之前进行了分析。没有其他查询并行运行。
两台机器都运行 Postgres 8.4。机器 A 运行 Ubuntu 10.04 32 位,而机器 B 运行 Ubuntu 8.04 32 位。
实际的查询可以在这里找到。这是一个具有许多连接的报告查询,因为数据库主要用于事务处理。
如果不诉诸于诸如物化视图之类的东西,我该怎么做才能使规划器完成我通过设置enable_nestloop=false所实现的目标?
根据我所做的研究,规划器选择看似不理想的查询的原因似乎是因为估计行与实际行之间存在巨大差异。我怎样才能让这个数字更接近?
如果我应该重写查询,我应该更改什么?
为什么规划器似乎为机器 B 做了正确的事情。我应该在两台机器中比较什么?
I have a query that runs a lot slower (~5 minutes) when I run it with the default enable_nestloop=true and enable_nestloop=false (~10 secs).
Explain analyse result for both cases:
Machine A nestloop=true - http://explain.depesz.com/s/nkj0 (~5 minutes)
Machine A nestloop=false - http://explain.depesz.com/s/wBM (~10 secs)
On a different slightly slower machine, copying the database over and leaving the default enable_nestloop=true it takes ~20 secs.
Machine B nestloop=true - (~ 20secs)
For all the cases above I ensured that I did an ANALYZE before running the queries. There were no other queries running in parallel.
Both machines are running Postgres 8.4. Machine A is running Ubuntu 10.04 32 bit while Machine B is running Ubuntu 8.04 32 bit.
The actual query is available here . It is a reporting query with many joins as the database is mainly used for transaction processing.
Without resorting to putting in something like materialized views what can I do to make the planner do what I achieved by setting enable_nestloop=false ?
From the research I have done it seems to be that the reason the planner is choosing the seemingly unoptimal query is because of the huge difference between the estimated and actual rows. How can I get this figure closer ?
If I should rewrite the query, what should I change ?
Why is it that the planner seems to be doing the right thing for Machine B. What should I be comparing in both the machines ?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
如果查询规划器选择次优查询计划,则很可能有不完整或误导性的信息可供使用。
请参阅有关服务器调整的 PostgreSQL Wiki 页面。特别要注意random_page_cost和default_statistics_target章节。
另请阅读手册中有关规划器使用的统计信息 和 规划器成本常量。
更具体地说,它可能有助于增加以下列的
统计目标
:这些列涉及过滤器,从而导致
还有更多。检查刨床与估计偏差较大的每一列。默认值为 100。仅对具有 >> 的表有意义。 1000 行。尝试设置。随后在表上运行 ANALYZE 以使更改生效。
它还可能有助于在
postgres(sales_orders.retailer_id) WHERE Retailer_id IS NOT NULL
上创建部分索引(取决于 NULL 值的常见程度)。另一件可能对您有帮助的事情是升级到最新版本 9.1。该领域已经取得了许多重大改进。
If the query planner chooses suboptimal query plans, then chances are it has incomplete or misleading information to work with.
See this PostgreSQL Wiki page on server tuning. Especially pay attention to the chapters on random_page_cost and default_statistics_target.
Also read the corresponding chapters in the manual on Statistics Used by the Planner and Planner Cost Constants.
More specifically, it might help to increase the
statistics target
for the following columns:These are involved in the filters resulting in the
There are more. Check every column where the planer deviates a lot from the estimate. Default is just 100. Makes only sense for tables with >> 1000 rows. Experiment with the setting. Run
ANALYZE
on the tables afterwards for the changes to take effect.It might also help to create a partial index on
postgres(sales_orders.retailer_id) WHERE retailer_id IS NOT NULL
(depending on how common NULL values are).Another thing that may help you is to upgrade to the latest version 9.1. There have been a number of substantial improvements in this area.
事实证明重写查询是最好的解决办法。该查询的编写方式严重依赖左连接并且有很多连接。我利用我对查询所连接的表中数据的连接性质的了解,将其展平并减少了左连接。我想经验法则是,如果规划者给出了真正蹩脚的估计,则可能有更好的方法来编写查询。
Turns out rewriting the query was the best fix. The query was written in a way that it relied heavily on left joins and had many joins. I flattened it out and reduced the left joins by using my knowledge of the join nature of the data in the tables the query was joining. I guess the rule of thumb is if the planner is coming out with real crappy estimates, there might be a better way of writing the query.
阅读以下内容可能会有用:
关于显式 JOIN 的 PostgreSQL 教程。
查询规划器尝试分析 JOIN 顺序以找到最佳的 JOIN 顺序。
我看到您的查询至少有 15 个 JOIN。可能的 JOIN 订单数量以阶乘 (n!) 的形式增加。因此,如果有 15 个 JOIN,查询规划器尝试找到最佳 JOIN 顺序是不合理的 - 它必须查看 15 个! = 1307674368000 个不同的计划。
因此它使用遗传查询优化器来代替。请参阅 查询规划:遗传查询优化器参数。参数“geqo_threshold”确定查询规划器必须存在多少个 JOIN 才能使用遗传查询优化器。
这样,PostgreSQL 规划器只会查看一小部分可能的变体,并尝试(随机)找到最好的变体。因此,每次运行分析时,它可能会提出更好的计划。
我认为一般来说,如果您有这么多表要 JOIN,您最好像您所做的那样:将查询重写为最佳 JOIN 顺序。
This might be useful reading:
PostgreSQL tutorial about explicit JOINs.
Query planner tries to analyze the JOIN order to find the best order for JOINing.
I saw Your query had at least 15 JOINs. The number of possible JOIN orders goes up as factorial (n!). So it is not reasonable for query planner to try to find the best JOIN order if there are 15 JOINs - it would have to look at 15! = 1307674368000 different plans.
So it uses Genetic Query Optimizer instead. See Query Planning: Genetic Query Optimizer parameters. The parameter "geqo_threshold" determines how many JOINs must be present for query planner to use Genetic Query Optimizer.
This way the PostgreSQL planner looks at only small part of possible variants and tries to find the best one (randomly). So each time You run the ANALYZE, it might come up with better plan.
I think that generally, if You have so many tables to JOIN, You are better off doing as You did: rewriting the query to optimal JOIN order.
在具有相同 PostgreSQL 的两台服务器上对相同数据和相同查询执行不同计划通常只有一个原因。这是不同的配置 - 主要是 work_mem 的值。哈希连接通常速度更快,但需要大量可用内存。
There are usually only one reason for different plan for same data and same queries on two servers with same PostgreSQL. This is different configuration - mainly value of work_mem. Hash join is usually faster, but needs a lot of available memory.