将 Postgres 查询封装在视图中使其变得非常慢
我有一个查询在 Postgres 8.4 上运行大约 5 秒。它从连接到其他一些表的视图中选择数据,但也使用 lag() 窗口函数,即。
SELECT *, lag(column1) OVER (PARTITION BY key1 ORDER BY ...), lag(...)
FROM view1 v
JOIN othertables USING (...)
WHERE ...
为了方便起见,我创建了一个新视图,只需
SELECT *, lag(column1) OVER (PARTITION BY key1 ORDER BY ...), lag(...)
FROM view1 v
像以前一样使用所有其他联接和过滤器,然后从中进行选择。令我惊讶的是,这个查询在 12 分钟内没有完成(我在那时停止了它)。显然 Postgres 选择了不同的执行计划。我怎样才能不这样做,即。使用与原始查询相同的计划?我本以为视图不应该改变执行计划,但显然它确实如此。
编辑:更重要的是,我发现即使我将第一个视图的内容复制到第二个视图中,它仍然不会返回。
编辑 2:好的,我已经充分简化了查询以发布计划。
使用视图(这不会在任何合理的时间内返回):
Subquery Scan sp (cost=5415201.23..5892463.97 rows=88382 width=370)
Filter: (((sp.ticker)::text ~~ 'Some Ticker'::text) AND (sp.price_date >= '2010-06-01'::date))
-> WindowAgg (cost=5415201.23..5680347.20 rows=53029193 width=129)
-> Sort (cost=5415201.23..5441715.83 rows=53029193 width=129)
Sort Key: sp.stock_id, sp.price_date
-> Hash Join (cost=847.87..1465139.61 rows=53029193 width=129)
Hash Cond: (sp.stock_id = s.stock_id)
-> Seq Scan on stock_prices sp (cost=0.00..1079829.20 rows=53029401 width=115)
-> Hash (cost=744.56..744.56 rows=29519 width=18)
-> Seq Scan on stocks s (cost=0.00..744.56 rows=29519 width=18)
将窗口函数从视图中取出并放入查询本身(这会立即返回):
WindowAgg (cost=34.91..34.95 rows=7 width=129)
-> Sort (cost=34.91..34.92 rows=7 width=129)
Sort Key: sp.stock_id, sp.price_date
-> Nested Loop (cost=0.00..34.89 rows=7 width=129)
-> Index Scan using stocks_ticker_unique on stocks s (cost=0.00..4.06 rows=1 width=18)
Index Cond: ((ticker)::text = 'Some Ticker'::text)
Filter: ((ticker)::text ~~ 'Some Ticker'::text)
-> Index Scan using stock_prices_id_date_idx on stock_prices sp (cost=0.00..30.79 rows=14 width=115)
Index Cond: ((sp.stock_id = s.stock_id) AND (sp.price_date >= '2010-06-01'::date))
所以似乎在缓慢的情况下,它试图首先将窗口函数应用于所有数据,然后对其进行过滤,这可能是问题所在。但我不知道为什么要这样做。
I have a query that runs in about 5 seconds on Postgres 8.4. It selects data from a view joined to some other tables, but also uses the lag() window function, ie.
SELECT *, lag(column1) OVER (PARTITION BY key1 ORDER BY ...), lag(...)
FROM view1 v
JOIN othertables USING (...)
WHERE ...
For convenience I created a new view that simply has
SELECT *, lag(column1) OVER (PARTITION BY key1 ORDER BY ...), lag(...)
FROM view1 v
and then SELECT from that, using all the other JOINs and filters as before. To my surprise this query doesn't complete in 12 minutes (I stopped it at that point). Obviously Postgres has chosen a different execution plan. How do I get it to not do that, ie. use the same plan as in the original query? I would have thought that a view shouldn't change the execution plan, but apparently it does.
Edit: what's more, I found that even if I copy the contents of the first view into the second it still doesn't return.
Edit 2: OK, I've simplified the query sufficiently to post the plans.
Using the view (this doesn't return in any reasonable time):
Subquery Scan sp (cost=5415201.23..5892463.97 rows=88382 width=370)
Filter: (((sp.ticker)::text ~~ 'Some Ticker'::text) AND (sp.price_date >= '2010-06-01'::date))
-> WindowAgg (cost=5415201.23..5680347.20 rows=53029193 width=129)
-> Sort (cost=5415201.23..5441715.83 rows=53029193 width=129)
Sort Key: sp.stock_id, sp.price_date
-> Hash Join (cost=847.87..1465139.61 rows=53029193 width=129)
Hash Cond: (sp.stock_id = s.stock_id)
-> Seq Scan on stock_prices sp (cost=0.00..1079829.20 rows=53029401 width=115)
-> Hash (cost=744.56..744.56 rows=29519 width=18)
-> Seq Scan on stocks s (cost=0.00..744.56 rows=29519 width=18)
Taking the window function out of the view and putting into the query itself (this returns instantly):
WindowAgg (cost=34.91..34.95 rows=7 width=129)
-> Sort (cost=34.91..34.92 rows=7 width=129)
Sort Key: sp.stock_id, sp.price_date
-> Nested Loop (cost=0.00..34.89 rows=7 width=129)
-> Index Scan using stocks_ticker_unique on stocks s (cost=0.00..4.06 rows=1 width=18)
Index Cond: ((ticker)::text = 'Some Ticker'::text)
Filter: ((ticker)::text ~~ 'Some Ticker'::text)
-> Index Scan using stock_prices_id_date_idx on stock_prices sp (cost=0.00..30.79 rows=14 width=115)
Index Cond: ((sp.stock_id = s.stock_id) AND (sp.price_date >= '2010-06-01'::date))
So it seems that in the slow case it's trying to apply the window function to all the data first and then filter it, which is probably the issue. I don't know why it's doing that, though.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
两个计划之间的差异来自于聚合的加入。这可以防止使用嵌套循环计划。当你在你的观点中使用聚合时,你就把自己置于不利的境地。
例如,这几乎总是会导致两个表上的合并或散列连接计划,然后进行前 n 排序:
Your difference between the two plans comes from joining with an aggregate. This prevents the use a nested loop plan. When you use the aggregate in your view, you put yourself in that unfavorable scenario.
This, for instance, will almost always lead to a merge or hash join plan on the two tables followed by a top-n sort:
也许您可以考虑使用 公用表表达式 (CTE)而不是视图。我可以以与使用视图类似的方式帮助使查询更清晰,但似乎不会以相同的方式影响执行计划。
我在这个问题中遇到了类似的问题,并且使用 CTE 而不是视图使执行计划更加高效。
Perhaps you could consider using a Common Table Expression (CTE) instead of a view. I can help making the query clearer in a similar way to using a view, but doesn't seem to affect the execution plan in the same way.
I had a similar problem in this question and using a CTE instead of a view made the execution plan much more efficient.