Is it possible to distribute a LIMIT clause to the subqueries?
I'm JOINing the results of this query:
SELECT
twitter_personas.id
, 'TwitterPersona'
, twitter_personas.name
FROM twitter_personas
UNION ALL
SELECT
facebook_personas.id
, 'FacebookPersona'
, facebook_personas.name
-- and more UNION ALL statements pertaining to my other services
to a scoring table. The JOIN itself isn't the problem, but the query plan is "wrong": PostgreSQL finds the top 50 scores, then JOINs to the full view above, meaning it's doing a whole lot of work for nothing, since we're only interested in the top 50. But 50 is a variable - it might change (depends on UI concerns, and might be paginated at some point, yada, yada).
I made the query very fast by limiting my result set directly in the subquery:
SELECT
personas.id
, personas.type
, personas.name
, xs.value
FROM (
SELECT
twitter_personas.id
, 'TwitterPersona'
, twitter_personas.name
FROM twitter_personas
WHERE id IN (
SELECT persona_id
FROM xs
ORDER BY
xs.value DESC
LIMIT 50)
UNION ALL
SELECT
facebook_personas.id
, 'FacebookPersona'
, facebook_personas.name
FROM facebook_personas
WHERE id IN (
SELECT persona_id
FROM xs
ORDER BY
xs.value DESC
LIMIT 50)) AS personas(id, type, name)
INNER JOIN xs ON xs.persona_id = personas.id
ORDER BY
xs.value DESC
LIMIT 50
My question is: how can I push the 50 down from the outer query into the inner queries? This query executes very fast (90 ms), compared to joining against the full result set of the UNION ALL, which takes 15 seconds. Maybe there's an even better way to do this?
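As one illustration of keeping the limit in a single place (a sketch added for clarity, not part of the original post), the top-N lookup can be factored into a CTE, assuming PostgreSQL 8.4 or later; before version 12 the CTE is also an optimization fence, which here usefully forces the top 50 to be computed exactly once:

WITH top_xs AS (
    SELECT persona_id
         , value
    FROM xs
    ORDER BY
        value DESC
    LIMIT 50)   -- the only place N appears
SELECT
  personas.id
, personas.type
, personas.name
, top_xs.value
FROM (
    SELECT twitter_personas.id, 'TwitterPersona', twitter_personas.name
    FROM twitter_personas
    UNION ALL
    SELECT facebook_personas.id, 'FacebookPersona', facebook_personas.name
    FROM facebook_personas
    -- ... the other persona tables ...
) AS personas(id, type, name)
INNER JOIN top_xs ON top_xs.persona_id = personas.id
ORDER BY
    top_xs.value DESC
-- assumes persona_id is unique in xs; if not, keep an outer LIMIT 50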
Here are, for reference purposes, my query plans. First, the "bad" one, taking nearly 15 seconds:
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..31072.27 rows=50 width=176) (actual time=304.299..14403.551 rows=50 loops=1)
-> Subquery Scan personas_ranked (cost=0.00..253116556.67 rows=407303 width=176) (actual time=304.298..14403.511 rows=50 loops=1)
-> Nested Loop Left Join (cost=0.00..253112483.64 rows=407303 width=112) (actual time=304.297..14403.474 rows=50 loops=1)
-> Nested Loop (cost=0.00..252998394.22 rows=407303 width=108) (actual time=304.283..14402.815 rows=50 loops=1)
Join Filter: ("*SELECT* 1".id = xs.persona_id)
-> Index Scan Backward using xs_value_index on xs xs (cost=0.00..459.97 rows=10275 width=12) (actual time=0.013..0.208 rows=50 loops=1)
-> Append (cost=0.00..15458.35 rows=407303 width=88) (actual time=0.006..244.217 rows=398435 loops=50)
-> Subquery Scan "*SELECT* 1" (cost=0.00..15420.65 rows=406562 width=88) (actual time=0.006..199.945 rows=398434 loops=50)
-> Seq Scan on twitter_personas (cost=0.00..11355.02 rows=406562 width=88) (actual time=0.005..134.607 rows=398434 loops=50)
-> Subquery Scan "*SELECT* 2" (cost=0.00..14.88 rows=150 width=502) (actual time=0.002..0.002 rows=0 loops=49)
-> Seq Scan on email_personas (cost=0.00..13.38 rows=150 width=502) (actual time=0.001..0.001 rows=0 loops=49)
-> Subquery Scan "*SELECT* 3" (cost=0.00..21.80 rows=590 width=100) (actual time=0.001..0.001 rows=0 loops=49)
-> Seq Scan on facebook_personas (cost=0.00..15.90 rows=590 width=100) (actual time=0.001..0.001 rows=0 loops=49)
-> Subquery Scan "*SELECT* 4" (cost=0.00..1.03 rows=1 width=25) (actual time=0.018..0.019 rows=1 loops=49)
-> Seq Scan on web_personas (cost=0.00..1.02 rows=1 width=25) (actual time=0.017..0.018 rows=1 loops=49)
-> Index Scan using people_personas_pkey on people_personas (cost=0.00..0.27 rows=1 width=8) (actual time=0.007..0.007 rows=0 loops=50)
Index Cond: (people_personas.persona_id = "*SELECT* 1".id)
Total runtime: 14403.711 ms
The rewritten query, taking only 90 ms:
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=2830.93..2831.05 rows=50 width=108) (actual time=83.914..83.925 rows=50 loops=1)
-> Sort (cost=2830.93..2832.30 rows=551 width=108) (actual time=83.912..83.918 rows=50 loops=1)
Sort Key: xs.value
Sort Method: top-N heapsort Memory: 28kB
-> Hash Join (cost=875.60..2812.62 rows=551 width=108) (actual time=8.394..79.326 rows=10275 loops=1)
Hash Cond: ("*SELECT* 1".id = xs.persona_id)
-> Append (cost=588.41..2509.59 rows=551 width=4) (actual time=5.078..69.901 rows=10275 loops=1)
-> Subquery Scan "*SELECT* 1" (cost=588.41..1184.14 rows=200 width=4) (actual time=5.078..42.428 rows=10274 loops=1)
-> Nested Loop (cost=588.41..1182.14 rows=200 width=4) (actual time=5.078..40.220 rows=10274 loops=1)
-> HashAggregate (cost=588.41..590.41 rows=200 width=4) (actual time=5.066..7.900 rows=10275 loops=1)
-> Index Scan Backward using xs_value_index on xs xs (cost=0.00..459.97 rows=10275 width=12) (actual time=0.005..2.079 rows=10275 loops=1)
-> Index Scan using twitter_personas_id_index on twitter_personas (cost=0.00..2.95 rows=1 width=4) (actual time=0.002..0.003 rows=1 loops=10275)
Index Cond: (twitter_personas.id = xs.persona_id)
-> Subquery Scan "*SELECT* 2" (cost=588.41..649.27 rows=200 width=4) (actual time=13.017..13.017 rows=0 loops=1)
-> Nested Loop (cost=588.41..647.27 rows=200 width=4) (actual time=13.016..13.016 rows=0 loops=1)
-> HashAggregate (cost=588.41..590.41 rows=200 width=4) (actual time=5.267..6.909 rows=10275 loops=1)
-> Index Scan Backward using xs_value_index on xs xs (cost=0.00..459.97 rows=10275 width=12) (actual time=0.007..2.292 rows=10275 loops=1)
-> Index Scan using facebook_personas_id_index on facebook_personas (cost=0.00..0.27 rows=1 width=4) (actual time=0.000..0.000 rows=0 loops=10275)
Index Cond: (facebook_personas.id = xs.persona_id)
-> Subquery Scan "*SELECT* 3" (cost=588.41..648.77 rows=150 width=4) (actual time=12.568..12.568 rows=0 loops=1)
-> Nested Loop (cost=588.41..647.27 rows=150 width=4) (actual time=12.566..12.566 rows=0 loops=1)
-> HashAggregate (cost=588.41..590.41 rows=200 width=4) (actual time=5.015..6.538 rows=10275 loops=1)
-> Index Scan Backward using xs_value_index on xs xs (cost=0.00..459.97 rows=10275 width=12) (actual time=0.002..2.065 rows=10275 loops=1)
-> Index Scan using email_personas_id_index on email_personas (cost=0.00..0.27 rows=1 width=4) (actual time=0.000..0.000 rows=0 loops=10275)
Index Cond: (email_personas.id = xs.persona_id)
-> Subquery Scan "*SELECT* 4" (cost=0.00..27.41 rows=1 width=4) (actual time=0.629..0.630 rows=1 loops=1)
-> Nested Loop Semi Join (cost=0.00..27.40 rows=1 width=4) (actual time=0.628..0.628 rows=1 loops=1)
Join Filter: (web_personas.id = xs.persona_id)
-> Seq Scan on web_personas (cost=0.00..1.01 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=1)
-> Index Scan Backward using xs_value_index on xs xs (cost=0.00..459.97 rows=10275 width=12) (actual time=0.002..0.421 rows=1518 loops=1)
-> Hash (cost=158.75..158.75 rows=10275 width=12) (actual time=3.307..3.307 rows=10275 loops=1)
-> Seq Scan on xs xs (cost=0.00..158.75 rows=10275 width=12) (actual time=0.006..1.563 rows=10275 loops=1)
Total runtime: 84.066 ms
2 Answers
The reason why this wouldn't work is that the ORDER BY xs.value DESC is processed before the LIMIT, and in order to know the first (or last) 50 entries, it has to (logically) calculate all the entries first. If you put the limits in the branches of the union, you only get the first 50 entries among those that were already within the top 50 of their persona kind, which might be different. If that is acceptable to you, you can manually rewrite the query as you did, but the database system can't do that for you.
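To make that concrete, here is a hypothetical sketch (added for illustration, not taken from the answer) of what such a push-down would have to look like: each branch needs its own join against xs just to have a value to order by, and the planner cannot prove this rewrite returns the same rows as the original:

-- Hypothetical push-down the planner will NOT generate on its own:
-- each UNION branch is pre-limited by its own top-50-by-score join.
SELECT
  personas.id
, personas.type
, personas.name
, xs.value
FROM (
    (SELECT twitter_personas.id, 'TwitterPersona', twitter_personas.name
     FROM twitter_personas
     INNER JOIN xs ON xs.persona_id = twitter_personas.id
     ORDER BY xs.value DESC
     LIMIT 50)   -- top 50 within this persona kind only
    UNION ALL
    (SELECT facebook_personas.id, 'FacebookPersona', facebook_personas.name
     FROM facebook_personas
     INNER JOIN xs ON xs.persona_id = facebook_personas.id
     ORDER BY xs.value DESC
     LIMIT 50)   -- top 50 within this persona kind only
    -- ... one pre-limited branch per persona table ...
) AS personas(id, type, name)
INNER JOIN xs ON xs.persona_id = personas.id
ORDER BY
    xs.value DESC
LIMIT 50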
The planner has to join all the rows in xs to each table in the UNION, because there's no way for the planner to know in advance that the join won't affect the resulting data set (which could affect which rows are in the top 50).
Can you do a two-step with a temporary table, like this?
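A minimal sketch of that two-step approach, using the table and column names from the question (the temporary-table name top_xs is an assumption):

-- Step 1: materialize the top 50 scorers once.
CREATE TEMPORARY TABLE top_xs AS
SELECT persona_id
     , value
FROM xs
ORDER BY
    value DESC
LIMIT 50;

ANALYZE top_xs;   -- give the planner statistics for the 50-row table

-- Step 2: join the persona tables against the small table only.
SELECT
  personas.id
, personas.type
, personas.name
, top_xs.value
FROM (
    SELECT twitter_personas.id, 'TwitterPersona', twitter_personas.name
    FROM twitter_personas
    UNION ALL
    SELECT facebook_personas.id, 'FacebookPersona', facebook_personas.name
    FROM facebook_personas
    -- ... the other persona tables ...
) AS personas(id, type, name)
INNER JOIN top_xs ON top_xs.persona_id = personas.id
ORDER BY
    top_xs.value DESC;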