Is it possible to distribute a LIMIT clause to the subqueries?
I'm JOINing the results of this query:
SELECT
twitter_personas.id
, 'TwitterPersona'
, twitter_personas.name
FROM twitter_personas
UNION ALL
SELECT
facebook_personas.id
, 'FacebookPersona'
, facebook_personas.name
-- and more UNION ALL statements pertaining to my other services
to a scoring table. The JOIN itself isn't the problem, but the query plan is "wrong": PostgreSQL finds the top 50 scores, then JOINs to the full view above, meaning it's doing a whole lot of work for nothing, since we're only interested in the top 50. But 50 is a variable - it might change (depends on UI concerns, and might be paginated at some point, yada, yada).
I made the query very fast by limiting my result set directly in the subquery:
SELECT
personas.id
, personas.type
, personas.name
, xs.value
FROM (
SELECT
twitter_personas.id
, 'TwitterPersona'
, twitter_personas.name
FROM twitter_personas
WHERE id IN (
SELECT persona_id
FROM xs
ORDER BY
xs.value DESC
LIMIT 50)
UNION ALL
SELECT
facebook_personas.id
, 'FacebookPersona'
, facebook_personas.name
FROM facebook_personas
WHERE id IN (
SELECT persona_id
FROM xs
ORDER BY
xs.value DESC
LIMIT 50)) AS personas(id, type, name)
INNER JOIN xs ON xs.persona_id = personas.id
ORDER BY
xs.value DESC
LIMIT 50
My question is: how can I push the 50 down from the outer query into the inner queries? This query executes very fast (90 ms), compared to joining against the full result set of the UNION ALL, which takes 15 seconds. Maybe there's an even better way to do this?
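As one illustration of keeping the limit in a single place (a sketch added for clarity, not part of the original post), the top-N lookup can be factored into a CTE, assuming PostgreSQL 8.4 or later; before version 12 the CTE is also an optimization fence, which here usefully forces the top 50 to be computed exactly once:

WITH top_xs AS (
    SELECT persona_id
         , value
    FROM xs
    ORDER BY
        value DESC
    LIMIT 50)   -- the only place N appears
SELECT
  personas.id
, personas.type
, personas.name
, top_xs.value
FROM (
    SELECT twitter_personas.id, 'TwitterPersona', twitter_personas.name
    FROM twitter_personas
    UNION ALL
    SELECT facebook_personas.id, 'FacebookPersona', facebook_personas.name
    FROM facebook_personas
    -- ... the other persona tables ...
) AS personas(id, type, name)
INNER JOIN top_xs ON top_xs.persona_id = personas.id
ORDER BY
    top_xs.value DESC
-- assumes persona_id is unique in xs; if not, keep an outer LIMIT 50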
Here are, for reference purposes, my query plans. First, the "bad" one, taking nearly 15 seconds:
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..31072.27 rows=50 width=176) (actual time=304.299..14403.551 rows=50 loops=1)
-> Subquery Scan personas_ranked (cost=0.00..253116556.67 rows=407303 width=176) (actual time=304.298..14403.511 rows=50 loops=1)
-> Nested Loop Left Join (cost=0.00..253112483.64 rows=407303 width=112) (actual time=304.297..14403.474 rows=50 loops=1)
-> Nested Loop (cost=0.00..252998394.22 rows=407303 width=108) (actual time=304.283..14402.815 rows=50 loops=1)
Join Filter: ("*SELECT* 1".id = xs.persona_id)
-> Index Scan Backward using xs_value_index on xs xs (cost=0.00..459.97 rows=10275 width=12) (actual time=0.013..0.208 rows=50 loops=1)
-> Append (cost=0.00..15458.35 rows=407303 width=88) (actual time=0.006..244.217 rows=398435 loops=50)
-> Subquery Scan "*SELECT* 1" (cost=0.00..15420.65 rows=406562 width=88) (actual time=0.006..199.945 rows=398434 loops=50)
-> Seq Scan on twitter_personas (cost=0.00..11355.02 rows=406562 width=88) (actual time=0.005..134.607 rows=398434 loops=50)
-> Subquery Scan "*SELECT* 2" (cost=0.00..14.88 rows=150 width=502) (actual time=0.002..0.002 rows=0 loops=49)
-> Seq Scan on email_personas (cost=0.00..13.38 rows=150 width=502) (actual time=0.001..0.001 rows=0 loops=49)
-> Subquery Scan "*SELECT* 3" (cost=0.00..21.80 rows=590 width=100) (actual time=0.001..0.001 rows=0 loops=49)
-> Seq Scan on facebook_personas (cost=0.00..15.90 rows=590 width=100) (actual time=0.001..0.001 rows=0 loops=49)
-> Subquery Scan "*SELECT* 4" (cost=0.00..1.03 rows=1 width=25) (actual time=0.018..0.019 rows=1 loops=49)
-> Seq Scan on web_personas (cost=0.00..1.02 rows=1 width=25) (actual time=0.017..0.018 rows=1 loops=49)
-> Index Scan using people_personas_pkey on people_personas (cost=0.00..0.27 rows=1 width=8) (actual time=0.007..0.007 rows=0 loops=50)
Index Cond: (people_personas.persona_id = "*SELECT* 1".id)
Total runtime: 14403.711 ms
The rewritten query, taking only 90 ms:
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=2830.93..2831.05 rows=50 width=108) (actual time=83.914..83.925 rows=50 loops=1)
-> Sort (cost=2830.93..2832.30 rows=551 width=108) (actual time=83.912..83.918 rows=50 loops=1)
Sort Key: xs.value
Sort Method: top-N heapsort Memory: 28kB
-> Hash Join (cost=875.60..2812.62 rows=551 width=108) (actual time=8.394..79.326 rows=10275 loops=1)
Hash Cond: ("*SELECT* 1".id = xs.persona_id)
-> Append (cost=588.41..2509.59 rows=551 width=4) (actual time=5.078..69.901 rows=10275 loops=1)
-> Subquery Scan "*SELECT* 1" (cost=588.41..1184.14 rows=200 width=4) (actual time=5.078..42.428 rows=10274 loops=1)
-> Nested Loop (cost=588.41..1182.14 rows=200 width=4) (actual time=5.078..40.220 rows=10274 loops=1)
-> HashAggregate (cost=588.41..590.41 rows=200 width=4) (actual time=5.066..7.900 rows=10275 loops=1)
-> Index Scan Backward using xs_value_index on xs xs (cost=0.00..459.97 rows=10275 width=12) (actual time=0.005..2.079 rows=10275 loops=1)
-> Index Scan using twitter_personas_id_index on twitter_personas (cost=0.00..2.95 rows=1 width=4) (actual time=0.002..0.003 rows=1 loops=10275)
Index Cond: (twitter_personas.id = xs.persona_id)
-> Subquery Scan "*SELECT* 2" (cost=588.41..649.27 rows=200 width=4) (actual time=13.017..13.017 rows=0 loops=1)
-> Nested Loop (cost=588.41..647.27 rows=200 width=4) (actual time=13.016..13.016 rows=0 loops=1)
-> HashAggregate (cost=588.41..590.41 rows=200 width=4) (actual time=5.267..6.909 rows=10275 loops=1)
-> Index Scan Backward using xs_value_index on xs xs (cost=0.00..459.97 rows=10275 width=12) (actual time=0.007..2.292 rows=10275 loops=1)
-> Index Scan using facebook_personas_id_index on facebook_personas (cost=0.00..0.27 rows=1 width=4) (actual time=0.000..0.000 rows=0 loops=10275)
Index Cond: (facebook_personas.id = xs.persona_id)
-> Subquery Scan "*SELECT* 3" (cost=588.41..648.77 rows=150 width=4) (actual time=12.568..12.568 rows=0 loops=1)
-> Nested Loop (cost=588.41..647.27 rows=150 width=4) (actual time=12.566..12.566 rows=0 loops=1)
-> HashAggregate (cost=588.41..590.41 rows=200 width=4) (actual time=5.015..6.538 rows=10275 loops=1)
-> Index Scan Backward using xs_value_index on xs xs (cost=0.00..459.97 rows=10275 width=12) (actual time=0.002..2.065 rows=10275 loops=1)
-> Index Scan using email_personas_id_index on email_personas (cost=0.00..0.27 rows=1 width=4) (actual time=0.000..0.000 rows=0 loops=10275)
Index Cond: (email_personas.id = xs.persona_id)
-> Subquery Scan "*SELECT* 4" (cost=0.00..27.41 rows=1 width=4) (actual time=0.629..0.630 rows=1 loops=1)
-> Nested Loop Semi Join (cost=0.00..27.40 rows=1 width=4) (actual time=0.628..0.628 rows=1 loops=1)
Join Filter: (web_personas.id = xs.persona_id)
-> Seq Scan on web_personas (cost=0.00..1.01 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=1)
-> Index Scan Backward using xs_value_index on xs xs (cost=0.00..459.97 rows=10275 width=12) (actual time=0.002..0.421 rows=1518 loops=1)
-> Hash (cost=158.75..158.75 rows=10275 width=12) (actual time=3.307..3.307 rows=10275 loops=1)
-> Seq Scan on xs xs (cost=0.00..158.75 rows=10275 width=12) (actual time=0.006..1.563 rows=10275 loops=1)
Total runtime: 84.066 ms
2 Answers
The reason why this wouldn't work is that the ORDER BY xs.value DESC is processed before the LIMIT, and in order to know the first (or last) 50 entries, it has to (logically) calculate all the entries first. If you put the limits in the branches of the union, you only get the first 50 entries among those that were already within the top 50 of their persona kind, which might be different. If that is acceptable to you, you can manually rewrite the query as you did, but the database system can't do that for you.
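To make that concrete, here is a hypothetical sketch (added for illustration, not taken from the answer) of what such a push-down would have to look like: each branch needs its own join against xs just to have a value to order by, and the planner cannot prove this rewrite returns the same rows as the original:

-- Hypothetical push-down the planner will NOT generate on its own:
-- each UNION branch is pre-limited by its own top-50-by-score join.
SELECT
  personas.id
, personas.type
, personas.name
, xs.value
FROM (
    (SELECT twitter_personas.id, 'TwitterPersona', twitter_personas.name
     FROM twitter_personas
     INNER JOIN xs ON xs.persona_id = twitter_personas.id
     ORDER BY xs.value DESC
     LIMIT 50)   -- top 50 within this persona kind only
    UNION ALL
    (SELECT facebook_personas.id, 'FacebookPersona', facebook_personas.name
     FROM facebook_personas
     INNER JOIN xs ON xs.persona_id = facebook_personas.id
     ORDER BY xs.value DESC
     LIMIT 50)   -- top 50 within this persona kind only
    -- ... one pre-limited branch per persona table ...
) AS personas(id, type, name)
INNER JOIN xs ON xs.persona_id = personas.id
ORDER BY
    xs.value DESC
LIMIT 50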
The planner has to join all the rows in xs to each table in the UNION, because there's no way for the planner to know in advance that the join won't affect the resulting data set (which could affect which rows are in the top 50).
Can you do a two-step with a temporary table, like this?
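A minimal sketch of that two-step approach, using the table and column names from the question (the temporary-table name top_xs is an assumption):

-- Step 1: materialize the top 50 scorers once.
CREATE TEMPORARY TABLE top_xs AS
SELECT persona_id
     , value
FROM xs
ORDER BY
    value DESC
LIMIT 50;

ANALYZE top_xs;   -- give the planner statistics for the 50-row table

-- Step 2: join the persona tables against the small table only.
SELECT
  personas.id
, personas.type
, personas.name
, top_xs.value
FROM (
    SELECT twitter_personas.id, 'TwitterPersona', twitter_personas.name
    FROM twitter_personas
    UNION ALL
    SELECT facebook_personas.id, 'FacebookPersona', facebook_personas.name
    FROM facebook_personas
    -- ... the other persona tables ...
) AS personas(id, type, name)
INNER JOIN top_xs ON top_xs.persona_id = personas.id
ORDER BY
    top_xs.value DESC;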