为什么 postgresql planner 不使用索引扫描而不是在存储函数中使用显式排序？

发布于 2024-12-01 12:26:16 字数 8779 浏览 1 评论 0原文

我有一个使用 ORDER BY 子句执行查询的函数。当我调用这个函数时，postgres 会卡住，但是当我使用传递给函数的值执行查询时，它会立即响应。该函数如下所示：

CREATE OR REPLACE FUNCTION get_all_synset_tags_by_name(
    lim   int,
    offs  int
) RETURNS SETOF ctags AS $$
        SELECT concepts.id, expressions.name,
            array_to_string(
                array(
                    SELECT descr FROM ctags WHERE id IN (SELECT unnest(
                        concept_get_expr_synonyms(concepts.id)
                ))),'; '
            ), (
                SELECT sum(freq) FROM ctags WHERE id IN (SELECT unnest(
                    concept_get_expr_synonyms(concepts.id)
            )))::integer
            FROM concepts,expressions
            WHERE
                concepts.is_dropped=FALSE AND expressions.is_dropped=FALSE AND expressions.id=concepts.expr_id AND
                concepts.id=(concept_get_expr_synonyms(concepts.id))[1]
            ORDER BY name
            LIMIT $1 OFFSET $2;
$$ language sql STABLE;

CREATE OR REPLACE FUNCTION get_top_tags_all_name(
    user_id int,
    lim   int,
    offs  int,
    at    bool,
    dt    bool,
    mt    bool
)
    RETURNS
        TABLE(
            id          int,
            name        text,
            descr       text,
            freq        int,
            foll_status bool,
            ad_perm     bool,
            rm_perm     bool,
            ed_perm     bool,
            n_folls     bigint,
            n_quests    bigint,
            n_opins     bigint

    ) AS $$
    SELECT ctags.id,ctags.name,ctags.descr,ctags.freq,
            (SELECT users_tags.is_ignored
                    FROM users_tags
                        WHERE users_tags.tag=ctags.id AND users_tags.follower=$1),
            $4,$5,$6,
            (SELECT count(tag) FROM users_tags
                WHERE users_tags.tag=ctags.id AND users_tags.is_ignored=FALSE),
            (SELECT count(question_tags.question_id) FROM question_tags
                LEFT JOIN questions ON question_tags.question_id=questions.id
                WHERE question_tags.tag_id=ctags.id AND questions.qtype='quest' ),
            (SELECT count(question_tags.question_id) FROM question_tags
                LEFT JOIN questions ON question_tags.question_id=questions.id
                WHERE question_tags.tag_id=ctags.id AND questions.qtype='opin' )
            FROM get_all_synset_tags_by_name($2,$3) AS ctags;
$$ language sql STRICT;

当我使用 EXPLAIN 从 get_top_tags_all_name 执行查询时，它会说：

qa=# EXPLAIN SELECT ctags.id,ctags.name,ctags.descr,ctags.freq,
qa-#             (SELECT users_tags.is_ignored
qa(#                     FROM users_tags
qa(#                         WHERE users_tags.tag=ctags.id AND users_tags.follower=1),
qa-#             FALSE,FALSE,FALSE,
qa-#             (SELECT count(tag) FROM users_tags
qa(#                 WHERE users_tags.tag=ctags.id AND users_tags.is_ignored=FALSE),
qa-#             (SELECT count(question_tags.question_id) FROM question_tags
qa(#                 LEFT JOIN questions ON question_tags.question_id=questions.id
qa(#                 WHERE question_tags.tag_id=ctags.id AND questions.qtype='quest' ),
qa-#             (SELECT count(question_tags.question_id) FROM question_tags
qa(#                 LEFT JOIN questions ON question_tags.question_id=questions.id
qa(#                 WHERE question_tags.tag_id=ctags.id AND questions.qtype='opin' )
qa-#             FROM get_all_synset_tags_by_name(10,0) AS ctags;
                                                                  QUERY PLAN                                                                  
----------------------------------------------------------------------------------------------------------------------------------------------
 Subquery Scan ctags  (cost=0.00..21323.00 rows=10 width=72)
   ->  Limit  (cost=0.00..20434.38 rows=10 width=36)
         ->  Nested Loop  (cost=0.00..19645616.99 rows=9614 width=36)
               ->  Index Scan using expressions_name_key on expressions  (cost=0.00..996449.61 rows=1504967 width=36)
                     Filter: (NOT is_dropped)
               ->  Index Scan using concepts_expr_id_idx on concepts  (cost=0.00..12.08 rows=1 width=8)
                     Index Cond: (public.concepts.expr_id = public.expressions.id)
                     Filter: ((NOT public.concepts.is_dropped) AND (public.concepts.id = (concept_get_expr_synonyms(public.concepts.id))[1]))
               SubPlan 5
                 ->  Nested Loop  (cost=0.28..23.65 rows=1 width=43)
                       ->  Nested Loop  (cost=0.28..17.63 rows=1 width=47)
                             ->  Nested Loop  (cost=0.28..8.96 rows=1 width=8)
                                   ->  HashAggregate  (cost=0.28..0.29 rows=1 width=4)
                                         ->  Result  (cost=0.00..0.26 rows=1 width=0)
                                   ->  Index Scan using tags_conc_id_key on tags  (cost=0.00..8.66 rows=1 width=4)
                                         Index Cond: (public.tags.conc_id = (unnest(concept_get_expr_synonyms($1))))
                             ->  Index Scan using concepts_id_key1 on concepts  (cost=0.00..8.66 rows=1 width=51)
                                   Index Cond: (public.concepts.id = public.tags.conc_id)
                       ->  Index Scan using expressions_pkey on expressions  (cost=0.00..6.01 rows=1 width=4)
                             Index Cond: (public.expressions.id = public.concepts.expr_id)
               SubPlan 6
                 ->  Aggregate  (cost=23.66..23.67 rows=1 width=4)
                       ->  Nested Loop  (cost=0.28..23.65 rows=1 width=4)
                             ->  Nested Loop  (cost=0.28..17.63 rows=1 width=8)
                                   ->  Nested Loop  (cost=0.28..8.96 rows=1 width=12)
                                         ->  HashAggregate  (cost=0.28..0.29 rows=1 width=4)
                                               ->  Result  (cost=0.00..0.26 rows=1 width=0)
                                         ->  Index Scan using tags_conc_id_key on tags  (cost=0.00..8.66 rows=1 width=8)
                                               Index Cond: (public.tags.conc_id = (unnest(concept_get_expr_synonyms($1))))
                                   ->  Index Scan using concepts_id_key1 on concepts  (cost=0.00..8.66 rows=1 width=8)
                                         Index Cond: (public.concepts.id = public.tags.conc_id)
                             ->  Index Scan using expressions_pkey on expressions  (cost=0.00..6.01 rows=1 width=4)
                                   Index Cond: (public.expressions.id = public.concepts.expr_id)
   SubPlan 1
     ->  Index Scan using users_tags_tag_key on users_tags  (cost=0.00..8.27 rows=1 width=1)
           Index Cond: ((tag = $0) AND (follower = 1))
   SubPlan 2
     ->  Aggregate  (cost=14.90..14.91 rows=1 width=4)
           ->  Bitmap Heap Scan on users_tags  (cost=4.33..14.88 rows=5 width=4)
                 Recheck Cond: (tag = $0)
                 Filter: (NOT is_ignored)
                 ->  Bitmap Index Scan on users_tags_tag_key  (cost=0.00..4.33 rows=10 width=0)
                       Index Cond: (tag = $0)
   SubPlan 3
     ->  Aggregate  (cost=32.83..32.84 rows=1 width=4)
           ->  Nested Loop  (cost=0.00..32.82 rows=2 width=4)
                 ->  Seq Scan on questions  (cost=0.00..16.25 rows=2 width=4)
                       Filter: (qtype = 'quest'::quest_type)
                 ->  Index Scan using question_tags_question_id_key on question_tags  (cost=0.00..8.27 rows=1 width=4)
                       Index Cond: ((public.question_tags.question_id = public.questions.id) AND (public.question_tags.tag_id = $0))
   SubPlan 4
     ->  Aggregate  (cost=32.83..32.84 rows=1 width=4)
           ->  Nested Loop  (cost=0.00..32.82 rows=2 width=4)
                 ->  Seq Scan on questions  (cost=0.00..16.25 rows=2 width=4)
                       Filter: (qtype = 'opin'::quest_type)
                 ->  Index Scan using question_tags_question_id_key on question_tags  (cost=0.00..8.27 rows=1 width=4)
                       Index Cond: ((public.question_tags.question_id = public.questions.id) AND (public.question_tags.tag_id = $0))
(57 rows)

因此，当我调用此 select 时，它会立即响应。但是当我调用该函数时，它被卡住了：（

SELECT * FROM get_top_tags_all_name(1,10,0,FALSE,FALSE,FALSE);
^CCancel request sent
ERROR:  canceling statement due to user request
CONTEXT:  PL/pgSQL function "concept_get_expr_synonyms" line 6 at assignment
SQL function "get_top_tags_all_name" statement 1

我不得不取消这个请求，因为它思考了超过5分钟）

所以我认为查询规划器当时并不依赖索引。任何帮助将不胜感激。

原文

I have a function that executes a query with ORDER BY clause. When I call this function, postgres gets stuck, but when I execute the query with the values I pass to the function, it responds immediately.
The function looks like:

CREATE OR REPLACE FUNCTION get_all_synset_tags_by_name(
    lim   int,
    offs  int
) RETURNS SETOF ctags AS $
        SELECT concepts.id, expressions.name,
            array_to_string(
                array(
                    SELECT descr FROM ctags WHERE id IN (SELECT unnest(
                        concept_get_expr_synonyms(concepts.id)
                ))),'; '
            ), (
                SELECT sum(freq) FROM ctags WHERE id IN (SELECT unnest(
                    concept_get_expr_synonyms(concepts.id)
            )))::integer
            FROM concepts,expressions
            WHERE
                concepts.is_dropped=FALSE AND expressions.is_dropped=FALSE AND expressions.id=concepts.expr_id AND
                concepts.id=(concept_get_expr_synonyms(concepts.id))[1]
            ORDER BY name
            LIMIT $1 OFFSET $2;
$ language sql STABLE;

CREATE OR REPLACE FUNCTION get_top_tags_all_name(
    user_id int,
    lim   int,
    offs  int,
    at    bool,
    dt    bool,
    mt    bool
)
    RETURNS
        TABLE(
            id          int,
            name        text,
            descr       text,
            freq        int,
            foll_status bool,
            ad_perm     bool,
            rm_perm     bool,
            ed_perm     bool,
            n_folls     bigint,
            n_quests    bigint,
            n_opins     bigint

    ) AS $
    SELECT ctags.id,ctags.name,ctags.descr,ctags.freq,
            (SELECT users_tags.is_ignored
                    FROM users_tags
                        WHERE users_tags.tag=ctags.id AND users_tags.follower=$1),
            $4,$5,$6,
            (SELECT count(tag) FROM users_tags
                WHERE users_tags.tag=ctags.id AND users_tags.is_ignored=FALSE),
            (SELECT count(question_tags.question_id) FROM question_tags
                LEFT JOIN questions ON question_tags.question_id=questions.id
                WHERE question_tags.tag_id=ctags.id AND questions.qtype='quest' ),
            (SELECT count(question_tags.question_id) FROM question_tags
                LEFT JOIN questions ON question_tags.question_id=questions.id
                WHERE question_tags.tag_id=ctags.id AND questions.qtype='opin' )
            FROM get_all_synset_tags_by_name($2,$3) AS ctags;
$ language sql STRICT;

When I execute the query from get_top_tags_all_name with EXPLAIN, it says:

qa=# EXPLAIN SELECT ctags.id,ctags.name,ctags.descr,ctags.freq,
qa-#             (SELECT users_tags.is_ignored
qa(#                     FROM users_tags
qa(#                         WHERE users_tags.tag=ctags.id AND users_tags.follower=1),
qa-#             FALSE,FALSE,FALSE,
qa-#             (SELECT count(tag) FROM users_tags
qa(#                 WHERE users_tags.tag=ctags.id AND users_tags.is_ignored=FALSE),
qa-#             (SELECT count(question_tags.question_id) FROM question_tags
qa(#                 LEFT JOIN questions ON question_tags.question_id=questions.id
qa(#                 WHERE question_tags.tag_id=ctags.id AND questions.qtype='quest' ),
qa-#             (SELECT count(question_tags.question_id) FROM question_tags
qa(#                 LEFT JOIN questions ON question_tags.question_id=questions.id
qa(#                 WHERE question_tags.tag_id=ctags.id AND questions.qtype='opin' )
qa-#             FROM get_all_synset_tags_by_name(10,0) AS ctags;
                                                                  QUERY PLAN                                                                  
----------------------------------------------------------------------------------------------------------------------------------------------
 Subquery Scan ctags  (cost=0.00..21323.00 rows=10 width=72)
   ->  Limit  (cost=0.00..20434.38 rows=10 width=36)
         ->  Nested Loop  (cost=0.00..19645616.99 rows=9614 width=36)
               ->  Index Scan using expressions_name_key on expressions  (cost=0.00..996449.61 rows=1504967 width=36)
                     Filter: (NOT is_dropped)
               ->  Index Scan using concepts_expr_id_idx on concepts  (cost=0.00..12.08 rows=1 width=8)
                     Index Cond: (public.concepts.expr_id = public.expressions.id)
                     Filter: ((NOT public.concepts.is_dropped) AND (public.concepts.id = (concept_get_expr_synonyms(public.concepts.id))[1]))
               SubPlan 5
                 ->  Nested Loop  (cost=0.28..23.65 rows=1 width=43)
                       ->  Nested Loop  (cost=0.28..17.63 rows=1 width=47)
                             ->  Nested Loop  (cost=0.28..8.96 rows=1 width=8)
                                   ->  HashAggregate  (cost=0.28..0.29 rows=1 width=4)
                                         ->  Result  (cost=0.00..0.26 rows=1 width=0)
                                   ->  Index Scan using tags_conc_id_key on tags  (cost=0.00..8.66 rows=1 width=4)
                                         Index Cond: (public.tags.conc_id = (unnest(concept_get_expr_synonyms($1))))
                             ->  Index Scan using concepts_id_key1 on concepts  (cost=0.00..8.66 rows=1 width=51)
                                   Index Cond: (public.concepts.id = public.tags.conc_id)
                       ->  Index Scan using expressions_pkey on expressions  (cost=0.00..6.01 rows=1 width=4)
                             Index Cond: (public.expressions.id = public.concepts.expr_id)
               SubPlan 6
                 ->  Aggregate  (cost=23.66..23.67 rows=1 width=4)
                       ->  Nested Loop  (cost=0.28..23.65 rows=1 width=4)
                             ->  Nested Loop  (cost=0.28..17.63 rows=1 width=8)
                                   ->  Nested Loop  (cost=0.28..8.96 rows=1 width=12)
                                         ->  HashAggregate  (cost=0.28..0.29 rows=1 width=4)
                                               ->  Result  (cost=0.00..0.26 rows=1 width=0)
                                         ->  Index Scan using tags_conc_id_key on tags  (cost=0.00..8.66 rows=1 width=8)
                                               Index Cond: (public.tags.conc_id = (unnest(concept_get_expr_synonyms($1))))
                                   ->  Index Scan using concepts_id_key1 on concepts  (cost=0.00..8.66 rows=1 width=8)
                                         Index Cond: (public.concepts.id = public.tags.conc_id)
                             ->  Index Scan using expressions_pkey on expressions  (cost=0.00..6.01 rows=1 width=4)
                                   Index Cond: (public.expressions.id = public.concepts.expr_id)
   SubPlan 1
     ->  Index Scan using users_tags_tag_key on users_tags  (cost=0.00..8.27 rows=1 width=1)
           Index Cond: ((tag = $0) AND (follower = 1))
   SubPlan 2
     ->  Aggregate  (cost=14.90..14.91 rows=1 width=4)
           ->  Bitmap Heap Scan on users_tags  (cost=4.33..14.88 rows=5 width=4)
                 Recheck Cond: (tag = $0)
                 Filter: (NOT is_ignored)
                 ->  Bitmap Index Scan on users_tags_tag_key  (cost=0.00..4.33 rows=10 width=0)
                       Index Cond: (tag = $0)
   SubPlan 3
     ->  Aggregate  (cost=32.83..32.84 rows=1 width=4)
           ->  Nested Loop  (cost=0.00..32.82 rows=2 width=4)
                 ->  Seq Scan on questions  (cost=0.00..16.25 rows=2 width=4)
                       Filter: (qtype = 'quest'::quest_type)
                 ->  Index Scan using question_tags_question_id_key on question_tags  (cost=0.00..8.27 rows=1 width=4)
                       Index Cond: ((public.question_tags.question_id = public.questions.id) AND (public.question_tags.tag_id = $0))
   SubPlan 4
     ->  Aggregate  (cost=32.83..32.84 rows=1 width=4)
           ->  Nested Loop  (cost=0.00..32.82 rows=2 width=4)
                 ->  Seq Scan on questions  (cost=0.00..16.25 rows=2 width=4)
                       Filter: (qtype = 'opin'::quest_type)
                 ->  Index Scan using question_tags_question_id_key on question_tags  (cost=0.00..8.27 rows=1 width=4)
                       Index Cond: ((public.question_tags.question_id = public.questions.id) AND (public.question_tags.tag_id = $0))
(57 rows)

So when I call this select it responds immediately.
But when I call the function, it gets stuck:

SELECT * FROM get_top_tags_all_name(1,10,0,FALSE,FALSE,FALSE);
^CCancel request sent
ERROR:  canceling statement due to user request
CONTEXT:  PL/pgSQL function "concept_get_expr_synonyms" line 6 at assignment
SQL function "get_top_tags_all_name" statement 1

(I had to cancel this request because it thinked more than 5 minutes)

So I suppose that the query planner didn't rely on the index that time. Any help would be appreciated.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

嘿咻 2024-12-08 12:26:16

我可以回答我自己的问题吗？

解决方案是

SET enable_sort=FALSE;

在调用函数之前（或在连接开始时）发出此指令，在其他方法可用时禁用显式排序。
默认情况下，postgresql 尝试通过显式对数据进行排序来优化小表上的大索引扫描。如果表很小，并且需要大部分数据，那么显式排序比索引扫描更快。问题是，就我而言，表很大，需要一小部分数据，但 postgresql 在准备我的函数时并不知道这一点。

May I answer my own question?

The solution is to

SET enable_sort=FALSE;

This directive, being issued before calling the functions (or at the beginning of the connection), disables explicit sorts in cases when other methods are available.
By default, postgresql tries to optimize big index scans on small tables by explicitly sorting the data. If the table is small, and the majority of data is needed, then explicit sort is faster than index scan. The thing is that in my case the tables are huge, and small parts of data are needed, but postgresql doesn't know it when it prepares my functions.

回复收藏 0 原文

断桥再见 2024-12-08 12:26:16

我认为问题在于 PG 为该函数制定了查询计划，而统计数据还不够好，无法优化。鉴于您的函数实际上只是一行 SELECT 语句，您可能会使用晦涩的 PREPARE 和 EXECUTE 做得更好。 org/docs/9.0/static/sql-prepare.html" rel="nofollow">命令。

回复收藏 0 原文

~没有更多了~