为什么 postgresql planner 不使用索引扫描而不是在存储函数中使用显式排序?
我有一个使用 ORDER BY 子句执行查询的函数。当我调用这个函数时,postgres 会卡住,但是当我使用传递给函数的值执行查询时,它会立即响应。 该函数如下所示:
CREATE OR REPLACE FUNCTION get_all_synset_tags_by_name(
lim int,
offs int
) RETURNS SETOF ctags AS $$
SELECT concepts.id, expressions.name,
array_to_string(
array(
SELECT descr FROM ctags WHERE id IN (SELECT unnest(
concept_get_expr_synonyms(concepts.id)
))),'; '
), (
SELECT sum(freq) FROM ctags WHERE id IN (SELECT unnest(
concept_get_expr_synonyms(concepts.id)
)))::integer
FROM concepts,expressions
WHERE
concepts.is_dropped=FALSE AND expressions.is_dropped=FALSE AND expressions.id=concepts.expr_id AND
concepts.id=(concept_get_expr_synonyms(concepts.id))[1]
ORDER BY name
LIMIT $1 OFFSET $2;
$$ language sql STABLE;
CREATE OR REPLACE FUNCTION get_top_tags_all_name(
user_id int,
lim int,
offs int,
at bool,
dt bool,
mt bool
)
RETURNS
TABLE(
id int,
name text,
descr text,
freq int,
foll_status bool,
ad_perm bool,
rm_perm bool,
ed_perm bool,
n_folls bigint,
n_quests bigint,
n_opins bigint
) AS $$
SELECT ctags.id,ctags.name,ctags.descr,ctags.freq,
(SELECT users_tags.is_ignored
FROM users_tags
WHERE users_tags.tag=ctags.id AND users_tags.follower=$1),
$4,$5,$6,
(SELECT count(tag) FROM users_tags
WHERE users_tags.tag=ctags.id AND users_tags.is_ignored=FALSE),
(SELECT count(question_tags.question_id) FROM question_tags
LEFT JOIN questions ON question_tags.question_id=questions.id
WHERE question_tags.tag_id=ctags.id AND questions.qtype='quest' ),
(SELECT count(question_tags.question_id) FROM question_tags
LEFT JOIN questions ON question_tags.question_id=questions.id
WHERE question_tags.tag_id=ctags.id AND questions.qtype='opin' )
FROM get_all_synset_tags_by_name($2,$3) AS ctags;
$$ language sql STRICT;
当我使用 EXPLAIN 从 get_top_tags_all_name 执行查询时,它会说:
qa=# EXPLAIN SELECT ctags.id,ctags.name,ctags.descr,ctags.freq,
qa-# (SELECT users_tags.is_ignored
qa(# FROM users_tags
qa(# WHERE users_tags.tag=ctags.id AND users_tags.follower=1),
qa-# FALSE,FALSE,FALSE,
qa-# (SELECT count(tag) FROM users_tags
qa(# WHERE users_tags.tag=ctags.id AND users_tags.is_ignored=FALSE),
qa-# (SELECT count(question_tags.question_id) FROM question_tags
qa(# LEFT JOIN questions ON question_tags.question_id=questions.id
qa(# WHERE question_tags.tag_id=ctags.id AND questions.qtype='quest' ),
qa-# (SELECT count(question_tags.question_id) FROM question_tags
qa(# LEFT JOIN questions ON question_tags.question_id=questions.id
qa(# WHERE question_tags.tag_id=ctags.id AND questions.qtype='opin' )
qa-# FROM get_all_synset_tags_by_name(10,0) AS ctags;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
Subquery Scan ctags (cost=0.00..21323.00 rows=10 width=72)
-> Limit (cost=0.00..20434.38 rows=10 width=36)
-> Nested Loop (cost=0.00..19645616.99 rows=9614 width=36)
-> Index Scan using expressions_name_key on expressions (cost=0.00..996449.61 rows=1504967 width=36)
Filter: (NOT is_dropped)
-> Index Scan using concepts_expr_id_idx on concepts (cost=0.00..12.08 rows=1 width=8)
Index Cond: (public.concepts.expr_id = public.expressions.id)
Filter: ((NOT public.concepts.is_dropped) AND (public.concepts.id = (concept_get_expr_synonyms(public.concepts.id))[1]))
SubPlan 5
-> Nested Loop (cost=0.28..23.65 rows=1 width=43)
-> Nested Loop (cost=0.28..17.63 rows=1 width=47)
-> Nested Loop (cost=0.28..8.96 rows=1 width=8)
-> HashAggregate (cost=0.28..0.29 rows=1 width=4)
-> Result (cost=0.00..0.26 rows=1 width=0)
-> Index Scan using tags_conc_id_key on tags (cost=0.00..8.66 rows=1 width=4)
Index Cond: (public.tags.conc_id = (unnest(concept_get_expr_synonyms($1))))
-> Index Scan using concepts_id_key1 on concepts (cost=0.00..8.66 rows=1 width=51)
Index Cond: (public.concepts.id = public.tags.conc_id)
-> Index Scan using expressions_pkey on expressions (cost=0.00..6.01 rows=1 width=4)
Index Cond: (public.expressions.id = public.concepts.expr_id)
SubPlan 6
-> Aggregate (cost=23.66..23.67 rows=1 width=4)
-> Nested Loop (cost=0.28..23.65 rows=1 width=4)
-> Nested Loop (cost=0.28..17.63 rows=1 width=8)
-> Nested Loop (cost=0.28..8.96 rows=1 width=12)
-> HashAggregate (cost=0.28..0.29 rows=1 width=4)
-> Result (cost=0.00..0.26 rows=1 width=0)
-> Index Scan using tags_conc_id_key on tags (cost=0.00..8.66 rows=1 width=8)
Index Cond: (public.tags.conc_id = (unnest(concept_get_expr_synonyms($1))))
-> Index Scan using concepts_id_key1 on concepts (cost=0.00..8.66 rows=1 width=8)
Index Cond: (public.concepts.id = public.tags.conc_id)
-> Index Scan using expressions_pkey on expressions (cost=0.00..6.01 rows=1 width=4)
Index Cond: (public.expressions.id = public.concepts.expr_id)
SubPlan 1
-> Index Scan using users_tags_tag_key on users_tags (cost=0.00..8.27 rows=1 width=1)
Index Cond: ((tag = $0) AND (follower = 1))
SubPlan 2
-> Aggregate (cost=14.90..14.91 rows=1 width=4)
-> Bitmap Heap Scan on users_tags (cost=4.33..14.88 rows=5 width=4)
Recheck Cond: (tag = $0)
Filter: (NOT is_ignored)
-> Bitmap Index Scan on users_tags_tag_key (cost=0.00..4.33 rows=10 width=0)
Index Cond: (tag = $0)
SubPlan 3
-> Aggregate (cost=32.83..32.84 rows=1 width=4)
-> Nested Loop (cost=0.00..32.82 rows=2 width=4)
-> Seq Scan on questions (cost=0.00..16.25 rows=2 width=4)
Filter: (qtype = 'quest'::quest_type)
-> Index Scan using question_tags_question_id_key on question_tags (cost=0.00..8.27 rows=1 width=4)
Index Cond: ((public.question_tags.question_id = public.questions.id) AND (public.question_tags.tag_id = $0))
SubPlan 4
-> Aggregate (cost=32.83..32.84 rows=1 width=4)
-> Nested Loop (cost=0.00..32.82 rows=2 width=4)
-> Seq Scan on questions (cost=0.00..16.25 rows=2 width=4)
Filter: (qtype = 'opin'::quest_type)
-> Index Scan using question_tags_question_id_key on question_tags (cost=0.00..8.27 rows=1 width=4)
Index Cond: ((public.question_tags.question_id = public.questions.id) AND (public.question_tags.tag_id = $0))
(57 rows)
因此,当我调用此 select 时,它会立即响应。 但是当我调用该函数时,它被卡住了:(
SELECT * FROM get_top_tags_all_name(1,10,0,FALSE,FALSE,FALSE);
^CCancel request sent
ERROR: canceling statement due to user request
CONTEXT: PL/pgSQL function "concept_get_expr_synonyms" line 6 at assignment
SQL function "get_top_tags_all_name" statement 1
我不得不取消这个请求,因为它思考了超过5分钟)
所以我认为查询规划器当时并不依赖索引。任何帮助将不胜感激。
I have a function that executes a query with ORDER BY clause. When I call this function, postgres gets stuck, but when I execute the query with the values I pass to the function, it responds immediately.
The function looks like:
CREATE OR REPLACE FUNCTION get_all_synset_tags_by_name(
lim int,
offs int
) RETURNS SETOF ctags AS $
SELECT concepts.id, expressions.name,
array_to_string(
array(
SELECT descr FROM ctags WHERE id IN (SELECT unnest(
concept_get_expr_synonyms(concepts.id)
))),'; '
), (
SELECT sum(freq) FROM ctags WHERE id IN (SELECT unnest(
concept_get_expr_synonyms(concepts.id)
)))::integer
FROM concepts,expressions
WHERE
concepts.is_dropped=FALSE AND expressions.is_dropped=FALSE AND expressions.id=concepts.expr_id AND
concepts.id=(concept_get_expr_synonyms(concepts.id))[1]
ORDER BY name
LIMIT $1 OFFSET $2;
$ language sql STABLE;
CREATE OR REPLACE FUNCTION get_top_tags_all_name(
user_id int,
lim int,
offs int,
at bool,
dt bool,
mt bool
)
RETURNS
TABLE(
id int,
name text,
descr text,
freq int,
foll_status bool,
ad_perm bool,
rm_perm bool,
ed_perm bool,
n_folls bigint,
n_quests bigint,
n_opins bigint
) AS $
SELECT ctags.id,ctags.name,ctags.descr,ctags.freq,
(SELECT users_tags.is_ignored
FROM users_tags
WHERE users_tags.tag=ctags.id AND users_tags.follower=$1),
$4,$5,$6,
(SELECT count(tag) FROM users_tags
WHERE users_tags.tag=ctags.id AND users_tags.is_ignored=FALSE),
(SELECT count(question_tags.question_id) FROM question_tags
LEFT JOIN questions ON question_tags.question_id=questions.id
WHERE question_tags.tag_id=ctags.id AND questions.qtype='quest' ),
(SELECT count(question_tags.question_id) FROM question_tags
LEFT JOIN questions ON question_tags.question_id=questions.id
WHERE question_tags.tag_id=ctags.id AND questions.qtype='opin' )
FROM get_all_synset_tags_by_name($2,$3) AS ctags;
$ language sql STRICT;
When I execute the query from get_top_tags_all_name with EXPLAIN, it says:
qa=# EXPLAIN SELECT ctags.id,ctags.name,ctags.descr,ctags.freq,
qa-# (SELECT users_tags.is_ignored
qa(# FROM users_tags
qa(# WHERE users_tags.tag=ctags.id AND users_tags.follower=1),
qa-# FALSE,FALSE,FALSE,
qa-# (SELECT count(tag) FROM users_tags
qa(# WHERE users_tags.tag=ctags.id AND users_tags.is_ignored=FALSE),
qa-# (SELECT count(question_tags.question_id) FROM question_tags
qa(# LEFT JOIN questions ON question_tags.question_id=questions.id
qa(# WHERE question_tags.tag_id=ctags.id AND questions.qtype='quest' ),
qa-# (SELECT count(question_tags.question_id) FROM question_tags
qa(# LEFT JOIN questions ON question_tags.question_id=questions.id
qa(# WHERE question_tags.tag_id=ctags.id AND questions.qtype='opin' )
qa-# FROM get_all_synset_tags_by_name(10,0) AS ctags;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
Subquery Scan ctags (cost=0.00..21323.00 rows=10 width=72)
-> Limit (cost=0.00..20434.38 rows=10 width=36)
-> Nested Loop (cost=0.00..19645616.99 rows=9614 width=36)
-> Index Scan using expressions_name_key on expressions (cost=0.00..996449.61 rows=1504967 width=36)
Filter: (NOT is_dropped)
-> Index Scan using concepts_expr_id_idx on concepts (cost=0.00..12.08 rows=1 width=8)
Index Cond: (public.concepts.expr_id = public.expressions.id)
Filter: ((NOT public.concepts.is_dropped) AND (public.concepts.id = (concept_get_expr_synonyms(public.concepts.id))[1]))
SubPlan 5
-> Nested Loop (cost=0.28..23.65 rows=1 width=43)
-> Nested Loop (cost=0.28..17.63 rows=1 width=47)
-> Nested Loop (cost=0.28..8.96 rows=1 width=8)
-> HashAggregate (cost=0.28..0.29 rows=1 width=4)
-> Result (cost=0.00..0.26 rows=1 width=0)
-> Index Scan using tags_conc_id_key on tags (cost=0.00..8.66 rows=1 width=4)
Index Cond: (public.tags.conc_id = (unnest(concept_get_expr_synonyms($1))))
-> Index Scan using concepts_id_key1 on concepts (cost=0.00..8.66 rows=1 width=51)
Index Cond: (public.concepts.id = public.tags.conc_id)
-> Index Scan using expressions_pkey on expressions (cost=0.00..6.01 rows=1 width=4)
Index Cond: (public.expressions.id = public.concepts.expr_id)
SubPlan 6
-> Aggregate (cost=23.66..23.67 rows=1 width=4)
-> Nested Loop (cost=0.28..23.65 rows=1 width=4)
-> Nested Loop (cost=0.28..17.63 rows=1 width=8)
-> Nested Loop (cost=0.28..8.96 rows=1 width=12)
-> HashAggregate (cost=0.28..0.29 rows=1 width=4)
-> Result (cost=0.00..0.26 rows=1 width=0)
-> Index Scan using tags_conc_id_key on tags (cost=0.00..8.66 rows=1 width=8)
Index Cond: (public.tags.conc_id = (unnest(concept_get_expr_synonyms($1))))
-> Index Scan using concepts_id_key1 on concepts (cost=0.00..8.66 rows=1 width=8)
Index Cond: (public.concepts.id = public.tags.conc_id)
-> Index Scan using expressions_pkey on expressions (cost=0.00..6.01 rows=1 width=4)
Index Cond: (public.expressions.id = public.concepts.expr_id)
SubPlan 1
-> Index Scan using users_tags_tag_key on users_tags (cost=0.00..8.27 rows=1 width=1)
Index Cond: ((tag = $0) AND (follower = 1))
SubPlan 2
-> Aggregate (cost=14.90..14.91 rows=1 width=4)
-> Bitmap Heap Scan on users_tags (cost=4.33..14.88 rows=5 width=4)
Recheck Cond: (tag = $0)
Filter: (NOT is_ignored)
-> Bitmap Index Scan on users_tags_tag_key (cost=0.00..4.33 rows=10 width=0)
Index Cond: (tag = $0)
SubPlan 3
-> Aggregate (cost=32.83..32.84 rows=1 width=4)
-> Nested Loop (cost=0.00..32.82 rows=2 width=4)
-> Seq Scan on questions (cost=0.00..16.25 rows=2 width=4)
Filter: (qtype = 'quest'::quest_type)
-> Index Scan using question_tags_question_id_key on question_tags (cost=0.00..8.27 rows=1 width=4)
Index Cond: ((public.question_tags.question_id = public.questions.id) AND (public.question_tags.tag_id = $0))
SubPlan 4
-> Aggregate (cost=32.83..32.84 rows=1 width=4)
-> Nested Loop (cost=0.00..32.82 rows=2 width=4)
-> Seq Scan on questions (cost=0.00..16.25 rows=2 width=4)
Filter: (qtype = 'opin'::quest_type)
-> Index Scan using question_tags_question_id_key on question_tags (cost=0.00..8.27 rows=1 width=4)
Index Cond: ((public.question_tags.question_id = public.questions.id) AND (public.question_tags.tag_id = $0))
(57 rows)
So when I call this select it responds immediately.
But when I call the function, it gets stuck:
SELECT * FROM get_top_tags_all_name(1,10,0,FALSE,FALSE,FALSE);
^CCancel request sent
ERROR: canceling statement due to user request
CONTEXT: PL/pgSQL function "concept_get_expr_synonyms" line 6 at assignment
SQL function "get_top_tags_all_name" statement 1
(I had to cancel this request because it thinked more than 5 minutes)
So I suppose that the query planner didn't rely on the index that time. Any help would be appreciated.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
我可以回答我自己的问题吗?
解决方案是
在调用函数之前(或在连接开始时)发出此指令,在其他方法可用时禁用显式排序。
默认情况下,postgresql 尝试通过显式对数据进行排序来优化小表上的大索引扫描。如果表很小,并且需要大部分数据,那么显式排序比索引扫描更快。问题是,就我而言,表很大,需要一小部分数据,但 postgresql 在准备我的函数时并不知道这一点。
May I answer my own question?
The solution is to
This directive, being issued before calling the functions (or at the beginning of the connection), disables explicit sorts in cases when other methods are available.
By default, postgresql tries to optimize big index scans on small tables by explicitly sorting the data. If the table is small, and the majority of data is needed, then explicit sort is faster than index scan. The thing is that in my case the tables are huge, and small parts of data are needed, but postgresql doesn't know it when it prepares my functions.
我认为问题在于 PG 为该函数制定了查询计划,而统计数据还不够好,无法优化。鉴于您的函数实际上只是一行 SELECT 语句,您可能会使用晦涩的
PREPARE
和EXECUTE
做得更好。 org/docs/9.0/static/sql-prepare.html" rel="nofollow">命令。I believe the issue is that PG makes its query plan for the function while the statistics are not yet good enough to optimize. Given that your functions are actually just one-line SELECT statements, you might do better with the obscure
PREPARE
andEXECUTE
commands.