Advanced indexing involving OR conditions (pgsql)
I'm starting to get a much better grasp on PostgreSQL indexing, but I've run into an issue with the OR conditional, where I don't know how to go about optimizing my indexes for a faster query.
I have 6 conditionals that, when run individually, appear to have a small cost. Here are the trimmed queries, each followed by the top line of its EXPLAIN ANALYZE output (estimated cost and actual time).
(NOTE: I haven't output the actual query plans for these queries below for the sake of reducing complexity, but they all use nested loop left joins and index scans, as I would expect with proper indexing. If necessary, I can include the query plans for a more meaningful response.)
EXPLAIN ANALYZE SELECT t1.*, t2.*, t3.*
FROM t1 LEFT JOIN t2 on t2.id = t1.t2_id LEFT JOIN t3 ON t3.id = t1.t3_id
WHERE (conditions1)
LIMIT 10;
QUERY PLAN
-------------------------------------------------------------------------------------
Limit (cost=0.25..46.69 rows=1 width=171) (actual time=0.031..0.031 rows=0 loops=1)
EXPLAIN ANALYZE SELECT t1.*, t2.*, t3.*
FROM t1 LEFT JOIN t2 on t2.id = t1.t2_id LEFT JOIN t3 ON t3.id = t1.t3_id
WHERE (conditions2)
LIMIT 10;
QUERY PLAN
-------------------------------------------------------------------------------------
Limit (cost=0.76..18.97 rows=1 width=171) (actual time=14.764..14.764 rows=0 loops=1)
/* snip */
EXPLAIN ANALYZE SELECT t1.*, t2.*, t3.*
FROM t1 LEFT JOIN t2 on t2.id = t1.t2_id LEFT JOIN t3 ON t3.id = t1.t3_id
WHERE (conditions6)
LIMIT 10;
QUERY PLAN
-------------------------------------------------------------------------------------
Limit (cost=0.51..24.48 rows=1 width=171) (actual time=0.252..5.332 rows=10 loops=1)
My problem is that I want to join these 6 conditions together with OR operators, making each condition a possibility. My combined query appears more like this:
EXPLAIN ANALYZE SELECT t1.*, t2.*, t3.*
FROM t1 LEFT JOIN t2 on t2.id = t1.t2_id LEFT JOIN t3 ON t3.id = t1.t3_id
WHERE (conditions1 OR conditions2 OR conditions3 OR conditions4 OR conditions5 OR conditions6)
LIMIT 10;
Unfortunately, this results in a MASSIVE increase in the estimated cost of the query plan, which no longer seems to be using my indexes (instead, it chooses a hash left join rather than a nested loop left join, and performs various sequential scans in place of the previously used index scans).
Limit (cost=142.62..510755.78 rows=1 width=171) (actual time=30.591..30.986 rows=10 loops=1)
Is there anything special I should know about indexing with regards to OR-ed conditions that would improve my final query?
UPDATE: If I use a UNION for each individual SELECT, that seems to speed up the query. However, will that prevent me from ordering my results if I choose to in the future? Here's what I did to speed up the query via UNION:
EXPLAIN ANALYZE
SELECT t1.*, t2.*, t3.*
FROM t1 LEFT JOIN t2 on t2.id = t1.t2_id LEFT JOIN t3 ON t3.id = t1.t3_id
WHERE (conditions1)
UNION
SELECT t1.*, t2.*, t3.*
FROM t1 LEFT JOIN t2 on t2.id = t1.t2_id LEFT JOIN t3 ON t3.id = t1.t3_id
WHERE (conditions2)
UNION
SELECT t1.*, t2.*, t3.*
FROM t1 LEFT JOIN t2 on t2.id = t1.t2_id LEFT JOIN t3 ON t3.id = t1.t3_id
WHERE (conditions3)
UNION
SELECT t1.*, t2.*, t3.*
FROM t1 LEFT JOIN t2 on t2.id = t1.t2_id LEFT JOIN t3 ON t3.id = t1.t3_id
WHERE (conditions4)
UNION
SELECT t1.*, t2.*, t3.*
FROM t1 LEFT JOIN t2 on t2.id = t1.t2_id LEFT JOIN t3 ON t3.id = t1.t3_id
WHERE (conditions5)
UNION
SELECT t1.*, t2.*, t3.*
FROM t1 LEFT JOIN t2 on t2.id = t1.t2_id LEFT JOIN t3 ON t3.id = t1.t3_id
WHERE (conditions6)
LIMIT 10;
QUERY PLAN
-------------------------------------------------------------------------------------
Limit (cost=219.14..219.49 rows=6 width=171) (actual time=125.579..125.653 rows=10 loops=1)
Comments (2)
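Depending on the conditions, it may be logically impossible to use any index to help a complex condition using OR expressions.
Like MySQL, PostgreSQL 8.0 and earlier state in their docs on indexes that a query can use at most one index per table.
With PostgreSQL 8.1, this has changed: the planner can combine several indexes on the same table using bitmap scans, and that covers OR-ed conditions as well.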
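One way to check whether that index combination actually happens is to look for a BitmapOr node in the plan. The following is only a sketch: the table `demo_t1`, its columns `a` and `b`, and the index names are hypothetical stand-ins for the real schema, and whether the planner picks a bitmap plan at all depends on table size and statistics.

```sql
-- Hypothetical table standing in for t1; all names are illustrative only.
CREATE TABLE demo_t1 (
    id integer PRIMARY KEY,
    a  integer NOT NULL,
    b  integer NOT NULL
);

-- Populate with enough rows that an index is worth using.
INSERT INTO demo_t1 (id, a, b)
SELECT g, g % 1000, (g * 7) % 1000
FROM generate_series(1, 100000) AS g;

-- One single-column index per OR branch.
CREATE INDEX demo_t1_a_idx ON demo_t1 (a);
CREATE INDEX demo_t1_b_idx ON demo_t1 (b);

-- Refresh planner statistics after the bulk load.
ANALYZE demo_t1;

-- If the planner combines both indexes, the plan contains a BitmapOr node
-- with one Bitmap Index Scan per index underneath it.
EXPLAIN
SELECT *
FROM demo_t1
WHERE a = 42 OR b = 42;
```

Note that a BitmapOr only combines indexes on a single table, so OR-ed conditions that reference columns of different tables cannot be handled this way.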
However, if this doesn't help, you can use the UNION solution you tried (this is a common solution for MySQL users, which continues to have a one-index-per-table limitation).
You should be able to order the results of a UNION query, but you have to use parentheses to specify that the ORDER BY applies to the result of the UNION, not merely to the last subquery in the chain.
I hope this helps; I'm not an expert on the PostgreSQL optimizer. You might try searching the mailing list archives, or asking on the IRC channel.
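For the ORDER BY question, here is a minimal sketch of the explicit, parenthesized form. The table `demo_items` and its data are made up purely for illustration.

```sql
-- Hypothetical table; only the shape of the final query matters here.
CREATE TABLE demo_items (
    id    integer PRIMARY KEY,
    kind  text    NOT NULL,
    score integer NOT NULL
);

INSERT INTO demo_items (id, kind, score) VALUES
    (1, 'x', 30),
    (2, 'y', 10),
    (3, 'x', 20),
    (4, 'z', 40);

-- Each branch is parenthesized, so the trailing ORDER BY cannot be read
-- as belonging to the last SELECT: it sorts the combined result.
-- The three matching rows come back in score order: 10, 20, 30.
(SELECT id, kind, score FROM demo_items WHERE kind = 'x')
UNION
(SELECT id, kind, score FROM demo_items WHERE score < 15)
ORDER BY score;
```

PostgreSQL also applies a trailing ORDER BY to the whole UNION when the branches are not parenthesized; the parentheses mainly make the intent explicit, and they become necessary when an individual branch needs its own ORDER BY or LIMIT.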
(Sorry - don't know how to reply to a reply, so this is going top level)
To clarify - the old limitation was that PG would only use a single index for a single table scan. If you have a query joining three tables and each has a useful index, it was always smart enough to use all three.
In your particular case, what is probably happening is that your ORed conditions are correlated in some way. PostgreSQL doesn't know this, and so ends up assuming the combined condition will match more rows than it actually does - enough rows to change your query plan.
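This kind of misestimate can be reproduced on a toy table. It is a sketch under assumptions: `demo_corr` and its columns are made up, and the two columns are deliberately identical so that the OR-ed conditions overlap completely. As far as I know, the planner combines OR-ed selectivities as if the conditions were independent (roughly s1 + s2 - s1*s2), so its row estimate for the OR should land well above the real count.

```sql
-- Hypothetical table whose two columns always hold the same value,
-- i.e. the conditions on them are perfectly correlated.
CREATE TABLE demo_corr (
    id integer PRIMARY KEY,
    a  integer NOT NULL,
    b  integer NOT NULL
);

INSERT INTO demo_corr (id, a, b)
SELECT g, g % 10, g % 10
FROM generate_series(1, 100000) AS g;

-- Collect per-column statistics for the planner.
ANALYZE demo_corr;

-- Compare the estimated "rows=" with the actual "rows=" on the scan node.
-- Every row with a = 1 also has b = 1, so the OR matches no extra rows,
-- but the per-column statistics do not tell the planner that.
EXPLAIN ANALYZE
SELECT *
FROM demo_corr
WHERE a = 1 OR b = 1;
```

With six OR-ed conditions spread over joined tables, the same effect can compound, which would fit the switch to hash joins and sequential scans described in the question.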
Also your UNIONed queries aren't quite the same as the individual ones since you LIMIT each small one separately rather than the whole result-set with the UNION.
As for needing parentheses to make the ORDER BY apply to the UNION: this isn't right - an ORDER BY at the end of the chain applies to the whole result.
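To make the difference in LIMIT placement concrete, here is a sketch of both forms. The table `demo_events` and its data are hypothetical; in the real query the two WHERE clauses would be the question's conditions.

```sql
-- Hypothetical table; names are illustrative only.
CREATE TABLE demo_events (
    id   integer PRIMARY KEY,
    kind text    NOT NULL
);

-- 100 rows: 50 'even' and 50 'odd'.
INSERT INTO demo_events (id, kind)
SELECT g, CASE WHEN g % 2 = 0 THEN 'even' ELSE 'odd' END
FROM generate_series(1, 100) AS g;

-- Form 1: a single trailing LIMIT caps the combined result at 10 rows.
SELECT id, kind FROM demo_events WHERE kind = 'even'
UNION
SELECT id, kind FROM demo_events WHERE kind = 'odd'
LIMIT 10;

-- Form 2: parenthesized branches each get their own LIMIT, so up to
-- 10 rows per branch (here 20 in total) reach the UNION.
(SELECT id, kind FROM demo_events WHERE kind = 'even' LIMIT 10)
UNION
(SELECT id, kind FROM demo_events WHERE kind = 'odd' LIMIT 10);
```

The two forms answer different questions, so their timings are not directly comparable. If the branches cannot return the same row, UNION ALL skips the de-duplication step that plain UNION performs.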
HTH