Advanced indexing involving OR-ed conditions (pgsql)

Posted on 2024-08-10 16:59:02

I'm starting to get a much better grasp on PostgreSQL indexing, but I've run into an issue with the OR conditional, where I don't know how to go about optimizing my indexes for a faster query.

I have 6 conditionals that, when run individually, appear to have a small cost. Here are trimmed examples of the queries, each followed by the top line of its query plan (estimated cost and actual time).

(NOTE: I haven't output the actual query plans for these queries below for the sake of reducing complexity, but they all use nested loop left joins and index scans as I would expect with proper indexing. If necessary, I can include the query plans for a more meaningful response.)

EXPLAIN ANALYZE SELECT t1.*, t2.*, t3.*
  FROM t1 LEFT JOIN t2 on t2.id = t1.t2_id LEFT JOIN t3 ON t3.id = t1.t3_id
 WHERE (conditions1)
 LIMIT 10;

QUERY PLAN
-------------------------------------------------------------------------------------
Limit  (cost=0.25..46.69 rows=1 width=171) (actual time=0.031..0.031 rows=0 loops=1)

EXPLAIN ANALYZE SELECT t1.*, t2.*, t3.*
  FROM t1 LEFT JOIN t2 on t2.id = t1.t2_id LEFT JOIN t3 ON t3.id = t1.t3_id
 WHERE (conditions2)
 LIMIT 10;

QUERY PLAN
-------------------------------------------------------------------------------------
Limit  (cost=0.76..18.97 rows=1 width=171) (actual time=14.764..14.764 rows=0 loops=1)

/* snip */

EXPLAIN ANALYZE SELECT t1.*, t2.*, t3.*
  FROM t1 LEFT JOIN t2 on t2.id = t1.t2_id LEFT JOIN t3 ON t3.id = t1.t3_id
 WHERE (conditions6)
 LIMIT 10;

QUERY PLAN
-------------------------------------------------------------------------------------
Limit  (cost=0.51..24.48 rows=1 width=171) (actual time=0.252..5.332 rows=10 loops=1)

My problem is that I want to join these 6 conditions together with OR operators, making each condition a possibility. My combined query appears more like this:

EXPLAIN ANALYZE SELECT t1.*, t2.*, t3.*
  FROM t1 LEFT JOIN t2 on t2.id = t1.t2_id LEFT JOIN t3 ON t3.id = t1.t3_id
 WHERE (conditions1 OR conditions2 OR conditions3 OR conditions4 OR conditions5 OR conditions6)
 LIMIT 10;

Unfortunately, this results in a MASSIVE increase in the estimated cost of the query plan, which no longer seems to be using my indexes (instead, it chooses a hash left join rather than a nested loop left join, and performs various sequential scans in place of the previously used index scans).

Limit  (cost=142.62..510755.78 rows=1 width=171) (actual time=30.591..30.986 rows=10 loops=1)

Is there anything special I should know about indexing with regards to OR-ed conditions that would improve my final query?

UPDATE: If I use a UNION for each individual SELECT, that seems to speed up the query. However, will that prevent me from ordering my results if I choose to in the future? Here's what I did to speed up the query via UNION:

EXPLAIN ANALYZE
SELECT t1.*, t2.*, t3.*
  FROM t1 LEFT JOIN t2 on t2.id = t1.t2_id LEFT JOIN t3 ON t3.id = t1.t3_id
 WHERE (conditions1)
UNION
SELECT t1.*, t2.*, t3.*
  FROM t1 LEFT JOIN t2 on t2.id = t1.t2_id LEFT JOIN t3 ON t3.id = t1.t3_id
 WHERE (conditions2)
UNION
SELECT t1.*, t2.*, t3.*
  FROM t1 LEFT JOIN t2 on t2.id = t1.t2_id LEFT JOIN t3 ON t3.id = t1.t3_id
 WHERE (conditions3)
UNION
SELECT t1.*, t2.*, t3.*
  FROM t1 LEFT JOIN t2 on t2.id = t1.t2_id LEFT JOIN t3 ON t3.id = t1.t3_id
 WHERE (conditions4)
UNION
SELECT t1.*, t2.*, t3.*
  FROM t1 LEFT JOIN t2 on t2.id = t1.t2_id LEFT JOIN t3 ON t3.id = t1.t3_id
 WHERE (conditions5)
UNION
SELECT t1.*, t2.*, t3.*
  FROM t1 LEFT JOIN t2 on t2.id = t1.t2_id LEFT JOIN t3 ON t3.id = t1.t3_id
 WHERE (conditions6)
 LIMIT 10;

QUERY PLAN
-------------------------------------------------------------------------------------
Limit  (cost=219.14..219.49 rows=6 width=171) (actual time=125.579..125.653 rows=10 loops=1)

Comments (2)

思念绕指尖 2024-08-17 16:59:02

Depending on the conditions, it may be logically impossible to use any index to help a complex condition using OR expressions.

As with MySQL, the PostgreSQL docs on indexes for version 8.0 and earlier state:

Note that a query or data manipulation command can use at most one index per table.

With PostgreSQL 8.1, this has changed: the planner can combine several indexes on the same table using bitmap index scans.
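
As a rough sketch of what that change allows: an OR across two indexed columns of one table may be planned as a BitmapOr over two bitmap index scans. The table, column, and index names below are hypothetical (nothing here comes from the question), and whether the planner actually picks this plan depends on table size and statistics.

```sql
-- Hypothetical demo table; names are illustrative only.
CREATE TABLE or_demo (
    id serial PRIMARY KEY,
    a  integer,
    b  integer
);

-- Fill it with enough rows that index scans are worth considering.
INSERT INTO or_demo (a, b)
SELECT g % 1000, g % 777
  FROM generate_series(1, 200000) AS g;

CREATE INDEX or_demo_a_idx ON or_demo (a);
CREATE INDEX or_demo_b_idx ON or_demo (b);
ANALYZE or_demo;

-- Each side of the OR matches one index. In the output, look for a
-- BitmapOr node sitting above two Bitmap Index Scan nodes.
EXPLAIN SELECT * FROM or_demo WHERE a = 42 OR b = 7;
```

Note that a BitmapOr combines indexes on a single table only. If the OR branches test columns from different joined tables, this mechanism does not apply, which may be what is happening in the question.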

However, if this doesn't help, you can use the UNION solution you tried (this is a common solution for MySQL users, which continues to have a one-index-per-table limitation).
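
A minimal sketch of that rewrite, assuming a hypothetical pair of tables where the two OR branches test columns in different tables. All names are made up for illustration; the real conditions from the question are not known.

```sql
-- Hypothetical tables standing in for the question's t1 and t2.
CREATE TABLE parent_demo (
    id   integer PRIMARY KEY,
    flag boolean
);
CREATE TABLE child_demo (
    id        integer PRIMARY KEY,
    parent_id integer REFERENCES parent_demo (id),
    kind      text
);
CREATE INDEX child_demo_kind_idx   ON child_demo (kind);
CREATE INDEX child_demo_parent_idx ON child_demo (parent_id);
CREATE INDEX parent_demo_flag_idx  ON parent_demo (flag);

-- OR form: the two branches refer to different tables,
-- so no single index scan can satisfy the whole WHERE clause.
SELECT c.*, p.*
  FROM child_demo c LEFT JOIN parent_demo p ON p.id = c.parent_id
 WHERE c.kind = 'x' OR p.flag;

-- UNION form: each branch is planned on its own and can use its own index.
-- UNION (not UNION ALL) removes rows that match both branches.
SELECT c.*, p.*
  FROM child_demo c LEFT JOIN parent_demo p ON p.id = c.parent_id
 WHERE c.kind = 'x'
UNION
SELECT c.*, p.*
  FROM child_demo c LEFT JOIN parent_demo p ON p.id = c.parent_id
 WHERE p.flag;
```

The two forms return the same rows here because the primary key makes every joined row distinct. The duplicate removal done by UNION has a cost of its own, so it is worth comparing both forms with EXPLAIN ANALYZE on real data.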

You should be able to order the results of a UNION query, but you have to use parentheses to specify that that ORDER BY applies to the result of the UNION, not merely to the last subquery in the chain.

(SELECT ... )
UNION
(SELECT ... )
UNION
(SELECT ... )
ORDER BY columnname;

I hope this helps; I'm not an expert on the PostgreSQL optimizer. You might try searching the mailing list archives, or asking on the IRC channel.

朱染 2024-08-17 16:59:02

(Sorry - don't know how to reply to a reply, so this is going top level)

To clarify - PG used to only use a single index for a single table-scan. If you have a query joining three tables and each has a useful index it was always smart enough to use all three.

In your particular case what is probably happening is that you have some connection between your ORed conditions. PostgreSQL doesn't know this, and so ends up assuming the combined condition will match more rows than it actually does: enough rows to change your query plan.
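
One way to check for this is to compare the planner's estimated row count with the actual one in EXPLAIN ANALYZE output. The sketch below uses a hypothetical table in which two columns always agree, so the two OR branches match exactly the same rows; the names are illustrative only.

```sql
-- Hypothetical table: "archived" is true exactly when status = 'closed'.
CREATE TABLE orders_demo (
    id       serial PRIMARY KEY,
    status   text,
    archived boolean
);

INSERT INTO orders_demo (status, archived)
SELECT CASE WHEN g % 10 = 0 THEN 'closed' ELSE 'open' END,
       g % 10 = 0
  FROM generate_series(1, 100000) AS g;

ANALYZE orders_demo;

-- Both branches select the same 10% of the table. Compare the estimated
-- "rows=" on the scan node with the actual "rows=" reported next to it.
EXPLAIN ANALYZE
SELECT * FROM orders_demo
 WHERE status = 'closed' OR archived;
```

If the estimate is well above the actual count, the planner is probably treating the conditions as independent, and a large enough overestimate can tip it toward sequential scans and hash joins.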

Also, your UNION query isn't quite comparable to the individual queries: each of those has its own LIMIT, whereas in the UNION the LIMIT applies to the whole result set.

"You should be able to order the results of a UNION query, but you have to use parentheses to specify that that ORDER BY applies to the result of the UNION, not merely to the last subquery in the chain."

This isn't right - even without parentheses, a trailing ORDER BY applies to the whole result.
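
A small sketch of the scoping rule, using a hypothetical four-row table: without parentheses, a trailing ORDER BY or LIMIT belongs to the combined result, and parentheses are only needed to attach them to a single branch.

```sql
-- Hypothetical table for illustration.
CREATE TABLE union_demo (
    id    integer PRIMARY KEY,
    grp   text,
    score integer
);
INSERT INTO union_demo VALUES
    (1, 'a', 30), (2, 'a', 10), (3, 'b', 20), (4, 'b', 40);

-- No parentheses: ORDER BY and LIMIT apply to the whole UNION.
-- Returns (2,10), (3,20), (1,30).
SELECT id, score FROM union_demo WHERE grp = 'a'
UNION
SELECT id, score FROM union_demo WHERE grp = 'b'
ORDER BY score
LIMIT 3;

-- Parentheses attach ORDER BY / LIMIT to one branch;
-- the final ORDER BY still sorts the combined result.
-- Returns (2,10), (3,20): the lowest score from each group.
(SELECT id, score FROM union_demo WHERE grp = 'a' ORDER BY score LIMIT 1)
UNION
(SELECT id, score FROM union_demo WHERE grp = 'b' ORDER BY score LIMIT 1)
ORDER BY score;
```

So a UNION rewrite does not prevent ordering later; parenthesizing every branch is harmless but not required for the outer ORDER BY.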

HTH
