PostgreSQL: optimal use of a multicolumn index when a subset of its columns is missing from the WHERE clause
I will be running queries against my database with WHERE clauses similar to this:
SELECT * FROM table WHERE a = 'string_value' AND b = 'other_string_value' AND t > <timestamp>
and, less often, like this:
SELECT * FROM table WHERE a = 'string_value' AND t > <timestamp>
I have created a multicolumn index on a, b and t, in that order. However, I am not sure whether it will be optimal for my second, less frequent query.
Will this index do an index scan on b, or skip it and move to t immediately? (To be honest, I'm not sure exactly how index scans work.) Should I create a second multicolumn index on only a and t for the second query?
The docs state that
'the index is most efficient when there are constraints on the leading (leftmost) columns'
But the example there doesn't cover my case, where the equality column b is missing from the WHERE clause.
The 2nd query will be much less effective with the btree index on (a,b,t), because the absence of b means t cannot be used efficiently (it can still be used as an in-index filter, but that is not nearly as good as being used as a start/stop point). An index on (a,t) will be able to support the 2nd query much more efficiently.
But that doesn't mean you have to create that index as well. Indexes take space and must be maintained, so they are far from free. It might be better to just live with less-than-optimal plans for the 2nd query, since that query is used "less often". On the other hand, you did bother to post about it, so maybe "less often" is still pretty often. In that case you might be better off just building the extra index and spending your time worrying about something else.
A btree index can be thought of like a phonebook, sorted on last name, then first name, then middle name. Your first query is like searching for "people named Mary Smith with a middle name less than Cathy". You can use binary search to efficiently find the first "Mary Smith", then scan through those entries until the middle name is > 'Cathy', and you are done. Compare that to "people surnamed Smith with a middle name less than Cathy". Now you have to scan all the Smiths. You can't stop at the first middle name > Cathy, because any change in first name resets the order of the middle names.
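The phonebook picture can be made concrete with a small Python toy model. It assumes that a sorted list of key tuples stands in for the leaf level of a btree and that `bisect` stands in for the descent to the first matching entry; the data and the helper names (`idx_abt`, `idx_at`, `q1_on_abt`, `q2_on_abt`, `q2_on_at`) are hypothetical. Each function returns how many index entries it walks over and how many of those actually match.

```python
from bisect import bisect_left, bisect_right

INF = float("inf")  # sorts after every int, used as an "end of group" sentinel

# Hypothetical toy data: 2 values of a, 10 of b, 100 timestamps t (as ints).
rows = [(a, b, t) for a in ("x", "y") for b in range(10) for t in range(100)]

# A sorted list of keys mimics the btree leaf level well enough to count visits.
idx_abt = sorted(rows)                          # index on (a, b, t)
idx_at = sorted((a, t, b) for a, b, t in rows)  # index on (a, t); b only breaks ties


def q1_on_abt(a, b, ts):
    """a = ? AND b = ? AND t > ? on (a, b, t): all three columns bound the range."""
    start = bisect_right(idx_abt, (a, b, ts))   # first key past (a, b, ts)
    stop = bisect_left(idx_abt, (a, b, INF))    # end of the (a, b) group
    visited = idx_abt[start:stop]
    return len(visited), sum(1 for k in visited if k[2] > ts)


def q2_on_abt(a, ts):
    """a = ? AND t > ? on (a, b, t): only a bounds the range, t is just a filter."""
    start = bisect_left(idx_abt, (a,))          # first key with this a
    stop = bisect_left(idx_abt, (a, INF))       # end of the a group
    visited = idx_abt[start:stop]
    return len(visited), sum(1 for k in visited if k[2] > ts)


def q2_on_at(a, ts):
    """a = ? AND t > ? on (a, t): a and t together bound the range."""
    start = bisect_right(idx_at, (a, ts, INF))  # first key with t > ts
    stop = bisect_left(idx_at, (a, INF))        # end of the a group
    visited = idx_at[start:stop]
    return len(visited), sum(1 for k in visited if k[1] > ts)


print(q1_on_abt("x", 3, 89))  # (10, 10)
print(q2_on_abt("x", 89))     # (1000, 100)
print(q2_on_at("x", 89))      # (100, 100)
```

On this toy data, query 1 reads only the 10 entries it returns; query 2 on (a,b,t) walks all 1000 entries for a = 'x' to find 100 matches, while query 2 on (a,t) reads just the 100 it needs. Real costs also depend on page layout and heap access, which the sketch ignores.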
Given that b only has 10 distinct values, you could conceivably use the (a,b,t) index in a skip scan quite efficiently. But PostgreSQL doesn't yet implement skip scans natively. You can emulate them, but that is fragile, ugly, a lot of work, and easy to screw up. Nothing you said here makes me think it would be worthwhile to do.
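A skip scan takes advantage of the small number of distinct b values by doing one short seek per b instead of one long scan over every entry for a. The Python sketch below models only that idea on a toy sorted-list "index" (the data and the name `skip_scan` are hypothetical); it is not a PostgreSQL implementation. In SQL, emulating this usually means a recursive CTE that walks the distinct values (the "loose index scan" pattern described on the PostgreSQL wiki), which is the fragile hand-written work warned about above. At the time this answer was written PostgreSQL had no native skip scan; PostgreSQL 18 later added skip scan support for B-tree indexes.

```python
from bisect import bisect_left, bisect_right

INF = float("inf")  # sorts after every int, used as an "end of group" sentinel

# Hypothetical toy data: 2 values of a, 10 distinct b, 100 timestamps t.
rows = [(a, b, t) for a in ("x", "y") for b in range(10) for t in range(100)]
idx_abt = sorted(rows)  # stands in for the (a, b, t) btree


def skip_scan(a, ts):
    """Emulate a skip scan for a = ? AND t > ? on an (a, b, t) index.

    Seek once per distinct b straight to (a, b, t > ts), read that run,
    then jump to the next b. Returns (matching keys, entries read in the runs).
    """
    out, read = [], 0
    pos = bisect_left(idx_abt, (a,))               # first entry for this a
    while pos < len(idx_abt) and idx_abt[pos][0] == a:
        b = idx_abt[pos][1]                        # peek at the group's first key to learn b
        start = bisect_right(idx_abt, (a, b, ts))  # seek: first t > ts within (a, b)
        stop = bisect_left(idx_abt, (a, b, INF))   # end of the (a, b) group
        out.extend(idx_abt[start:stop])
        read += stop - start
        pos = stop                                 # skip to the next distinct b
    return out, read


matches, read = skip_scan("x", 89)
print(len(matches), read)  # 100 100
```

For a = 'x' and t > 89 this reads 100 entries in 10 short runs, versus the 1000 entries a plain scan of the a = 'x' range would walk. The benefit shrinks as the number of distinct b values grows, since each one costs an extra seek.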