MySQL 复合索引中键的高性能排序(WRT Rails 多态关联和 STI)
之前,我问过 这个关于 ActiveRecord 中多态外键的复合索引的问题< /a>.我的问题的基础是我的理解,即索引应该基于列的基数,并且 Rails 的 STI 类型和多态 _type 列的基数通常相当低。
接受我的问题的答案是正确的——对高基数 _id 列和低基数 _type 列进行索引是有价值的,因为它们在一起具有高基数——我的下一个问题是:您应该如何订购您的化合物索引?
[owner_id,owner_type] 索引将基数较高的字段放在第一位,而 [owner_type,owner_id] 将基数较高的字段放在第二位。使用前一个键的查询是否比使用后一个键的查询性能更高,或者它们的性能相同?
我问这个问题是因为这对于我如何订购服务于 STI 模型的表的复合键有特别的影响。 STI Rails 查找器几乎总是在类型列上查询——这又是一个基数通常较低的列。因此,类型列的查询频率比其他索引高得多。如果类型列的查询频率更高,那么使用类型引导索引也许是有意义的,因为不太具体的查询可以利用索引的第一部分来提高性能。然而,我不会以牺牲高度特定查询的性能为代价而降低额外的好处。利用索引的较高基数部分。
Previously, I asked this question about compound indexes on polymorphic foreign keys in ActiveRecord. The basis of my question was my understanding that indexes should be based on the cardinality of your column, and there's generally pretty low cardinality on Rails's STI type and polymorphic _type columns.
Accepting that the answer to my question is right -- that's there's value to indexing both the high cardinality _id columns and the low cardinality _type columns, because they together they have a high cardinality -- my next question is: how should you order your compound indexes?
An index of [owner_id, owner_type] places the field with higher cardinality first, while [owner_type, owner_id] places the field with higher cardinality second. Is a query using the former key more performant than a query using the latter key, or are they equally performant?
I ask because this has particular bearing on how I would order the compound keys for tables serving STI models. STI Rails finders almost always query on the type column -- which again is a column of generally low cardinality. The type column is therefore queried much more often than other indexes. If the type column is queried much more often, then maybe it makes sense to use the type-leading index, because less specific queries could take advantage of the first part of the index yielding a performance-boost. However, I wouldn't smaller perk to come at the detriment of performance to highly-specific queries. that take advantage of the higher-cardinality portion of the index.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
根据我自己的研究(但我不是专家 DBA),我了解到在决定复合键索引的顺序时需要考虑两件事。
首先,关于列的基数,索引通常更适合搜索基数高的列。因此,我倾向于将基数最高的列放在索引中的第一个位置。作为参考,有一篇标题为 MySQL 查询优化 的文章,内容如下:
在您的情况下,
_id
列显然更适合该定义,因此它们更适合作为键的前缀。另一件需要考虑的事情是这些索引的可重用性。大多数(如果不是全部)数据库系统允许重复使用复合键的前缀。例如,
(owner_id,owner_type)
上的复合键也可以用于owner_id
上的查询,但不能用于owner_type
上的查询。因此,根据您在问题中所解释的内容,您可能最好使用两个索引:
(owner_id,owner_type)
上的复合键索引和(owner_type)
上的另一个复合键索引。最后,这实际上完全取决于您的数据集和查询。尝试多种场景,使用不同的复合键排序进行基准测试,看看什么是最佳解决方案。另外,不要忘记索引会导致表上的写入损失。
更新:还有另一个关于复合键索引的相当流行的问题:
什么时候应该使用复合索引?
From my own research (but I'm no expert DBA) I've learned that there's two thing to consider when deciding the order of a compound key index.
First, concerning the cardinality of columns, index generally are better at searching columns with high cardinality. So I would be inclined to place the column with the highest cardinality first in the index. For reference, there's an article titled MySQL Query Optimization that says:
In your case, the
_id
columns would clearly fit better that definition, thus they're a better candidate for being a prefix of the key.Another thing to consider would be the reusability of these indexes. Most (if not all) database systems allow a prefix of a compound key to be reused. For example, a compound key on
(owner_id, owner_type)
could also be used by queries onowner_id
but not onowner_type
.So from what you explained in your question you might be better off with two indexes: a compound key index on
(owner_id, owner_type)
and a another on(owner_type)
.Finally, it really all comes down to your dataset and queries. Try out multiple scenarios, benchmarks using different compound key ordering to see what is the most optimal solution. Also, don't forget that indexes incur a write penalty on your tables.
Update: There's also another rather popular SO question about compound key index there:
When should I use a composite index?
TL;DR 首先放置类型,然后放置 id。
确实,将 id 放在第一位会增加第一个决策的基数,从而可以轻松扫描结果记录或应用第二个小索引。但是,如果您单独按类型进行查询(您将这样做),则必须在该类型上维护另一个顶级索引,这会给您的写入性能带来影响。
相反,
[type, id]
将给出一个顶级索引,在仅按类型搜索时可以重复使用该索引。第二个决定将始终对应于单行,因为 id 在类型上是唯一的,因此您仍然可以确保在索引解析后不会进行行扫描。IMO 维护另一个索引对写入性能的影响不值得不首先采用类型决策树的边际收益。
TL;DR Put the type first, then the id.
True, putting the id first would increase the cardinality of the first decision, making it easy to scan the resulting records or apply the second small index. However, if you ever query by type alone (which you will), you'll have to maintain another top-level index on the type, which will give you a performance hit on writes.
The other way around,
[type, id]
, will give a top-level index that can be re-used when searching just by type. The second decision will always correspond to a single row, since id is unique by type, so you're still assured no row scanning after index resolution.IMO the write performance hit of maintaining another index is not worth the marginal gain of not taking the type decision tree first.