为什么MySQL要使用索引交集而不是组合索引?
我时常会遇到奇怪的 MySQL 行为。假设我有索引(类型、相对、创建)、(类型)、(相对)。对于这样的查询,最佳选择
SELECT id FROM tbl
WHERE rel = 3 AND type = 3
ORDER BY created;
是使用索引(type, rel,created)
。 但是 MySQL 决定将索引 (type)
和 (rel)
相交,这会导致性能更差。这是一个例子:
mysql> EXPLAIN
-> SELECT id FROM tbl
-> WHERE rel = 3 AND type = 3
-> ORDER BY created\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: tbl
type: index_merge
possible_keys: idx_type,idx_rel,idx_rel_type_created
key: idx_type,idx_rel
key_len: 1,2
ref: NULL
rows: 4343
Extra: Using intersect(idx_type,idx_rel); Using where; Using filesort
和相同的查询,但添加了提示:
mysql> EXPLAIN
-> SELECT id FROM tbl USE INDEX (idx_type_rel_created)
-> WHERE rel = 3 AND type = 3
-> ORDER BY created\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: tbl
type: ref
possible_keys: idx_type_rel_created
key: idx_type_rel_created
key_len: 3
ref: const,const
rows: 8906
Extra: Using where
我认为 MySQL 采用的执行计划在 EXPLAIN 命令的“行”列中包含较少的数字。从这个角度来看,与 4343 行的索引交集看起来确实比使用带有 8906 行的组合索引要好。那么,问题可能出在这些数字上吗?
mysql> SELECT COUNT(*) FROM tbl WHERE type=3 AND rel=3;
+----------+
| COUNT(*) |
+----------+
| 3056 |
+----------+
由此我可以得出结论,MySQL 在计算组合索引的大致行数时是错误的。
那么,我在这里该怎么做才能让MySQL采取正确的执行计划呢?
我不能使用优化器提示,因为我必须坚持使用 Django ORM 我发现的唯一解决方案是删除那些单字段索引。
MySQL 版本是 5.1.49。
表结构为:
CREATE TABLE tbl (
`id` int(11) NOT NULL AUTO_INCREMENT,
`type` tinyint(1) NOT NULL,
`rel` smallint(2) NOT NULL,
`created` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `idx_type` (`type`),
KEY `idx_rel` (`rel`),
KEY `idx_type_rel_created` (`type`,`rel`,`created`)
) ENGINE=MyISAM;
From time to time I encounter a strange MySQL behavior. Let's assume I have indexes (type, rel, created), (type), (rel). The best choice for a query like this one:
SELECT id FROM tbl
WHERE rel = 3 AND type = 3
ORDER BY created;
would be to use index (type, rel, created)
.
But MySQL decides to intersect indexes (type)
and (rel)
, and that leads to worse perfomance. Here is an example:
mysql> EXPLAIN
-> SELECT id FROM tbl
-> WHERE rel = 3 AND type = 3
-> ORDER BY created\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: tbl
type: index_merge
possible_keys: idx_type,idx_rel,idx_rel_type_created
key: idx_type,idx_rel
key_len: 1,2
ref: NULL
rows: 4343
Extra: Using intersect(idx_type,idx_rel); Using where; Using filesort
And the same query, but with a hint added:
mysql> EXPLAIN
-> SELECT id FROM tbl USE INDEX (idx_type_rel_created)
-> WHERE rel = 3 AND type = 3
-> ORDER BY created\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: tbl
type: ref
possible_keys: idx_type_rel_created
key: idx_type_rel_created
key_len: 3
ref: const,const
rows: 8906
Extra: Using where
I think MySQL takes an execution plan which contains less number in the "rows" column of the EXPLAIN command. From that point of view, index intersection with 4343 rows looks really better than using my combined index with 8906 rows. So, maybe the problem is within those numbers?
mysql> SELECT COUNT(*) FROM tbl WHERE type=3 AND rel=3;
+----------+
| COUNT(*) |
+----------+
| 3056 |
+----------+
From this I can conclude that MySQL is mistaken at calculating approximate number of rows for combined index.
So, what can I do here to make MySQL take the right execution plan?
I can not use optimizer hints, because I have to stick to Django ORM
The only solution I found yet is to remove those one-field indexes.
MySQL version is 5.1.49.
The table structure is:
CREATE TABLE tbl (
`id` int(11) NOT NULL AUTO_INCREMENT,
`type` tinyint(1) NOT NULL,
`rel` smallint(2) NOT NULL,
`created` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `idx_type` (`type`),
KEY `idx_rel` (`rel`),
KEY `idx_type_rel_created` (`type`,`rel`,`created`)
) ENGINE=MyISAM;
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
很难确切地说明为什么 MySQL 在索引扫描上选择 index_merge_intersection,但您应该注意,对于复合索引,将为复合索引存储最多给定列的统计信息。
复合索引的
type
列的information_schema.statistics.cardinality
值将显示(rel, type)
的基数,而不是 <代码>类型本身。如果
rel
和type
之间存在相关性,则(rel, type)
的基数将小于基数的乘积rel
和type
与相应列上的索引分开获取。这就是行数计算不正确的原因(交集的大小不能大于并集)。
您可以通过在
@@optimizer_switch
中将其设置为 off 来禁止index_merge_intersection
:It's hard to tell exactly why
MySQL
choosesindex_merge_intersection
over the index scan, but you should note that with the composite indexes, statistics up to the given column are stored for the composite indexes.The value of
information_schema.statistics.cardinality
for the columntype
of the composite index will show the cardinality of(rel, type)
, nottype
itself.If there is a correlation between
rel
andtype
, then cardinality of(rel, type)
will be less than product of cardinalities ofrel
andtype
taken separately from the indexes on corresponding columns.That's why the number of rows is calculated incorrectly (an intersection cannot be larger in size than a union).
You can forbid
index_merge_intersection
by setting it to off in@@optimizer_switch
:另一件值得一提的事情是:如果仅删除类型上的索引,则不会出现问题。该索引不是必需的,因为它复制了复合索引的一部分。
Another thing is worth mentioning: you would not have the problem if you deleted the index on type only. the index is not required since it duplicates a part of the composite index.
有时,同一个表上的交集可能很有趣,并且您可能不希望删除单个列上的索引,以便其他一些查询可以很好地处理交集。
在这种情况下,如果错误的执行计划只涉及一个查询,解决方案是排除不需要的索引。然后 Il 将阻止仅针对该特定查询使用交集......
在你的例子中:
Some time the intersection on same table could be interesting, and you may not want to remove an index on a single colum so as some other query work well with intersection.
In such case, if the bad execution plan concerns only one single query, a solution is to exclude the unwanted index. Il will then prevent the usage of intersection only for that sepcific query...
In your example :