为什么MySQL要使用索引交集而不是组合索引？

发布于 2024-10-08 22:12:46 字数 2055 浏览 15 评论 0原文

我时常会遇到奇怪的 MySQL 行为。假设我有索引（类型、相对、创建）、（类型）、（相对）。对于这样的查询，最佳选择

SELECT id FROM tbl
WHERE rel = 3 AND type = 3
ORDER BY created;

是使用索引(type, rel,created)。但是 MySQL 决定将索引 (type) 和 (rel) 相交，这会导致性能更差。这是一个例子：

mysql> EXPLAIN
    -> SELECT id FROM tbl
    -> WHERE rel = 3 AND type = 3
    -> ORDER BY created\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: tbl
         type: index_merge
possible_keys: idx_type,idx_rel,idx_rel_type_created
          key: idx_type,idx_rel
      key_len: 1,2
          ref: NULL
         rows: 4343
        Extra: Using intersect(idx_type,idx_rel); Using where; Using filesort

和相同的查询，但添加了提示：

mysql> EXPLAIN
    -> SELECT id FROM tbl USE INDEX (idx_type_rel_created)
    -> WHERE rel = 3 AND type = 3
    -> ORDER BY created\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: tbl
         type: ref
possible_keys: idx_type_rel_created
          key: idx_type_rel_created
      key_len: 3
          ref: const,const
         rows: 8906
        Extra: Using where

我认为 MySQL 采用的执行计划在 EXPLAIN 命令的“行”列中包含较少的数字。从这个角度来看，与 4343 行的索引交集看起来确实比使用带有 8906 行的组合索引要好。那么，问题可能出在这些数字上吗？

mysql> SELECT COUNT(*) FROM tbl WHERE type=3 AND rel=3;
+----------+
| COUNT(*) |
+----------+
|     3056 |
+----------+

由此我可以得出结论，MySQL 在计算组合索引的大致行数时是错误的。

那么，我在这里该怎么做才能让MySQL采取正确的执行计划呢？

我不能使用优化器提示，因为我必须坚持使用 Django ORM 我发现的唯一解决方案是删除那些单字段索引。

MySQL 版本是 5.1.49。

表结构为：

CREATE TABLE tbl (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `type` tinyint(1) NOT NULL,
  `rel` smallint(2) NOT NULL,
  `created` datetime NOT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_type` (`type`),
  KEY `idx_rel` (`rel`),
  KEY `idx_type_rel_created` (`type`,`rel`,`created`)
) ENGINE=MyISAM;

原文

From time to time I encounter a strange MySQL behavior. Let's assume I have indexes (type, rel, created), (type), (rel). The best choice for a query like this one:

SELECT id FROM tbl
WHERE rel = 3 AND type = 3
ORDER BY created;

would be to use index (type, rel, created).
But MySQL decides to intersect indexes (type) and (rel), and that leads to worse perfomance. Here is an example:

mysql> EXPLAIN
    -> SELECT id FROM tbl
    -> WHERE rel = 3 AND type = 3
    -> ORDER BY created\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: tbl
         type: index_merge
possible_keys: idx_type,idx_rel,idx_rel_type_created
          key: idx_type,idx_rel
      key_len: 1,2
          ref: NULL
         rows: 4343
        Extra: Using intersect(idx_type,idx_rel); Using where; Using filesort

And the same query, but with a hint added:

mysql> EXPLAIN
    -> SELECT id FROM tbl USE INDEX (idx_type_rel_created)
    -> WHERE rel = 3 AND type = 3
    -> ORDER BY created\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: tbl
         type: ref
possible_keys: idx_type_rel_created
          key: idx_type_rel_created
      key_len: 3
          ref: const,const
         rows: 8906
        Extra: Using where

I think MySQL takes an execution plan which contains less number in the "rows" column of the EXPLAIN command. From that point of view, index intersection with 4343 rows looks really better than using my combined index with 8906 rows. So, maybe the problem is within those numbers?

mysql> SELECT COUNT(*) FROM tbl WHERE type=3 AND rel=3;
+----------+
| COUNT(*) |
+----------+
|     3056 |
+----------+

From this I can conclude that MySQL is mistaken at calculating approximate number of rows for combined index.

So, what can I do here to make MySQL take the right execution plan?

I can not use optimizer hints, because I have to stick to Django ORM
The only solution I found yet is to remove those one-field indexes.

MySQL version is 5.1.49.

The table structure is:

CREATE TABLE tbl (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `type` tinyint(1) NOT NULL,
  `rel` smallint(2) NOT NULL,
  `created` datetime NOT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_type` (`type`),
  KEY `idx_rel` (`rel`),
  KEY `idx_type_rel_created` (`type`,`rel`,`created`)
) ENGINE=MyISAM;

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

嘿哥们儿 2024-10-15 22:12:46

很难确切地说明为什么 MySQL 在索引扫描上选择 index_merge_intersection，但您应该注意，对于复合索引，将为复合索引存储最多给定列的统计信息。

复合索引的 type 列的 information_schema.statistics.cardinality 值将显示 (rel, type) 的基数，而不是 <代码>类型本身。

如果 rel 和 type 之间存在相关性，则 (rel, type) 的基数将小于 基数的乘积rel 和 type 与相应列上的索引分开获取。

这就是行数计算不正确的原因（交集的大小不能大于并集）。

您可以通过在 @@optimizer_switch 中将其设置为 off 来禁止 index_merge_intersection：

SET optimizer_switch = 'index_merge_intersection=off'

It's hard to tell exactly why MySQL chooses index_merge_intersection over the index scan, but you should note that with the composite indexes, statistics up to the given column are stored for the composite indexes.

The value of information_schema.statistics.cardinality for the column type of the composite index will show the cardinality of (rel, type), not type itself.

If there is a correlation between rel and type, then cardinality of (rel, type) will be less than product of cardinalities of rel and type taken separately from the indexes on corresponding columns.

That's why the number of rows is calculated incorrectly (an intersection cannot be larger in size than a union).

You can forbid index_merge_intersection by setting it to off in @@optimizer_switch:

SET optimizer_switch = 'index_merge_intersection=off'

回复收藏 0 原文

虫児飞 2024-10-15 22:12:46

另一件值得一提的事情是：如果仅删除类型上的索引，则不会出现问题。该索引不是必需的，因为它复制了复合索引的一部分。

回复收藏 0 原文

请叫√我孤独 2024-10-15 22:12:46

有时，同一个表上的交集可能很有趣，并且您可能不希望删除单个列上的索引，以便其他一些查询可以很好地处理交集。
在这种情况下，如果错误的执行计划只涉及一个查询，解决方案是排除不需要的索引。然后 Il 将阻止仅针对该特定查询使用交集......
在你的例子中：

SELECT id FROM tbl IGNORE INDEX(idx_type) 
WHERE rel = 3 AND type = 3
ORDER BY created;
enter code here

Some time the intersection on same table could be interesting, and you may not want to remove an index on a single colum so as some other query work well with intersection.
In such case, if the bad execution plan concerns only one single query, a solution is to exclude the unwanted index. Il will then prevent the usage of intersection only for that sepcific query...
In your example :

SELECT id FROM tbl IGNORE INDEX(idx_type) 
WHERE rel = 3 AND type = 3
ORDER BY created;
enter code here

回复收藏 0 原文

~没有更多了~

关于作者

清音悠歌

暂无简介

文章

26 人气

关注发私信

友情链接

文江博客

为什么MySQL要使用索引交集而不是组合索引？

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（3）

关于作者

相关话题

热门标签

推荐作者

櫻之舞

弥枳

m2429

寻找一个思念的角度

野却迷人

我怀念的。

友情链接

为什么MySQL要使用索引交集而不是组合索引？

如果你对这篇内容有疑问，欢迎到本站社区发帖提问 参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（3）

关于作者

相关话题

热门标签

推荐作者

櫻之舞

弥枳

m2429

寻找一个思念的角度

野却迷人

我怀念的。

友情链接

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。