MySQL对varchar字段的部分索引和group by优化

发布于 2024-10-30 23:02:16 字数 2100 浏览 1 评论 0原文

我在使用 MySQL 进行组查询时遇到一些问题。

问题

是否有原因导致查询不在 varchar(255) 字段上使用 10 个字符的部分索引来优化分组依据?

详细信息

我的设置:

CREATE TABLE `sessions` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `user_id` int(11) DEFAULT NULL,
  `ref_source` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `guid` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
  `initial_path` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `referrer_host` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `campaign` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `index_sessions_on_user_id` (`user_id`),
  KEY `index_sessions_on_referrer_host` (`referrer_host`(10)),
  KEY `index_sessions_on_initial_path` (`initial_path`(10)),
  KEY `index_sessions_on_campaign` (`campaign`(10))
) ENGINE=InnoDB AUTO_INCREMENT=0 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

此处未显示许多列和索引,因为它们不会真正影响问题。

我想要做的是运行查询以查看所有引用主机以及来自每个主机的会话数。我没有很大的表,但它足够大,我的全表扫描并不有趣。我想要运行的查询是:

SELECT COUNT(*) AS count_all, referrer_host AS referrer_host FROM `sessions` GROUP BY referrer_host;

解释给出:

+----+-------------+----------+------+---------------+------+---------+------+--------+---------------------------------+
| id | select_type | table    | type | possible_keys | key  | key_len | ref  | rows   | Extra                           |
+----+-------------+----------+------+---------------+------+---------+------+--------+---------------------------------+
|  1 | SIMPLE      | sessions | ALL  | NULL          | NULL | NULL    | NULL | 303049 | Using temporary; Using filesort |
+----+-------------+----------+------+---------------+------+---------+------+--------+---------------------------------+

我在 referrer_host 上有一个部分索引,但它没有使用它。即使我尝试USE INDEXFORCE INDEX也没有帮助。解释是一样的,表现也是一样。

如果我在 referrer_host 上添加完整索引,而不是 10 个字符的部分索引,那么一切都会更好(即使不是立即)。 (350 毫秒与 10 秒)

我已经测试了大于字段中最长条目的部分索引,但也无济于事。完整索引似乎是唯一有效的方法。

I am having some issues with a group query with MySQL.

Question

Is there a reason why a query won't use a 10 character partial index on a varchar(255) field to optimize a group by?

Details

My setup:

CREATE TABLE `sessions` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `user_id` int(11) DEFAULT NULL,
  `ref_source` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `guid` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
  `initial_path` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `referrer_host` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `campaign` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `index_sessions_on_user_id` (`user_id`),
  KEY `index_sessions_on_referrer_host` (`referrer_host`(10)),
  KEY `index_sessions_on_initial_path` (`initial_path`(10)),
  KEY `index_sessions_on_campaign` (`campaign`(10))
) ENGINE=InnoDB AUTO_INCREMENT=0 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

A number of columns and indexes are not shown here since they don't really impact the issue.

What I want to do is run a query to see all of the referring hosts and the number of session coming from each. I don't have a huge table, but it is big enough where I full table scans aren't fun. The query I want to run is:

SELECT COUNT(*) AS count_all, referrer_host AS referrer_host FROM `sessions` GROUP BY referrer_host;

The explain gives:

+----+-------------+----------+------+---------------+------+---------+------+--------+---------------------------------+
| id | select_type | table    | type | possible_keys | key  | key_len | ref  | rows   | Extra                           |
+----+-------------+----------+------+---------------+------+---------+------+--------+---------------------------------+
|  1 | SIMPLE      | sessions | ALL  | NULL          | NULL | NULL    | NULL | 303049 | Using temporary; Using filesort |
+----+-------------+----------+------+---------------+------+---------+------+--------+---------------------------------+

I have a partial index on referrer_host, but it isn't using it. Even if I try to USE INDEX or FORCE INDEX it doesn't help. The explain is the same, as is the performance.

If I add a full index on referrer_host, instead of a 10 character partial index, everything is works better, if not instantly. (350ms vs. 10 seconds)

I have tested partial indexes that are bigger than the longest entry in the field to no avail as well. The full index is the only thing that seems to work.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

深海蓝天 2024-11-06 23:02:16

使用完整索引,查询将扫描整个索引并返回每个唯一键指向的记录数。桌子没有被碰过。

使用部分索引,引擎在查看记录之前不知道referrer_host 的值。它必须扫描整个表!

如果referrer_host 的大多数值都小于 10 个字符,那么理论上,优化器可以使用索引,然后仅检查具有超过 10 个字符的行。但是,因为这不是聚集索引,所以必须进行许多非顺序磁盘读取才能找到这些记录。它最终可能会更慢,因为表扫描至少是顺序读取。优化器不做假设,而是进行扫描。

with the full index, the query will find scan the entire index and return the number of records pointed to for each unique key. the table isn't touched.

with the partial index, the engine doesn't know the value of the referrer_host until it looks at the record. It has to scan the whole table!

if most of the values for referrer_host are less than 10 chars then in theory, the optimiser could use the index and then only check rows that have more than 10 chars. But, because this is not a clustered index it would have to make many non-sequential disk reads to find these records. It could end up being even slower, because a table scan will at least be a sequential read. Instead of making assumptions, the optimiser just does a scan.

撩发小公举 2024-11-06 23:02:16

尝试这个查询:

EXPLAIN SELECT COUNT(referrer_host) AS count_all, referrer_host  FROM `sessions` GROUP BY referrer_host;

现在,当referrer_host = null时,group by的计数将失败,但我不确定是否有另一种方法可以解决这个问题。

Try this query:

EXPLAIN SELECT COUNT(referrer_host) AS count_all, referrer_host  FROM `sessions` GROUP BY referrer_host;

Now the count will fail for the group by on referrer_host = null, but I'm uncertain if there's another way around this.

叫思念不要吵 2024-11-06 23:02:16

您将在referrer_host 上对表中的所有行进行分组。由于您的索引不包含referrer_host(它包含前10个字符!),因此它将扫描整个表。

我敢打赌,这会更快,但不太详细:

SELECT COUNT(*) AS count_all, substring(referrer_host,1,10) AS referrer_host FROM `sessions` GROUP BY referrer_host;

如果您需要完整的引荐来源网址,请将其编入索引。

You're grouping on referrer_host for all the rows in the table. As your index doesn't include referrer_host (it contains the first 10 chars!), it's going to scan the whole table.

I'll bet that this is faster, though less detailed:

SELECT COUNT(*) AS count_all, substring(referrer_host,1,10) AS referrer_host FROM `sessions` GROUP BY referrer_host;

If you need the full referrer, index it.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文