How can I speed up a COUNT(*) that is already using indexes? (MyISAM)
I have 3 large tables (10k, 10k, and 100M rows) and am trying to do a simple count on a join of them, where all the joined columns are indexed. Why does the COUNT(*) take so long, and how can I speed it up (without triggers and a running summary)?
mysql> describe SELECT COUNT(*) FROM `metaward_alias` INNER JOIN `metaward_achiever` ON (`metaward_alias`.`id` = `metaward_achiever`.`alias_id`) INNER JOIN `metaward_award` ON (`metaward_achiever`.`award_id` = `metaward_award`.`id`) WHERE `metaward_award`.`owner_id` = 8;
+----+-------------+-------------------+--------+-------------------------------------------------------+----------------------------+---------+---------------------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------------+--------+-------------------------------------------------------+----------------------------+---------+---------------------------------+------+-------------+
| 1 | SIMPLE | metaward_award | ref | PRIMARY,metaward_award_owner_id | metaward_award_owner_id | 4 | const | 1552 | |
| 1 | SIMPLE | metaward_achiever | ref | metaward_achiever_award_id,metaward_achiever_alias_id | metaward_achiever_award_id | 4 | paul.metaward_award.id | 2498 | |
| 1 | SIMPLE | metaward_alias | eq_ref | PRIMARY | PRIMARY | 4 | paul.metaward_achiever.alias_id | 1 | Using index |
+----+-------------+-------------------+--------+-------------------------------------------------------+----------------------------+---------+---------------------------------+------+-------------+
3 rows in set (0.00 sec)
But actually running the query takes about 10 minutes, and I'm on MyISAM, so the tables are fully locked for that duration.
3 Answers
I guess the reason is that you do a huge join over three tables (without applying the WHERE clause first, the result would be 10k * 10k * 100M = 10^16 rows). Try to reorder the joins: for example, start with metaward_award, then join only metaward_achiever and see how long that takes, then try to plug in metaward_alias, possibly using a subquery to force your preferred evaluation order.

If that does not help, you might have to denormalize your data, for example by storing the number of aliases for a particular metaward_achiever. Then you'd get rid of one join altogether. Maybe you can even cache the sums for metaward_award, depending on how and how often your data is updated.

Another thing that might help is getting all your database content into RAM :-)
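A rough sketch of that subquery idea, using the table and column names from the query in the question (untested, only to illustrate forcing the evaluation order): filter metaward_award first, join metaward_achiever, and only then bring in metaward_alias.

SELECT COUNT(*)
FROM (
    -- evaluate the cheap, filtered part of the join first
    SELECT achiever.alias_id
    FROM metaward_award AS award
    INNER JOIN metaward_achiever AS achiever
        ON achiever.award_id = award.id
    WHERE award.owner_id = 8
) AS filtered
INNER JOIN metaward_alias
    ON metaward_alias.id = filtered.alias_id;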
Make sure you have indexes on all the columns the query joins and filters on: metaward_alias.id, metaward_achiever.alias_id, metaward_achiever.award_id, metaward_award.id, and metaward_award.owner_id.
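A quick way to verify this is SHOW INDEX; the CREATE INDEX line below is only an illustration of how a missing key would be added (the name metaward_achiever_alias_id matches the key already visible in the EXPLAIN output above, so it would only be needed if that index did not exist):

SHOW INDEX FROM metaward_achiever;
-- only if the alias_id key turned out to be missing:
CREATE INDEX metaward_achiever_alias_id ON metaward_achiever (alias_id);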
I'm sure many people will also suggest counting a specific column, but in MySQL this doesn't make any difference for your query.
UPDATE:
You could also try to set the condition on the main table instead of on one of the joined tables. That would give you the same result, but it could be faster (I don't know how clever MySQL is):
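A plausible version of that rewrite (a sketch, not necessarily the answerer's exact query): make metaward_award the main table, so the owner_id condition sits on the table the query starts from.

SELECT COUNT(*)
FROM metaward_award
INNER JOIN metaward_achiever
    ON metaward_achiever.award_id = metaward_award.id
INNER JOIN metaward_alias
    ON metaward_alias.id = metaward_achiever.alias_id
WHERE metaward_award.owner_id = 8;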
10 minutes is way too long for that query. I think you must have a really small key cache. You can get its size in bytes with:
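Either of these standard MySQL statements reports key_buffer_size (the MyISAM key cache size) in bytes; the SET GLOBAL value is only an illustrative number:

SHOW VARIABLES LIKE 'key_buffer_size';
-- or equivalently:
SELECT @@key_buffer_size;
-- if it turns out to be tiny, it can be raised at runtime, e.g. to 256 MB:
SET GLOBAL key_buffer_size = 256 * 1024 * 1024;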
First off, you should run ANALYZE TABLE or OPTIMIZE TABLE. They'll sort your indexes and can slightly improve performance.
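For example, run against the tables from the question (on MyISAM, OPTIMIZE TABLE rebuilds the table and re-sorts its indexes):

ANALYZE TABLE metaward_alias, metaward_achiever, metaward_award;
OPTIMIZE TABLE metaward_achiever;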
You should also see if you can use more compact types for your columns. For instance, if you're not going to have more than 16 million owners or awards or aliases, you can change your INT columns into MEDIUMINT (UNSIGNED, of course). Perhaps even SMALLINT in some cases? That will reduce your index footprint and you'll fit more of it in the cache.
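A sketch of that change, assuming the referencing columns are currently plain INT NOT NULL (column names come from the query above; the id columns on the other side of each join would need the same change so the types on both sides stay identical):

ALTER TABLE metaward_achiever
    MODIFY award_id MEDIUMINT UNSIGNED NOT NULL,
    MODIFY alias_id MEDIUMINT UNSIGNED NOT NULL;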