GROUP BY query optimization
The database is MySQL with the MyISAM engine.
Table definition:
CREATE TABLE IF NOT EXISTS matches (
  id int(11) NOT NULL AUTO_INCREMENT,
  game int(11) NOT NULL,
  user int(11) NOT NULL,
  opponent int(11) NOT NULL,
  tournament int(11) NOT NULL,
  score int(11) NOT NULL,
  finish tinyint(4) NOT NULL,
  PRIMARY KEY (id),
  KEY game (game),
  KEY user (user),
  KEY i_gfu (game, finish, user)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=3149047;
I have set an index on (game, finish, user), but this GROUP BY query still takes 0.4-0.6 seconds to run:
SELECT user AS player,
       COUNT(id) AS times
FROM matches
WHERE finish = 1
  AND game = 19
GROUP BY user
ORDER BY times DESC;
The EXPLAIN output:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | SIMPLE | matches | ref | game,i_gfu | i_gfu | 5 | const,const | 155855 | Using where; Using temporary; Using filesort |
Is there any way I can make it faster? The table has about 800K records.
EDIT: I changed COUNT(id) to COUNT(*) and the time dropped to 0.08-0.12 seconds. I think I had tried that before creating the index and forgot to change it back afterwards.
In the EXPLAIN output, the "Using index" entry explains the speedup:
| rows | Extra |
| 168029 | Using where; Using index; Using temporary; Using filesort |
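A likely reason for the speedup: i_gfu contains game, finish and user but not id, and a MyISAM secondary index stores only a row pointer, so COUNT(id) makes MySQL fetch each of the roughly 150K matching rows from the data file. COUNT(*) touches no column outside the index, so the query can be answered from i_gfu alone (a covering index, shown as "Using index"). Because id is NOT NULL, both forms return the same counts. A minimal sketch of the edited query, assuming the table definition above:

```sql
-- Every column referenced here (game, finish, user) is part of i_gfu,
-- so MySQL can count rows from the index without reading the data file.
-- id is NOT NULL, so COUNT(*) gives the same result as COUNT(id).
EXPLAIN
SELECT user AS player,
       COUNT(*) AS times
FROM matches
WHERE game = 19
  AND finish = 1
GROUP BY user
ORDER BY times DESC;
```

Dropping the EXPLAIN prefix runs the query itself.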
(Side question: is a fivefold drop like this normal?)
There are about 2000 users, so the final sort doesn't hurt performance even though it uses filesort. I tried it without ORDER BY and it still takes almost the same time.
Comments (6)
Get rid of the 'game' key; it's redundant with 'i_gfu'. Since 'id' is unique, COUNT(id) just returns the number of rows in each group, so you can replace it with COUNT(*). Try it that way and paste the output of EXPLAIN.
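A sketch of the first suggestion, assuming the table definition from the question. The single-column index is redundant because game is the leftmost column of i_gfu, so any lookup on game alone can use i_gfu instead; dropping it also means one less index to update on every INSERT.

```sql
-- game is the leftmost column of i_gfu (game, finish, user), so queries
-- filtering on game alone can already use i_gfu.
ALTER TABLE matches DROP INDEX game;

-- List the remaining indexes to confirm the change.
SHOW INDEX FROM matches;
```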
Eh, tough. Try reordering your index: put the user column first (so make the index (user, finish, game)), as that increases the chance that the GROUP BY can use the index. However, GROUP BY can only use the cheaper loose index scan if you limit the aggregate functions to MIN and MAX (see http://dev.mysql.com/doc/refman/5.0/en/group-by-optimization.html and http://dev.mysql.com/doc/refman/5.5/en/loose-index-scan.html). Your ORDER BY isn't really helping either.
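A sketch of the reordered index, for comparing plans; the index name i_ufg is hypothetical. Whether it helps has to be measured: with user leading, MySQL can no longer jump straight to game = 19 and will typically scan the whole index, although each group then arrives in user order.

```sql
-- Hypothetical index name; user comes first so index order matches GROUP BY user.
ALTER TABLE matches ADD INDEX i_ufg (user, finish, game);

-- FORCE INDEX makes MySQL use the new index so its plan can be compared
-- with the one that uses i_gfu.
EXPLAIN
SELECT user AS player,
       COUNT(*) AS times
FROM matches FORCE INDEX (i_ufg)
WHERE finish = 1
  AND game = 19
GROUP BY user
ORDER BY times DESC;
```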
One of the shortcomings of this query is that you order by an aggregate. That means you can't return any rows until the full result set has been generated; no index can exist (for MySQL MyISAM, anyway) to fix that.
You can denormalize your data fairly easily to overcome this, though. For instance, you could add an insert/update trigger that maintains a count value in an indexed summary table, so that you can start returning rows immediately.
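A minimal sketch of the summary-table idea. The table name finish1_counts, the trigger name and the index i_game_times are hypothetical; only the insert path is covered, so UPDATE or DELETE on matches would need their own triggers. DELIMITER is a directive of the mysql command-line client, not server SQL.

```sql
-- Hypothetical summary table: one row per (game, user) with the number
-- of matches that have finish = 1.
CREATE TABLE finish1_counts (
  game  int(11) NOT NULL,
  user  int(11) NOT NULL,
  times int(11) NOT NULL DEFAULT 0,
  PRIMARY KEY (game, user),
  KEY i_game_times (game, times)
) ENGINE=MyISAM;

-- One-off backfill; assumes no writes to matches until the trigger exists.
INSERT INTO finish1_counts (game, user, times)
SELECT game, user, COUNT(*)
FROM matches
WHERE finish = 1
GROUP BY game, user;

-- Keep the counts current for newly inserted matches (insert path only).
DELIMITER //
CREATE TRIGGER matches_count_finish1
AFTER INSERT ON matches
FOR EACH ROW
BEGIN
  IF NEW.finish = 1 THEN
    INSERT INTO finish1_counts (game, user, times)
    VALUES (NEW.game, NEW.user, 1)
    ON DUPLICATE KEY UPDATE times = times + 1;
  END IF;
END//
DELIMITER ;

-- The ranking is now read via i_game_times instead of grouping ~150K rows.
SELECT user AS player, times
FROM finish1_counts
WHERE game = 19
ORDER BY times DESC;
```

With game fixed by the WHERE clause, MySQL should be able to read i_game_times in reverse for ORDER BY times DESC instead of running a filesort; EXPLAIN will confirm.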
The EXPLAIN verifies that the (game, finish, user) index was used in the query. That seems like the best possible index to me. Could it be a hardware issue? What are your system's RAM and CPU?
I take it that the bulk of the time is spent extracting 150k rows out of 800k and, more importantly, sorting them (twice, including the sort that reading the index makes unnecessary). I doubt you can optimize it much more than it already is.
As others have noted, you may have reached the limit of what tuning the query itself can achieve. Next, check the settings of the max_heap_table_size and tmp_table_size variables on your server. The default is 16MB, which may be too small for your table.
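One way to check whether the temporary table spills to disk, sketched with standard MySQL statements; the 64 MB value is an arbitrary example, not a recommendation. An in-memory temporary table is converted to an on-disk one once it exceeds the smaller of the two limits, so both need raising.

```sql
-- Current limits, in bytes.
SHOW VARIABLES LIKE 'max_heap_table_size';
SHOW VARIABLES LIKE 'tmp_table_size';

-- Run before and after the GROUP BY query: if this counter increases,
-- the temporary table went to disk.
SHOW SESSION STATUS LIKE 'Created_tmp_disk_tables';

-- Raise both limits for the current session only (64 MB example value).
SET SESSION tmp_table_size = 67108864;
SET SESSION max_heap_table_size = 67108864;
```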