Slow MySQL query
Hey, I have a very slow MySQL query. I'm sure all I need to do is add the correct index, but everything I've tried so far hasn't worked.
The query is:
SELECT DATE(DateTime) as 'SpeedDate', avg(LoadTime) as 'LoadTime'
FROM SpeedMonitor
GROUP BY Date(DateTime);
The Explain for the query is:
id  select_type  table         type  possible_keys  key  key_len  ref  rows     Extra
1   SIMPLE       SpeedMonitor  ALL                                     7259978  Using temporary; Using filesort
And the table structure is:
CREATE TABLE `SpeedMonitor` (
`SMID` int(10) unsigned NOT NULL auto_increment,
`DateTime` datetime NOT NULL,
`LoadTime` double unsigned NOT NULL,
PRIMARY KEY (`SMID`)
) ENGINE=InnoDB AUTO_INCREMENT=7258294 DEFAULT CHARSET=latin1;
Any help would be greatly appreciated.
3 Answers
You're just asking for two columns in your query, so indexes could/should go there:
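The code block that originally accompanied this suggestion was lost; a plausible reconstruction (my sketch, not necessarily the author's exact statement) is a composite index covering the two columns the query touches:

```sql
-- Reconstruction (assumption): index the two referenced columns.
-- With (DateTime, LoadTime) the query can be answered from the index alone,
-- though GROUP BY DATE(DateTime) still cannot seek on it directly.
ALTER TABLE SpeedMonitor ADD INDEX idx_dt_lt (`DateTime`, `LoadTime`);
```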
Another way to speed up your query could be to split the DateTime field in two: date and time.
That way the db can group directly on the date field instead of calculating DATE(...).
EDITED:
If you prefer using a trigger, create a new DATE column, call it newdate, and try something like this (I can't test it right now to see if it's correct):
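The original code block is missing here; a minimal sketch of the trigger approach, assuming a DATE column named newdate as described in the answer:

```sql
-- Sketch: keep newdate in sync with DateTime via triggers.
ALTER TABLE SpeedMonitor ADD COLUMN newdate DATE;
UPDATE SpeedMonitor SET newdate = DATE(`DateTime`);  -- backfill existing rows

DELIMITER //
CREATE TRIGGER speedmonitor_bi BEFORE INSERT ON SpeedMonitor
FOR EACH ROW SET NEW.newdate = DATE(NEW.`DateTime`);
//
CREATE TRIGGER speedmonitor_bu BEFORE UPDATE ON SpeedMonitor
FOR EACH ROW SET NEW.newdate = DATE(NEW.`DateTime`);
//
DELIMITER ;

-- The query then becomes:
SELECT newdate, AVG(LoadTime) AS LoadTime FROM SpeedMonitor GROUP BY newdate;
```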
EDITED AGAIN:
I've just created a db with the same speedmonitor table, filled with about 900,000 records.
Then I ran the query
SELECT newdate, AVG(LoadTime) loadtime FROM speedmonitor GROUP BY newdate
and it took about 100s! Removing the index on the newdate field (and clearing the cache using RESET QUERY CACHE and FLUSH TABLES), the same query took 0.6s! Just for comparison: the query
SELECT DATE(DateTime), AVG(LoadTime) loadtime FROM speedmonitor GROUP BY DATE(DateTime)
took 0.9s. So I suppose the index on newdate is not good: remove it.
I'm going to add as many records as I can now and test both queries again.
FINAL EDIT:
With the indexes on the newdate and DateTime columns removed and 8 million records in the speedmonitor table, here are the results:
I think it's a good speedup.
Times were measured by executing the queries at the mysql command prompt.
The problem is that you're using a function in your GROUP BY clause, so MySQL has to evaluate the expression Date(DateTime) on every record before it can group the results. I'd suggest adding a calculated field for Date(DateTime), which you could then index and see if that helps your performance.
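On MySQL 5.7 or later, the calculated-field idea from the previous answer can be realized as a generated column, which the server maintains automatically. This variant is my illustration, not part of the original answer:

```sql
-- MySQL 5.7+ sketch: a stored generated column for DATE(DateTime), plus an index.
ALTER TABLE SpeedMonitor
  ADD COLUMN SpeedDate DATE GENERATED ALWAYS AS (DATE(`DateTime`)) STORED,
  ADD INDEX idx_speeddate (SpeedDate);

-- Grouping on the indexed column avoids evaluating DATE() per row:
SELECT SpeedDate, AVG(LoadTime) AS LoadTime
FROM SpeedMonitor
GROUP BY SpeedDate;
```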
I hope you'll permit me to point out that before you put a table into production with millions of records you should seriously consider how that data is going to be used and plan accordingly.
What is happening right now is that your query cannot use any indexes and hence scans the entire table building a response. Not the fastest way to work with relatively large tables.
You have some things to consider if you want to get to a better state:
If the answer is "No" to the last question you could always create a new table/solution and start writing records there... importing in old data if/as needed.
Reporting granularity is important as you could, for example, compress a day's worth of data into 24 records. Load the current day into an index-free loading table and then process it the next day into per-hour averages. Name each loading table after the sample date so you can delete old tables once they're processed.
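The per-hour roll-up described above might look like this; the summary table and the per-day loading-table name are hypothetical, chosen for illustration:

```sql
-- Hypothetical summary table for hourly averages.
CREATE TABLE IF NOT EXISTS SpeedHourly (
  SampleHour datetime NOT NULL,
  AvgLoadTime double NOT NULL,
  SampleCount int unsigned NOT NULL,
  PRIMARY KEY (SampleHour)
) ENGINE=InnoDB;

-- Next-day processing: compress yesterday's loading table into 24 rows.
INSERT INTO SpeedHourly (SampleHour, AvgLoadTime, SampleCount)
SELECT DATE_FORMAT(`DateTime`, '%Y-%m-%d %H:00:00'),
       AVG(LoadTime),
       COUNT(*)
FROM loading_20090101          -- one loading table per sample date (hypothetical name)
GROUP BY 1;
```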
Of course, hourly may not be fine grained enough.
Depending on your retention needs you might want to consider some type of partitioned storage. This can let you query against subsets of sample data and simply drop or archive old partitions when they are no longer current enough to be relevant.
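One way to realize the partitioned-storage idea, sketched with hypothetical partition names. Note that MySQL requires the partitioning column to appear in every unique key, which is why the primary key is widened here:

```sql
-- Sketch: daily RANGE partitions; old days can be dropped cheaply.
CREATE TABLE SpeedMonitorPart (
  SMID int unsigned NOT NULL auto_increment,
  `DateTime` datetime NOT NULL,
  LoadTime double unsigned NOT NULL,
  PRIMARY KEY (SMID, `DateTime`)
) ENGINE=InnoDB
PARTITION BY RANGE (TO_DAYS(`DateTime`)) (
  PARTITION p20090101 VALUES LESS THAN (TO_DAYS('2009-01-02')),
  PARTITION p20090102 VALUES LESS THAN (TO_DAYS('2009-01-03')),
  PARTITION pmax VALUES LESS THAN MAXVALUE
);

-- Dropping a day of history is a fast metadata operation, not a row-by-row DELETE:
ALTER TABLE SpeedMonitorPart DROP PARTITION p20090101;
```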
Anyhow, you seem to be on the edge of having some type of massive sampling, reporting and/or monitoring system (particularly if you were reporting on a variety of sites or pages with different characteristics). You may want to put some effort into designing this so it will fit your needs... ;)