Slow MySQL query

Published 2024-10-31 18:21:47

Hey, I have a very slow MySQL query. I'm sure all I need to do is add the correct index, but nothing I've tried has worked.

The query is:

SELECT DATE(DateTime) as 'SpeedDate', avg(LoadTime) as 'LoadTime'
FROM SpeedMonitor
GROUP BY Date(DateTime);

The Explain for the query is:

id: 1  select_type: SIMPLE  table: SpeedMonitor  type: ALL
possible_keys: NULL  key: NULL  key_len: NULL  ref: NULL
rows: 7259978  Extra: Using temporary; Using filesort

And the table structure is:

CREATE TABLE `SpeedMonitor` (
  `SMID` int(10) unsigned NOT NULL auto_increment,
  `DateTime` datetime NOT NULL,
  `LoadTime` double unsigned NOT NULL,
  PRIMARY KEY  (`SMID`)
) ENGINE=InnoDB AUTO_INCREMENT=7258294 DEFAULT CHARSET=latin1;

Any help would be greatly appreciated.

Comments (3)

拧巴小姐 2024-11-07 18:21:47

You're just asking for two columns in your query, so indexes could/should go there:

  • DateTime
  • LoadTime

Another way to speed your query up could be to split the DateTime field in two: date and time.
That way the db can group directly on the date field instead of calculating DATE(...).
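As a sketch (not tested against this table, and the index name is made up), a composite index covering both columns would at least let MySQL answer the query from the index alone instead of scanning the full table:

```sql
-- Hypothetical index name; covers both columns the query touches,
-- so MySQL can read the (smaller) index instead of the whole table.
-- Note: GROUP BY DATE(DateTime) still can't use the index for the
-- grouping itself, because the grouping key is a function of the column.
ALTER TABLE SpeedMonitor ADD INDEX idx_datetime_loadtime (`DateTime`, `LoadTime`);
```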

EDITED:
If you prefer using a trigger, create a new DATE column called newdate, and try this (I can't test it right now to confirm it's correct):

CREATE TRIGGER upd_check BEFORE INSERT ON SpeedMonitor
FOR EACH ROW
BEGIN
  SET NEW.newdate=DATE(NEW.DateTime);
END
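For completeness, here's a sketch of the full setup, assuming the mysql command-line client (where a multi-statement BEGIN ... END trigger body needs a temporary DELIMITER):

```sql
-- Add the derived column first (named `newdate` as above)
ALTER TABLE SpeedMonitor ADD COLUMN newdate DATE;

-- Backfill existing rows
UPDATE SpeedMonitor SET newdate = DATE(DateTime);

-- The mysql client needs a custom delimiter so the semicolon
-- inside the trigger body doesn't end the CREATE statement early.
DELIMITER //
CREATE TRIGGER upd_check BEFORE INSERT ON SpeedMonitor
FOR EACH ROW
BEGIN
  SET NEW.newdate = DATE(NEW.DateTime);
END//
DELIMITER ;
```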

EDITED AGAIN:
I've just created a db with the same table speedmonitor filled with about 900,000 records.
Then I ran the query SELECT newdate,AVG(LoadTime) loadtime FROM speedmonitor GROUP BY newdate and it took about 100s!!
Removing the index on the newdate field (and clearing the cache using RESET QUERY CACHE and FLUSH TABLES), the same query took 0.6s!!!
Just for comparison: the query SELECT DATE(DateTime),AVG(LoadTime) loadtime FROM speedmonitor GROUP BY DATE(DateTime) took 0.9s.
So I suppose the index on newdate is not good: remove it.
I'm going to add as many records as I can now and test the two queries again.

FINAL EDIT:
With the indexes removed from the newdate and DateTime columns, and 8 million records in the speedmonitor table, here are the results:

  • selecting and grouping on newdate column: 7.5s
  • selecting and grouping on DATE(DateTime) field: 13.7s

I think that's a good speedup.
Times were measured by executing the queries at the mysql command prompt.

墨落成白 2024-11-07 18:21:47

The problem is that you're using a function in your GROUP BY clause, so MySQL has to evaluate the expression Date(DateTime) on every record before it can group the results. I'd suggest adding a calculated field for Date(DateTime), which you could then index and see if that helps your performance.
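On MySQL 5.7 or later, that calculated field can be a stored generated column, which keeps itself in sync and can be indexed. A sketch (names like SpeedDate and idx_speeddate are made up, and this wasn't tested against the table above):

```sql
-- Requires MySQL 5.7+; SpeedDate is maintained automatically on write.
ALTER TABLE SpeedMonitor
  ADD COLUMN SpeedDate DATE AS (DATE(`DateTime`)) STORED,
  ADD INDEX idx_speeddate (SpeedDate);

-- The query can then group on a plain, indexed column:
SELECT SpeedDate, AVG(LoadTime) AS LoadTime
FROM SpeedMonitor
GROUP BY SpeedDate;
```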

深者入戏 2024-11-07 18:21:47

I hope you'll permit me to point out that before you put a table into production with millions of records you should seriously consider how that data is going to be used and plan accordingly.

What is happening right now is that your query cannot use any indexes and hence scans the entire table building a response. Not the fastest way to work with relatively large tables.

You have some things to consider if you want to get to a better state:

  1. How fast is it collecting data?
  2. How much history do you need?
  3. How granular are your reporting requirements?
  4. Are you able to suspend logging to make table changes?

If the answer is "No" to the last question, you could always create a new table/solution and start writing records there... importing old data if/as needed.

Reporting granularity is important: you could, for example, compress a day's worth of data into 24 records. Load the current day into an index-free loading table and then process it the next day into per-hour averages. Name each loading table after the sample date and you can delete old tables once they've been processed.
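A sketch of that per-hour roll-up step, assuming hypothetical table names loading_20241031 (one day's raw samples) and hourly_averages:

```sql
-- Hypothetical names; roll one day of raw samples up into 24 rows.
CREATE TABLE IF NOT EXISTS hourly_averages (
  SampleDate  DATE NOT NULL,
  SampleHour  TINYINT UNSIGNED NOT NULL,
  AvgLoadTime DOUBLE NOT NULL,
  PRIMARY KEY (SampleDate, SampleHour)
);

INSERT INTO hourly_averages (SampleDate, SampleHour, AvgLoadTime)
SELECT DATE(DateTime), HOUR(DateTime), AVG(LoadTime)
FROM loading_20241031
GROUP BY DATE(DateTime), HOUR(DateTime);
```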

Of course, hourly may not be fine grained enough.

Depending on your retention needs you might want to consider some type of partitioned storage. This can let you query against subsets of sample data and simply drop or archive old partitions when they are no longer current enough to be relevant.
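A partitioning sketch (untested; note that MySQL requires the partitioning column to be part of every unique key, so the primary key on the table above would first have to be widened to include DateTime):

```sql
-- Widen the PK so the partition column is part of it.
ALTER TABLE SpeedMonitor DROP PRIMARY KEY, ADD PRIMARY KEY (SMID, `DateTime`);

-- Range-partition by day; partition names/boundaries are illustrative.
ALTER TABLE SpeedMonitor PARTITION BY RANGE (TO_DAYS(`DateTime`)) (
  PARTITION p2024_10 VALUES LESS THAN (TO_DAYS('2024-11-01')),
  PARTITION p2024_11 VALUES LESS THAN (TO_DAYS('2024-12-01')),
  PARTITION pmax     VALUES LESS THAN MAXVALUE
);

-- Old partitions can then be dropped cheaply:
ALTER TABLE SpeedMonitor DROP PARTITION p2024_10;
```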

Anyhow, you seem to be on the edge of having some type of massive sampling, reporting and/or monitoring system (particularly if you were reporting on a variety of sites or pages with different characteristics). You may want to put some effort into designing this so it will fit your needs... ;)
