Simple query takes 15-30 seconds

Posted 2024-08-12 06:04:24

The following query is pretty simple. It selects the last 20 records from a messages table for use in a paging scenario. The first time this query is run, it takes from 15 to 30 seconds. Subsequent runs take less than a second (I expect some caching is involved). I am trying to determine why the first time takes so long.

Here's the query:

SELECT DISTINCT ID,List,`From`,Subject, UNIX_TIMESTAMP(MsgDate) AS FmtDate
FROM messages
WHERE List='general'
ORDER BY MsgDate
LIMIT 17290,20;

MySQL version: 4.0.26-log

Here's the table:

CREATE TABLE `messages` (
  `ID` int(10) unsigned NOT NULL auto_increment,
  `List` varchar(10) NOT NULL default '',
  `MessageId` varchar(128) NOT NULL default '',
  `From` varchar(128) NOT NULL default '',
  `Subject` varchar(128) NOT NULL default '',
  `MsgDate` datetime NOT NULL default '0000-00-00 00:00:00',
  `TextBody` longtext NOT NULL,
  `HtmlBody` longtext NOT NULL,
  `Headers` text NOT NULL,
  `UserID` int(10) unsigned default NULL,
  PRIMARY KEY  (`ID`),
  UNIQUE KEY `List` (`List`,`MsgDate`,`MessageId`),
  KEY `From` (`From`),
  KEY `UserID` (`UserID`,`List`,`MsgDate`),
  KEY `MsgDate` (`MsgDate`),
  KEY `ListOnly` (`List`)
) TYPE=MyISAM ROW_FORMAT=DYNAMIC

Here's the explain:

table   type    possible_keys  key       key_len  ref       rows  Extra
------  ------  -------------  --------  -------  ------  ------  --------------------------------------------
m       ref     List,ListOnly  ListOnly  10       const    18002  Using where; Using temporary; Using filesort

Why is it using a filesort when I have indexes on all the relevant columns? I added the ListOnly index just to see if it would help. I had originally thought that the List index would handle both the list selection and the sorting on MsgDate, but it didn't. Now that I added the ListOnly index, that's the one it uses, but it still does a filesort on MsgDate, which is what I suspect is taking so long.

I tried using FORCE INDEX as follows:

SELECT DISTINCT ID,List,`From`,Subject, UNIX_TIMESTAMP(MsgDate) AS FmtDate
FROM messages
FORCE INDEX (List)
WHERE List='general'
ORDER BY MsgDate
LIMIT 17290,20;

This does seem to force MySQL to use the index, but it doesn't speed up the query at all.

Here's the explain for this query:

table   type    possible_keys  key     key_len  ref       rows  Extra                       
------  ------  -------------  ------  -------  ------  ------  ----------------------------
m       ref     List           List    10       const    18002  Using where; Using temporary

UPDATES:

I removed DISTINCT from the query. It didn't help performance at all.

I removed the UNIX_TIMESTAMP call. It also didn't affect performance.

I made a special case in my PHP code so that if I detect the user is looking at the last page of results, I add a WHERE clause that returns only the last 7 days of results:

SELECT ID,List,`From`,Subject,MsgDate
FROM messages
WHERE MsgDate>='2009-11-15'
ORDER BY MsgDate DESC
LIMIT 20

This is a lot faster. However, as soon as I navigate to another page of results, it must use the old SQL and takes a very long time to execute. I can't think of a practical, realistic way to do this for all pages. Also, doing this special case makes my PHP code more complex.

Strangely, only the first time the original query is run takes a long time. Subsequent runs of either the same query or a query showing a different page of results (i.e., only the LIMIT clause changes) are very fast. The query slows down again if it has not been run for about 5 minutes.

SOLUTION:

The best solution I came up with is based on Jason Orendorff and Juliet's idea.

First, I determine if the current page is closer to the beginning or end of the total number of pages. If it's closer to the end, I use ORDER BY MsgDate DESC, apply an appropriate limit, then reverse the order of the returned records.

This makes retrieving pages close to the beginning or end of the result set much faster (the first run now takes 4-5 seconds instead of 15-30). If the user navigates to a page near the middle (currently around the 430th page), the query may be as slow as before. But that would be a rare case.
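
The direction-switching idea described above might look roughly like the following Python sketch. It uses the standard sqlite3 module as a stand-in for MySQL so that it runs on its own. The names fetch_page and make_demo_db, the 0-based page numbering, and the ID tie-breaker in ORDER BY are assumptions made for illustration; this is not the original PHP code.

```python
import sqlite3
from datetime import datetime, timedelta


def make_demo_db(n_rows):
    """Build a small in-memory messages table (hypothetical demo data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE messages ("
        "ID INTEGER PRIMARY KEY, List TEXT NOT NULL, MsgDate TEXT NOT NULL)"
    )
    start = datetime(2009, 1, 1)
    rows = [
        (i, "general", (start + timedelta(minutes=i)).strftime("%Y-%m-%d %H:%M:%S"))
        for i in range(1, n_rows + 1)
    ]
    conn.executemany("INSERT INTO messages VALUES (?, ?, ?)", rows)
    return conn


def fetch_page(conn, list_name, page, page_size):
    """Return one 0-based page of (ID, MsgDate) rows in ascending date order."""
    total = conn.execute(
        "SELECT COUNT(*) FROM messages WHERE List = ?", (list_name,)
    ).fetchone()[0]
    offset = page * page_size
    count = max(0, min(page_size, total - offset))  # last page may be short
    if count == 0:
        return []
    rows_after = total - offset - count  # rows that follow this page

    if offset <= rows_after:
        # Page is nearer the beginning: skip `offset` rows going forward.
        sql = ("SELECT ID, MsgDate FROM messages WHERE List = ? "
               "ORDER BY MsgDate, ID LIMIT ?, ?")
        return conn.execute(sql, (list_name, offset, count)).fetchall()

    # Page is nearer the end: skip `rows_after` rows going backward,
    # then flip the result so the caller still sees ascending order.
    sql = ("SELECT ID, MsgDate FROM messages WHERE List = ? "
           "ORDER BY MsgDate DESC, ID DESC LIMIT ?, ?")
    rows = conn.execute(sql, (list_name, rows_after, count)).fetchall()
    rows.reverse()
    return rows
```

The ID column is added to ORDER BY only so that the ascending and descending orders are exact mirrors when two messages share a MsgDate; without a tie-breaker, the two directions could disagree about which rows belong to a page.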

So while there seems to be no perfect solution, this is much better than it was for most cases.

Thank you, Jason and Juliet.

Comments (3)

掀纱窥君容 2024-08-19 06:04:24

Instead of ORDER BY MsgDate LIMIT 17290,20, try ORDER BY MsgDate DESC LIMIT 20.

Of course the results will come out in the reverse order, but that should be easy to deal with.

EDIT: Do your MessageId values always increase with time? Are they unique?

If so, I would make an index:

UNIQUE KEY `ListMsgId` ( `List`, `MessageId` )

and query based on the message ids rather than the date when possible.

-- Most recent messages (in reverse order)
SELECT * FROM messages
WHERE List = 'general'
ORDER BY MessageId DESC
LIMIT 20

-- Previous page (in reverse order)
SELECT * FROM messages
WHERE List = 'general' AND MessageId < '15885830'
ORDER BY MessageId DESC
LIMIT 20

-- Next page
SELECT * FROM messages
WHERE List = 'general' AND MessageId > '15885829'
ORDER BY MessageId
LIMIT 20

I think you're also paying for having varchar columns where an int type would be a lot faster. For example, List could instead be a ListId that points to an entry in a separate table. You might want to try it out in a test database to see if that's really true; I'm not a MySQL expert.
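
As a rough illustration of the id-based paging suggested above, here is a Python sketch that runs against an in-memory SQLite database rather than MySQL. It assumes MessageId is an integer that increases with time; in the schema from the question it is varchar(128), where < and > would compare strings lexicographically, so this is a sketch under that assumption only. The function names are hypothetical.

```python
import sqlite3


def make_keyset_demo_db(n_rows):
    """In-memory table with integer MessageId values 1..n_rows (demo data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE messages ("
        "ID INTEGER PRIMARY KEY, List TEXT NOT NULL, MessageId INTEGER NOT NULL)"
    )
    conn.execute("CREATE UNIQUE INDEX ListMsgId ON messages (List, MessageId)")
    conn.executemany(
        "INSERT INTO messages (List, MessageId) VALUES (?, ?)",
        [("general", i) for i in range(1, n_rows + 1)],
    )
    return conn


def newest(conn, list_name, n=20):
    """Most recent messages, newest first."""
    sql = ("SELECT MessageId FROM messages WHERE List = ? "
           "ORDER BY MessageId DESC LIMIT ?")
    return [r[0] for r in conn.execute(sql, (list_name, n))]


def older_than(conn, list_name, cursor, n=20):
    """Previous page: messages with an id below `cursor`, newest first."""
    sql = ("SELECT MessageId FROM messages WHERE List = ? AND MessageId < ? "
           "ORDER BY MessageId DESC LIMIT ?")
    return [r[0] for r in conn.execute(sql, (list_name, cursor, n))]


def newer_than(conn, list_name, cursor, n=20):
    """Next page: messages with an id above `cursor`, oldest first."""
    sql = ("SELECT MessageId FROM messages WHERE List = ? AND MessageId > ? "
           "ORDER BY MessageId LIMIT ?")
    return [r[0] for r in conn.execute(sql, (list_name, cursor, n))]
```

A caller would pass the smallest id on the current page to older_than and the largest to newer_than. Because each query starts from a known id instead of skipping rows, no page requires scanning past thousands of earlier entries, though jumping straight to an arbitrary page number is no longer possible.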

Oo萌小芽oO 2024-08-19 06:04:24

You can drop the ListOnly key. The compound index List already contains all the information in it.

Your EXPLAIN for the List-indexed query looks much better, lacking the filesort. You may be able to get better real performance out of it by swapping the ORDER as suggested by Jason, and maybe losing the UNIX_TIMESTAMP call (you can do that in the application layer, or just use Unix timestamps stored as INTEGER in the schema).
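
Doing the timestamp conversion in the application layer could look something like the Python sketch below. It assumes the stored DATETIME strings represent UTC; MySQL's UNIX_TIMESTAMP() interprets its argument in the session time zone, so results would differ on a server set to another zone. The function name to_unix_timestamp is hypothetical.

```python
import calendar
import time


def to_unix_timestamp(msg_date):
    """Convert a 'YYYY-MM-DD HH:MM:SS' string (assumed UTC) to Unix seconds."""
    parsed = time.strptime(msg_date, "%Y-%m-%d %H:%M:%S")
    # timegm treats the parsed fields as UTC, unlike time.mktime (local time).
    return calendar.timegm(parsed)
```

For example, to_unix_timestamp("2009-11-15 00:00:00") returns 1258243200 under the UTC assumption.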

街角卖回忆 2024-08-19 06:04:24

What version of MySQL are you using? Some of the older versions applied the LIMIT clause as a post-process filter (meaning the server fetches all the matching records, but only hands back the 20 you requested).

You can see from your EXPLAIN that 18002 rows are coming back, even though you are only showing 20 of them. Is there any way to adjust your selection criteria to identify the 20 rows you want to return, rather than getting 18000 rows back and only showing 20 of them?
