How to optimize this simple JOIN + ORDER BY query?
I have two MySQL tables:
```sql
/* Table users */
CREATE TABLE IF NOT EXISTS `users` (
    `Id` int(10) unsigned NOT NULL AUTO_INCREMENT,
    `DateRegistered` datetime NOT NULL,
    PRIMARY KEY (`Id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

/* Table statistics_user */
CREATE TABLE IF NOT EXISTS `statistics_user` (
    `UserId` int(10) unsigned NOT NULL AUTO_INCREMENT,
    `Sent_Views` int(10) unsigned NOT NULL DEFAULT '0',
    `Sent_Winks` int(10) unsigned NOT NULL DEFAULT '0',
    PRIMARY KEY (`UserId`),
    CONSTRAINT `statistics_user_ibfk_1` FOREIGN KEY (`UserId`) REFERENCES `users` (`Id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
```
For testing, both tables are populated with 10,000 random rows using the following procedure:
```sql
DELIMITER //
CREATE DEFINER=`root`@`localhost` PROCEDURE `FillUsersStatistics`(IN `cnt` INT)
BEGIN
    DECLARE i INT DEFAULT 1;
    DECLARE dt DATE;
    DECLARE Winks INT DEFAULT 1;
    DECLARE Views INT DEFAULT 1;
    WHILE (i <= cnt) DO
        -- Random registration date in 2011: month 1-8, day 1-27
        SET dt = str_to_date(concat(floor(1 + rand() * (9-1)), '-', floor(1 + rand() * (28-1)), '-', '2011'), '%m-%d-%Y');
        INSERT INTO users (Id, DateRegistered) VALUES (i, dt);
        -- Random counters between 1 and 29
        SET Winks = floor(1 + rand() * (30-1));
        SET Views = floor(1 + rand() * (30-1));
        INSERT INTO statistics_user (UserId, Sent_Winks, Sent_Views) VALUES (i, Winks, Views);
        SET i = i + 1;
    END WHILE;
END//
DELIMITER ;

CALL `FillUsersStatistics`(10000);
```
The problem:
When I run EXPLAIN on this query:
```sql
SELECT
    t1.Id,
    (Sent_Views + Sent_Winks) / DATEDIFF(NOW(), t1.DateRegistered) AS Score
FROM users t1
JOIN statistics_user t2 ON t2.UserId = t1.Id
ORDER BY Score DESC
```
...I get this EXPLAIN output:
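| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|----|-------------|-------|------|---------------|-----|---------|-----|------|-------|
| 1 | SIMPLE | t1 | ALL | PRIMARY | NULL | NULL | NULL | 10037 | Using temporary; Using filesort |
| 1 | SIMPLE | t2 | eq_ref | PRIMARY | PRIMARY | 4 | test2.t2.UserId | 1 | |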
The query becomes very slow once both tables have more than 500K rows. I suspect this is caused by the `Using temporary; Using filesort` shown in the EXPLAIN output.
How can the above query be optimized so that it runs faster?
Answers (4)
I'm fairly sure that the ORDER BY is what's killing you, since it cannot be properly indexed. Here is a workable, if not particularly pretty, solution.

First, let's say you have a column named `Score` for storing a user's current score. Every time a user's `Sent_Views` or `Sent_Winks` changes, modify the `Score` column to match. This could probably be done with a trigger (my experience with triggers is limited), or definitely done in the same code that updates the `Sent_Views` and `Sent_Winks` fields. This change wouldn't need to know the DATEDIFF portion, because it could just divide by the old sum of `Sent_Views + Sent_Winks` and multiply by the new one.

Now you just need to recompute the `Score` column once per day (if you're not picky about the precise number of hours a user has been registered). This could be done with a script run by a cron job.

Then, just index the `Score` column and SELECT away!

Note: edited to remove incorrect first attempt.
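A minimal MySQL sketch of this precomputed-score approach, assuming the `users` and `statistics_user` tables from the question. It is not the answerer's code: the index name `idx_score` and the trigger and event names are hypothetical. Instead of the divide-by-old-sum trick (which breaks when the old sum is 0), the triggers recompute the score from `users.DateRegistered`, and `GREATEST(..., 1)` is an added assumption so that a user registered today does not cause a division by zero.

```sql
-- Sketch only: idx_score, the trigger names and the event name are hypothetical.

-- 1. Store the score and index it. InnoDB secondary indexes also carry the
--    primary key, so idx_score effectively covers (Score, UserId).
ALTER TABLE statistics_user
    ADD COLUMN Score DOUBLE NOT NULL DEFAULT 0,
    ADD INDEX idx_score (Score);

-- 2. Keep Score in step whenever a row is inserted or its counters change.
CREATE TRIGGER statistics_user_score_bi
BEFORE INSERT ON statistics_user
FOR EACH ROW
    SET NEW.Score = (NEW.Sent_Views + NEW.Sent_Winks)
        / GREATEST((SELECT DATEDIFF(NOW(), DateRegistered)
                    FROM users WHERE Id = NEW.UserId), 1);

CREATE TRIGGER statistics_user_score_bu
BEFORE UPDATE ON statistics_user
FOR EACH ROW
    SET NEW.Score = (NEW.Sent_Views + NEW.Sent_Winks)
        / GREATEST((SELECT DATEDIFF(NOW(), DateRegistered)
                    FROM users WHERE Id = NEW.UserId), 1);

-- 3. Backfill existing rows. The same statement is the once-a-day refresh
--    that accounts for the growing DATEDIFF(NOW(), ...) denominator; it can
--    be run from cron as the answer suggests.
UPDATE statistics_user s
JOIN users u ON u.Id = s.UserId
SET s.Score = (s.Sent_Views + s.Sent_Winks)
              / GREATEST(DATEDIFF(NOW(), u.DateRegistered), 1);

-- 4. Optional alternative to cron: a MySQL event (requires event_scheduler=ON).
CREATE EVENT refresh_user_scores
ON SCHEDULE EVERY 1 DAY
DO
    UPDATE statistics_user s
    JOIN users u ON u.Id = s.UserId
    SET s.Score = (s.Sent_Views + s.Sent_Winks)
                  / GREATEST(DATEDIFF(NOW(), u.DateRegistered), 1);

-- 5. The ranking query can now read idx_score in order instead of sorting.
SELECT UserId AS Id, Score
FROM statistics_user
ORDER BY Score DESC
LIMIT 100;
```

Running EXPLAIN on the final SELECT should show `idx_score` being used without `Using filesort`. The daily refresh is still a full-table update, but it runs once a day rather than on every read, and the benefit of the index is largest for top-N queries with a LIMIT.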
I'm offering my comment as an answer:

Establish a future date, far enough away not to interfere with your application, say the year 5000. Replace the current date with this future date in your score calculation. The score computation is now, for all intents and purposes, absolute, and can be computed whenever winks and views are updated (through a stored procedure or a trigger (does MySQL have triggers?)).

Add a `score` column to your `statistics_user` table to store the computed score, and define an index on it. Your SQL can be rewritten as:

If you need the real score, it is easily computed with just a constant multiplication, which could be done afterwards if it interferes with MySQL's index selection.
Shouldn't you have indexed DateRegistered in Users?
You should try an inner join rather than a Cartesian product; the next thing you can do is partition according to `DateRegistered`.