How can I optimize this simple JOIN + ORDER BY query?

Posted on 2024-12-12 00:55:08


I have two MySQL tables:

/* Table users */
CREATE TABLE IF NOT EXISTS `users` (
  `Id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `DateRegistered` datetime NOT NULL,
  PRIMARY KEY (`Id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

/* Table statistics_user */
CREATE TABLE IF NOT EXISTS `statistics_user` (
  `UserId` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `Sent_Views` int(10) unsigned NOT NULL DEFAULT '0',
  `Sent_Winks` int(10) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`UserId`),
  CONSTRAINT `statistics_user_ibfk_1` FOREIGN KEY (`UserId`) REFERENCES `users` (`Id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

For testing, both tables are populated with 10,000 random rows using the following procedure:

DELIMITER //
CREATE DEFINER=`root`@`localhost` PROCEDURE `FillUsersStatistics`(IN `cnt` INT)
BEGIN
DECLARE i INT DEFAULT 1;
DECLARE dt DATE;
DECLARE Winks INT DEFAULT 1;
DECLARE Views INT DEFAULT 1;

WHILE (i<=cnt) DO
        SET dt = str_to_date(concat(floor(1 + rand() * (9-1)),'-',floor(1 + rand() * (28 -1)),'-','2011'),'%m-%d-%Y');

        INSERT INTO users (Id, DateRegistered) VALUES(i, dt);

        SET Winks = floor(1 + rand() * (30-1));
        SET Views = floor(1 + rand() * (30-1));
        INSERT INTO statistics_user (UserId, Sent_Winks, Sent_Views) VALUES (i, Winks, Views);

     SET i=i+1;
END WHILE;

END//
DELIMITER ;
CALL `FillUsersStatistics`(10000);

The problem:

When I run EXPLAIN for this query:

SELECT
t1.Id, (Sent_Views + Sent_Winks) / DATEDIFF(NOW(), t1.DateRegistered) as Score
FROM users t1
JOIN  statistics_user t2 ON t2.UserId = t1.Id
ORDER BY Score DESC

...I get this EXPLAIN output:

Id  select_type table   type    possible_keys   key     key_len     ref             rows    extra
1   SIMPLE      t1      ALL     PRIMARY         (NULL)  (NULL)      (NULL)          10037   Using temporary; Using filesort
1   SIMPLE      t2      eq_ref  PRIMARY         PRIMARY 4           test2.t2.UserId 1

The above query gets very slow once both tables have more than 500K rows. I guess it's because of the 'Using temporary; Using filesort' in the EXPLAIN output.
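To see where the sort step comes from, here is a rough reproduction in SQLite through Python's sqlite3 module. This is an assumption-laden stand-in: SQLite's planner is not MySQL's, and julianday() differences replace DATEDIFF(NOW(), ...). Because the ORDER BY key is computed per row at query time, no index can deliver the rows in order, so the plan should end with a temporary B-tree sort, roughly SQLite's counterpart of "Using temporary; Using filesort".

```python
import sqlite3

# In-memory SQLite stand-in for the two MySQL tables (columns trimmed to what
# the query needs). This only illustrates why a sort step is unavoidable when
# ordering by a value computed at query time.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    Id INTEGER PRIMARY KEY,
    DateRegistered TEXT NOT NULL
);
CREATE TABLE statistics_user (
    UserId INTEGER PRIMARY KEY REFERENCES users(Id),
    Sent_Views INTEGER NOT NULL DEFAULT 0,
    Sent_Winks INTEGER NOT NULL DEFAULT 0
);
""")

# The question's query; julianday() differences stand in for DATEDIFF(NOW(), ...).
QUERY = """
SELECT t1.Id,
       (Sent_Views + Sent_Winks) /
       (julianday('now') - julianday(t1.DateRegistered)) AS Score
FROM users t1
JOIN statistics_user t2 ON t2.UserId = t1.Id
ORDER BY Score DESC
"""

# Each EXPLAIN QUERY PLAN row is a 4-tuple whose last field describes the step.
original_plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]
for step in original_plan:
    print(step)
```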

How can the above query be optimized so that it runs faster?


三生池水覆流年 2024-12-19 00:55:08

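The ORDER BY is what's killing you, and I can't see a way around it, because it can't be properly indexed. Here is a workable solution, even if it isn't particularly pretty.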

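First, suppose you have a column named Score that stores the user's current score. Every time a user's Sent_Views or Sent_Winks changes, update the Score column to match. This could probably be done with a trigger (my experience with triggers is limited), and can certainly be done in the same code that updates the Sent_Views and Sent_Winks fields. This update doesn't need to know about the DATEDIFF part: since the number of days hasn't changed, it can simply divide by the old sum of Sent_Views + Sent_Winks and multiply by the new sum.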
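A minimal sketch of that incremental update in Python. `updated_score` is a hypothetical helper, not something from the answer; it relies only on Score = sum / days, with days unchanged between two updates on the same day. One edge case the trick cannot handle: when the old sum is 0 (possible, since both columns default to '0'), the old score is 0 and carries no information about the number of days, so the full formula has to be used instead.

```python
from typing import Optional


def updated_score(old_score: float, old_sum: int, new_sum: int) -> Optional[float]:
    """Rescale a stored Score after Sent_Views + Sent_Winks changes.

    Score = sum / days and `days` is unchanged, so
    new_score = old_score / old_sum * new_sum.
    Returns None when old_sum is 0: the caller must then recompute
    new_sum / days from DateRegistered.
    """
    if old_sum == 0:
        return None
    return old_score / old_sum * new_sum


# Example: 20 actions over 100 days (score 0.2), then the sum grows to 30.
print(updated_score(20 / 100, 20, 30))
```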

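Now you only need to recompute the Score column once a day (if you aren't picky about the exact number of hours since each user registered). That can be done by a script run from a cron job.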
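A sketch of that daily job, assuming a hypothetical `Score REAL` column on statistics_user and using SQLite through Python's sqlite3 module as a stand-in for MySQL (julianday() differences replace DATEDIFF()). The date is passed in rather than read from the clock so the job is easy to test; a cron entry would call it with the current date. `make_db` and `recompute_all_scores` are illustrative names.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """In-memory stand-in for the two tables, plus a hypothetical Score column."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE users (
        Id INTEGER PRIMARY KEY,
        DateRegistered TEXT NOT NULL
    );
    CREATE TABLE statistics_user (
        UserId INTEGER PRIMARY KEY REFERENCES users(Id),
        Sent_Views INTEGER NOT NULL DEFAULT 0,
        Sent_Winks INTEGER NOT NULL DEFAULT 0,
        Score REAL
    );
    CREATE INDEX idx_score ON statistics_user (Score);
    """)
    return conn


def recompute_all_scores(conn: sqlite3.Connection, today: str) -> None:
    """Refresh every stored Score as (views + winks) / days registered as of `today`."""
    conn.execute(
        """
        UPDATE statistics_user
        SET Score = (Sent_Views + Sent_Winks) * 1.0 /
            (SELECT julianday(?) - julianday(u.DateRegistered)
             FROM users AS u WHERE u.Id = statistics_user.UserId)
        """,
        (today,),
    )
    conn.commit()


conn = make_db()
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "2011-12-22"), (2, "2011-11-22")])
conn.executemany(
    "INSERT INTO statistics_user (UserId, Sent_Views, Sent_Winks) VALUES (?, ?, ?)",
    [(1, 5, 15), (2, 30, 10)],
)
recompute_all_scores(conn, "2012-01-01")
scores = dict(conn.execute("SELECT UserId, Score FROM statistics_user"))
print(scores)
```

In SQLite, a user registered on the same day gets a NULL Score (division by zero yields NULL), which mirrors how MySQL treats the original expression.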

Then just index the Score column and select!
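To illustrate the payoff, a small SQLite sketch (again a stand-in for MySQL, with a hypothetical `Score` column and index name `idx_score`). Once the order is stored in an indexed column, the planner can walk the index backwards instead of sorting, so the plan should show a scan of `idx_score` and no temporary B-tree step.

```python
import sqlite3

# Stand-in for statistics_user with a precomputed, indexed Score column that is
# kept up to date elsewhere (trigger, application code, or a cron job).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE statistics_user (
    UserId INTEGER PRIMARY KEY,
    Sent_Views INTEGER NOT NULL DEFAULT 0,
    Sent_Winks INTEGER NOT NULL DEFAULT 0,
    Score REAL NOT NULL DEFAULT 0
);
CREATE INDEX idx_score ON statistics_user (Score);
""")

QUERY = "SELECT UserId, Score FROM statistics_user ORDER BY Score DESC"

# The last field of each EXPLAIN QUERY PLAN row describes the step.
indexed_plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]
for step in indexed_plan:
    print(step)
```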

Note: edited to remove an incorrect first attempt.


强者自强 2024-12-19 00:55:08


I'm offering my comment as an answer:

Establish a future date far enough away not to interfere with your application, say the year 5000. Replace the current date with this future date in your score calculation. The score computation is now, for all intents and purposes, absolute, and can be computed whenever winks and views are updated (through a stored procedure or a trigger; MySQL does support triggers).

Add a score column to your statistics_user table to store the computed score and define an index on it.
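A sketch of the trigger variant, written for SQLite via Python's sqlite3 module rather than MySQL (trigger syntax differs slightly between the two). The trigger names, the `score` column and the date '5000-01-01' are illustrative assumptions. Because the denominator no longer depends on the current date, the value only needs recomputing when Sent_Views or Sent_Winks change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    Id INTEGER PRIMARY KEY,
    DateRegistered TEXT NOT NULL
);
CREATE TABLE statistics_user (
    UserId INTEGER PRIMARY KEY REFERENCES users(Id),
    Sent_Views INTEGER NOT NULL DEFAULT 0,
    Sent_Winks INTEGER NOT NULL DEFAULT 0,
    score REAL
);
CREATE INDEX idx_score ON statistics_user (score);

-- Fill score for new rows, dividing by days until the fixed future date.
CREATE TRIGGER statistics_user_score_ins
AFTER INSERT ON statistics_user
BEGIN
    UPDATE statistics_user
    SET score = (NEW.Sent_Views + NEW.Sent_Winks) * 1.0 /
        (julianday('5000-01-01') -
         (SELECT julianday(DateRegistered) FROM users WHERE Id = NEW.UserId))
    WHERE UserId = NEW.UserId;
END;

-- Recompute only when views or winks change; updating score itself
-- does not re-fire this trigger because score is not in the column list.
CREATE TRIGGER statistics_user_score_upd
AFTER UPDATE OF Sent_Views, Sent_Winks ON statistics_user
BEGIN
    UPDATE statistics_user
    SET score = (NEW.Sent_Views + NEW.Sent_Winks) * 1.0 /
        (julianday('5000-01-01') -
         (SELECT julianday(DateRegistered) FROM users WHERE Id = NEW.UserId))
    WHERE UserId = NEW.UserId;
END;
""")

conn.execute("INSERT INTO users VALUES (1, '2011-06-01')")
conn.execute("INSERT INTO statistics_user (UserId, Sent_Views, Sent_Winks) VALUES (1, 3, 4)")
before = conn.execute("SELECT score FROM statistics_user WHERE UserId = 1").fetchone()[0]

conn.execute("UPDATE statistics_user SET Sent_Views = 10 WHERE UserId = 1")
after = conn.execute("SELECT score FROM statistics_user WHERE UserId = 1").fetchone()[0]
print(before, after)
```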

Your SQL can be rewritten as:

SELECT
   UserId, score  
FROM
  statistics_user 
ORDER BY score DESC

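If you need the real score, it can be recovered afterwards (for example, if computing it in the query interferes with MySQL's index selection), but not with a single constant multiplication: each row's stored value has to be multiplied by DATEDIFF(<future date>, DateRegistered) / DATEDIFF(NOW(), DateRegistered), a factor that depends on that user's registration date.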
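A small Python check of that conversion with made-up numbers, assuming the stored value is (Sent_Views + Sent_Winks) / DATEDIFF('5000-01-01', DateRegistered) and the real score is the question's (Sent_Views + Sent_Winks) / DATEDIFF(NOW(), DateRegistered). It shows that the per-row factor recovers the real score, and also that, because the factor varies with the registration date, ordering by the stored column can rank users differently from the real score.

```python
from datetime import date

FUTURE = date(5000, 1, 1)  # the far-future anchor date (illustrative)


def real_score(total: int, registered: date, today: date) -> float:
    """The question's score: total / days since registration."""
    return total / (today - registered).days


def stored_score(total: int, registered: date) -> float:
    """The time-independent value kept in the indexed column."""
    return total / (FUTURE - registered).days


def to_real(stored: float, registered: date, today: date) -> float:
    """Per-row conversion factor: DATEDIFF(FUTURE, reg) / DATEDIFF(today, reg)."""
    return stored * (FUTURE - registered).days / (today - registered).days


today = date(2012, 1, 1)
users = {
    "a": (2, date(2011, 12, 31)),  # 2 actions, registered 1 day ago
    "b": (50, date(2011, 1, 1)),   # 50 actions, registered 365 days ago
}

rank_real = sorted(users, key=lambda k: real_score(*users[k], today), reverse=True)
rank_stored = sorted(users, key=lambda k: stored_score(*users[k]), reverse=True)
print(rank_real, rank_stored)
```

With an anchor date this far away, the denominators are nearly equal for all users, so the stored column ranks users almost purely by Sent_Views + Sent_Winks.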
