How can I optimize this simple JOIN + ORDER BY query?

Posted on 2024-12-12 00:55:08

I have two MySQL tables:

/* Table users */
CREATE TABLE IF NOT EXISTS `users` (
  `Id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `DateRegistered` datetime NOT NULL,
  PRIMARY KEY (`Id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

/* Table statistics_user */
CREATE TABLE IF NOT EXISTS `statistics_user` (
  `UserId` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `Sent_Views` int(10) unsigned NOT NULL DEFAULT '0',
  `Sent_Winks` int(10) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`UserId`),
  CONSTRAINT `statistics_user_ibfk_1` FOREIGN KEY (`UserId`) REFERENCES `users` (`Id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Both tables are populated with 10,000 random rows for testing, using the following procedure:

DELIMITER //
CREATE DEFINER=`root`@`localhost` PROCEDURE `FillUsersStatistics`(IN `cnt` INT)
BEGIN
    DECLARE i INT DEFAULT 1;
    DECLARE dt DATE;
    DECLARE Winks INT DEFAULT 1;
    DECLARE Views INT DEFAULT 1;

    WHILE (i <= cnt) DO
        -- Random registration date in 2011: month 1-8, day 1-27
        SET dt = str_to_date(concat(floor(1 + rand() * (9-1)),'-',floor(1 + rand() * (28 -1)),'-','2011'),'%m-%d-%Y');

        INSERT INTO users (Id, DateRegistered) VALUES (i, dt);

        -- Random counters in the range 1-29
        SET Winks = floor(1 + rand() * (30-1));
        SET Views = floor(1 + rand() * (30-1));
        INSERT INTO statistics_user (UserId, Sent_Winks, Sent_Views) VALUES (i, Winks, Views);

        SET i = i + 1;
    END WHILE;
END//
DELIMITER ;
CALL `FillUsersStatistics`(10000);

The problem:

When I run the EXPLAIN for this query:

SELECT
t1.Id, (Sent_Views + Sent_Winks) / DATEDIFF(NOW(), t1.DateRegistered) as Score
FROM users t1
JOIN  statistics_user t2 ON t2.UserId = t1.Id
ORDER BY Score DESC

... I get this EXPLAIN output:

Id  select_type table   type    possible_keys   key     key_len     ref             rows    extra
1   SIMPLE      t1      ALL     PRIMARY         (NULL)  (NULL)      (NULL)          10037   Using temporary; Using filesort
1   SIMPLE      t2      eq_ref  PRIMARY         PRIMARY 4           test2.t2.UserId 1   

The above query gets very slow when both tables have more than 500K rows. I guess it's because of the 'Using temporary; Using filesort' in the query's EXPLAIN output.

How can the above query be optimized so that it runs faster?

Comments (4)

三生池水覆流年 2024-12-19 00:55:08

I'm fairly sure that the ORDER BY is what's killing you, since it cannot be properly indexed. Here is a workable, if not particularly pretty, solution.

First, let's say you have a column named Score that stores a user's current score. Every time a user's Sent_Views or Sent_Winks changes, update the Score column to match. This could probably be done with a trigger (my experience with triggers is limited), or certainly in the same code that updates the Sent_Views and Sent_Winks fields. The update doesn't need to know the DATEDIFF portion: it can divide the stored score by the old sum of Sent_Views + Sent_Winks and multiply by the new one (as long as the old sum is not zero).
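A minimal sketch of the trigger variant, assuming the Score column lives on statistics_user (the answer does not say which table) and using a hypothetical trigger name, statistics_user_score_bu. Rather than the divide-and-multiply trick, it simply recomputes the score from the question's formula, which also covers rows whose old sum is zero. The GREATEST(..., 1) guard against a zero-day divisor is an added assumption and is not part of the original query.

```sql
-- Assumed column: DECIMAL(14,4) mirrors the 4-decimal result of the original expression.
ALTER TABLE statistics_user
  ADD COLUMN Score DECIMAL(14,4) NOT NULL DEFAULT 0;

DELIMITER //
CREATE TRIGGER statistics_user_score_bu
BEFORE UPDATE ON statistics_user
FOR EACH ROW
BEGIN
  -- Recompute only when one of the counters actually changed.
  IF NEW.Sent_Views <> OLD.Sent_Views OR NEW.Sent_Winks <> OLD.Sent_Winks THEN
    SET NEW.Score = (NEW.Sent_Views + NEW.Sent_Winks) /
      (SELECT GREATEST(DATEDIFF(NOW(), u.DateRegistered), 1)  -- assumed guard: at least 1 day
       FROM users u
       WHERE u.Id = NEW.UserId);
  END IF;
END//
DELIMITER ;
```

Doing the same recomputation in application code, in the statement that updates the counters, would avoid the trigger entirely.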

Now you just need to change the Score column once per day (if you're not picky about the precise number of hours a user has been registered). This could be done with a script run by a cron job.
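As an alternative to an external cron script, the daily refresh could be sketched with MySQL's event scheduler. This assumes a Score column already exists on statistics_user as described above. The event name refresh_user_scores is hypothetical, and the GREATEST(..., 1) guard is an added assumption to avoid dividing by zero for users registered today.

```sql
-- The event scheduler must be enabled for events to run:
-- SET GLOBAL event_scheduler = ON;

CREATE EVENT refresh_user_scores
ON SCHEDULE EVERY 1 DAY
DO
  -- Recompute every user's score with the current date.
  UPDATE statistics_user s
  JOIN users u ON u.Id = s.UserId
  SET s.Score = (s.Sent_Views + s.Sent_Winks)
              / GREATEST(DATEDIFF(NOW(), u.DateRegistered), 1);
```

On large tables this rewrites every row once a day, so it may be worth scheduling it for a quiet period.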

Then, just index the Score column and SELECT away!
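The final step might look like the following, assuming the Score column is stored on statistics_user. The index name idx_score and the LIMIT 20 are illustrative assumptions.

```sql
-- Index the precomputed score so ORDER BY can read rows in index order.
ALTER TABLE statistics_user ADD INDEX idx_score (Score);

-- Check the plan: ideally the key column shows idx_score
-- and "Using temporary; Using filesort" no longer appears.
EXPLAIN
SELECT UserId, Score
FROM statistics_user
ORDER BY Score DESC
LIMIT 20;
```

Whether the optimizer actually picks the index depends on the MySQL version and the data, so the EXPLAIN output is worth checking rather than assuming.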

Note: edited to remove incorrect first attempt.

强者自强 2024-12-19 00:55:08

I'm offering my comment as answer:

Establish a future date, far enough away not to interfere with your application, say the year 5000. Replace the current date with this future date in your score calculation. The score computation is now for all intents and purposes absolute, and can be computed when updating winks and views (through a stored procedure or a trigger (does MySQL have triggers?)).

Add a score column to your statistics_user table to store the computed score and define an index on it.

Your SQL can be rewritten as:

SELECT
   UserId, score  
FROM
  statistics_user 
ORDER BY score DESC

If you need the real score, it is easily computed with just a constant multiplication, which could be done afterwards if it interferes with MySQL index selection.
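A rough sketch of this approach, assuming '5000-01-01' as the fixed future date and hypothetical names for the index (idx_score) and trigger (statistics_user_abs_score_bu). The column is declared DOUBLE and the numerator is multiplied by 1E0 because exact-value division in MySQL keeps only div_precision_increment (4 by default) extra decimal places, which would round these very small scores to almost nothing.

```sql
-- Stored score column plus an index on it.
ALTER TABLE statistics_user
  ADD COLUMN score DOUBLE NOT NULL DEFAULT 0,
  ADD INDEX idx_score (score);

-- One-time backfill for existing rows, using the fixed future date.
UPDATE statistics_user s
JOIN users u ON u.Id = s.UserId
SET s.score = (s.Sent_Views + s.Sent_Winks) * 1E0
            / DATEDIFF('5000-01-01', u.DateRegistered);

DELIMITER //
CREATE TRIGGER statistics_user_abs_score_bu
BEFORE UPDATE ON statistics_user
FOR EACH ROW
BEGIN
  -- Keep the stored score in step with the counters on every update.
  SET NEW.score = (NEW.Sent_Views + NEW.Sent_Winks) * 1E0 /
    (SELECT DATEDIFF('5000-01-01', u.DateRegistered)
     FROM users u
     WHERE u.Id = NEW.UserId);
END//
DELIMITER ;
```

Because the divisor no longer depends on NOW(), no daily refresh job is needed in this variant.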

始终不够 2024-12-19 00:55:08

Shouldn't you have indexed DateRegistered in Users?
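For anyone who wants to try this suggestion, the index could be added as below. The index name idx_date_registered is hypothetical. Re-running EXPLAIN on the question's query afterwards shows whether the optimizer makes any use of it.

```sql
-- Add a secondary index on the registration date.
ALTER TABLE users ADD INDEX idx_date_registered (DateRegistered);

-- Re-check the plan for the original query.
EXPLAIN
SELECT t1.Id,
       (Sent_Views + Sent_Winks) / DATEDIFF(NOW(), t1.DateRegistered) AS Score
FROM users t1
JOIN statistics_user t2 ON t2.UserId = t1.Id
ORDER BY Score DESC;
```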

瞄了个咪的 2024-12-19 00:55:08

You should try an inner join rather than a Cartesian product. The next thing you can do is partition by date_registered.
