大数据集上的 SQL 语句超时

发布于 2025-01-10 16:48:07 字数 951 浏览 3 评论 0原文

在尝试完成 Web 流程时，我收到错误由于语句超时而取消语句。调试代码库发现，由于数据集较大，以下查询超时。我很感激有关如何提高以下查询性能的任何建议。

select userid, max(recent_activity_date) recent_activity_date 
from (
      SELECT id AS userid,
        recent_logged_in AS recent_activity_date
      FROM user
      WHERE recent_logged_in > now() - cast('10 days' AS INTERVAL)
      UNION
      SELECT userid AS userid, max(recentaccessed) AS recent_activity_date
      FROM tokencreds
      WHERE recentaccessed > now() - cast('10 days' AS INTERVAL)
      GROUP BY userid
) recent_activity 
  WHERE EXISTS(select 1 from user where id = userid and not deleted) 
  group by userid 
  order by userid;

每表索引：

表用户： user_recent_logged_in on user (recent_logged_in)
表tokencreds：tokencreds_userid_token on tokencreds (userid, token)。 tokencreds_userid_token 是唯一的。

原文

Trying to complete a web process, I'm getting the error canceling statement due to statement timeout. Debugging the codebase it turns out that the below query is timing out due to large data set. I appreciate any suggestions on how to increase the below query performance.

select userid, max(recent_activity_date) recent_activity_date 
from (
      SELECT id AS userid,
        recent_logged_in AS recent_activity_date
      FROM user
      WHERE recent_logged_in > now() - cast('10 days' AS INTERVAL)
      UNION
      SELECT userid AS userid, max(recentaccessed) AS recent_activity_date
      FROM tokencreds
      WHERE recentaccessed > now() - cast('10 days' AS INTERVAL)
      GROUP BY userid
) recent_activity 
  WHERE EXISTS(select 1 from user where id = userid and not deleted) 
  group by userid 
  order by userid;

Index per table:

Table user:
user_recent_logged_in on user (recent_logged_in)
Table tokencreds: tokencreds_userid_token on tokencreds (userid, token). tokencreds_userid_token is unique.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

最舍不得你 2025-01-17 16:48:07

很大程度上取决于数据的“布局”。是否有很多记录“命中”？用户中有很多记录吗？在代币信用中？等等...就

我个人而言，我会这样做：

SELECT userid, max(recent_activity_date) recent_activity_date _user
 FROM (
        SELECT id AS userid, MAX(recent_logged_in) AS recent_activity_date
          FROM user
         WHERE recent_logged_in > now() - cast('10 days' AS INTERVAL)
           AND NOT deleted
         UNION ALL
        SELECT userid AS userid, MAX(recentaccessed) AS recent_activity_date
          FROM tokencreds
         WHERE recentaccessed > now() - cast('10 days' AS INTERVAL)
           AND EXISTS(SELECT * FROM user WHERE id = userid AND NOT deleted)
) recent_activity 
  GROUP BY userid 
  ORDER BY userid; 
  
  
 -- indexes 'needed' on :
 CREATE INDEX idx_userid_not_deleted ON user (userid) WHERE NOT deleted;
 CREATE INDEX idx_recent_logged_in_user_id_not_deleted ON user (recent_logged_in, userid) WHERE not deleted;
 CREATE INDEX idx_recentaccessed_user_id ON tokencreds (recentaccessed, userid);

但是YMMV。 为了获得更好的想法，您确实应该提供完整的 EXPLAIN ANALYZE 结果，否则我们只是盲目地猜测。系统可能会拒绝使用任何建议的索引，在这种情况下，您最好再次删除。

推理：

UNION 将在子选择上导致隐式不同，而您实际上并不需要它，因为 MAX() 和 GROUP BY稍后也会做同样的事情，那么为什么要做两次呢？
最好“尽快”过滤，而不是最后过滤恕我直言（**）

**：请注意，这里的结果会有所不同！（但我认为我的“更好”）。例如，假设您有 3 条记录 user_id 5

user_id = 5，deleted = true，recent_activity_date = Dec 10
user_id = 5，deleted = false，recent_activity_date = Dec 8
user_id = 5，deleted = false，recent_activity_date = Dec 5

忽略 tokencreds 表，在你的版本中 user_id 5 的结果将是 Dec 10，而在我的版本中它将是 Dec 8. 检查您的要求，选择您想要的！

编辑：建议索引中的错误

A lot depends on the 'layout' of your data. Are there a lot of records 'hit'? Are there a lot of records in users? in tokencreds? etc...

Personally I would go for this:

SELECT userid, max(recent_activity_date) recent_activity_date _user
 FROM (
        SELECT id AS userid, MAX(recent_logged_in) AS recent_activity_date
          FROM user
         WHERE recent_logged_in > now() - cast('10 days' AS INTERVAL)
           AND NOT deleted
         UNION ALL
        SELECT userid AS userid, MAX(recentaccessed) AS recent_activity_date
          FROM tokencreds
         WHERE recentaccessed > now() - cast('10 days' AS INTERVAL)
           AND EXISTS(SELECT * FROM user WHERE id = userid AND NOT deleted)
) recent_activity 
  GROUP BY userid 
  ORDER BY userid; 
  
  
 -- indexes 'needed' on :
 CREATE INDEX idx_userid_not_deleted ON user (userid) WHERE NOT deleted;
 CREATE INDEX idx_recent_logged_in_user_id_not_deleted ON user (recent_logged_in, userid) WHERE not deleted;
 CREATE INDEX idx_recentaccessed_user_id ON tokencreds (recentaccessed, userid);

but YMMV. To get a better idea you really should provide the full EXPLAIN ANALYZE result otherwise we're just flying blind and guessing here. Could be the system will refuse to use any of the suggested indexes in which case you better remove then again off course.

Reasoning:

the UNION will cause an implicit distinct on the sub-select which you don't really need as te MAX() and GROUP BY later on will pretty much do the same so why do things twice?
better to filter 'as soon as possible' rather then filter in the end IMHO (**)

**: Do note that the results here ARE going to be different! (but I think mine are 'better'). E.g. suppose you have 3 records for user_id 5

user_id = 5, deleted = true, recent_activity_date = Dec 10
user_id = 5, deleted = false, recent_activity_date = Dec 8
user_id = 5, deleted = false, recent_activity_date = Dec 5

Ignoring the tokencreds table, the result for user_id 5 will be Dec 10 in your version while in mine it will be Dec 8. Check your requirements on which one you want!

edit: mistake in suggested indexes

回复收藏 0 原文

深陷 2025-01-17 16:48:07

去掉union和exists()，并将它们组合成一个直接的join：

SELECT x.userid
        , GREATEST(x.recent_logged_in, x.recent_activity_date ) AS recent_activity_date
FROM (
      SELECT u.id AS userid
        , u.recent_logged_in
        , MAX(t.recentaccessed) AS recent_activity_date
      FROM users u
      LEFT JOIN tokencreds AS t ON t.userid = u.id
      WHERE NOT u.deleted  
      AND (
        u.recent_logged_in > now() - cast('10 days' AS INTERVAL)
        OR t.recentaccessed > now() - cast('10 days' AS INTERVAL)
        )
      GROUP BY u.id
        ) x
  ORDER by userid;

Get rid of the union, and the exists(), and combine them into a straight join:

SELECT x.userid
        , GREATEST(x.recent_logged_in, x.recent_activity_date ) AS recent_activity_date
FROM (
      SELECT u.id AS userid
        , u.recent_logged_in
        , MAX(t.recentaccessed) AS recent_activity_date
      FROM users u
      LEFT JOIN tokencreds AS t ON t.userid = u.id
      WHERE NOT u.deleted  
      AND (
        u.recent_logged_in > now() - cast('10 days' AS INTERVAL)
        OR t.recentaccessed > now() - cast('10 days' AS INTERVAL)
        )
      GROUP BY u.id
        ) x
  ORDER by userid;

回复收藏 0 原文

~没有更多了~