大数据集上的 SQL 语句超时
在尝试完成 Web 流程时,我收到错误由于语句超时而取消语句
。调试代码库发现,由于数据集较大,以下查询超时。我很感激有关如何提高以下查询性能的任何建议。
select userid, max(recent_activity_date) recent_activity_date
from (
SELECT id AS userid,
recent_logged_in AS recent_activity_date
FROM user
WHERE recent_logged_in > now() - cast('10 days' AS INTERVAL)
UNION
SELECT userid AS userid, max(recentaccessed) AS recent_activity_date
FROM tokencreds
WHERE recentaccessed > now() - cast('10 days' AS INTERVAL)
GROUP BY userid
) recent_activity
WHERE EXISTS(select 1 from user where id = userid and not deleted)
group by userid
order by userid;
每表索引:
- 表用户:
user_recent_logged_in on user (recent_logged_in)
- 表tokencreds:
tokencreds_userid_token on tokencreds (userid, token)
。 tokencreds_userid_token 是唯一的。
Trying to complete a web process, I'm getting the error canceling statement due to statement timeout
. Debugging the codebase it turns out that the below query is timing out due to large data set. I appreciate any suggestions on how to increase the below query performance.
select userid, max(recent_activity_date) recent_activity_date
from (
SELECT id AS userid,
recent_logged_in AS recent_activity_date
FROM user
WHERE recent_logged_in > now() - cast('10 days' AS INTERVAL)
UNION
SELECT userid AS userid, max(recentaccessed) AS recent_activity_date
FROM tokencreds
WHERE recentaccessed > now() - cast('10 days' AS INTERVAL)
GROUP BY userid
) recent_activity
WHERE EXISTS(select 1 from user where id = userid and not deleted)
group by userid
order by userid;
Index per table:
- Table user:
user_recent_logged_in on user (recent_logged_in)
- Table tokencreds:
tokencreds_userid_token on tokencreds (userid, token)
. tokencreds_userid_token is unique.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
data:image/s3,"s3://crabby-images/d5906/d59060df4059a6cc364216c4d63ceec29ef7fe66" alt="扫码二维码加入Web技术交流群"
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
很大程度上取决于数据的“布局”。是否有很多记录“命中”?用户中有很多记录吗?在代币信用中?等等...就
我个人而言,我会这样做:
但是YMMV。 为了获得更好的想法,您确实应该提供完整的 EXPLAIN ANALYZE 结果,否则我们只是盲目地猜测。系统可能会拒绝使用任何建议的索引,在这种情况下,您最好再次删除。
推理:
UNION
将在子选择上导致隐式不同,而您实际上并不需要它,因为MAX()
和GROUP BY
稍后也会做同样的事情,那么为什么要做两次呢?**:请注意,这里的结果会有所不同! (但我认为我的“更好”)。例如,假设您有 3 条记录 user_id 5
忽略
tokencreds
表,在你的版本中 user_id 5 的结果将是 Dec 10,而在我的版本中它将是 Dec 8. 检查您的要求,选择您想要的!编辑:建议索引中的错误
A lot depends on the 'layout' of your data. Are there a lot of records 'hit'? Are there a lot of records in users? in tokencreds? etc...
Personally I would go for this:
but YMMV. To get a better idea you really should provide the full EXPLAIN ANALYZE result otherwise we're just flying blind and guessing here. Could be the system will refuse to use any of the suggested indexes in which case you better remove then again off course.
Reasoning:
UNION
will cause an implicit distinct on the sub-select which you don't really need as teMAX()
andGROUP BY
later on will pretty much do the same so why do things twice?**: Do note that the results here ARE going to be different! (but I think mine are 'better'). E.g. suppose you have 3 records for user_id 5
Ignoring the
tokencreds
table, the result for user_id 5 will be Dec 10 in your version while in mine it will be Dec 8. Check your requirements on which one you want!edit: mistake in suggested indexes
去掉
union
和exists()
,并将它们组合成一个直接的join
:Get rid of the
union
, and theexists()
, and combine them into a straightjoin
: