Is MySQL inherently slow at this kind of query, or have I misconfigured it?
The following query is intended to retrieve a list of a user's unread messages. It involves 3 tables: recipients contains a relation of users to message IDs, messages contains the messages themselves, and message_readers contains a list of which users have read which messages.
The query reliably takes 4.9 seconds - this is seriously hurting our performance, and is especially worrisome since we hope the database will eventually be several orders of magnitude larger. Granted, it's an inherently heavy query, but the data set is tiny, and intuitively it seems that it should be much faster. The server has enough memory (32 GB) that the entire database should be loaded in RAM at all times, and there's nothing else running on the box.
The tables are all tiny:
recipients: 23581
messages: 9679
message_readers: 2685
The query itself:
SELECT
    m.*
FROM
    messages m
    INNER JOIN recipients r ON r.message_id = m.id
    LEFT JOIN message_readers mr ON mr.message_id = m.id
WHERE
    r.id = $user_id
    AND (mr.read_by_id IS NULL OR mr.read_by_id <> $user_id)
The explain plan is pretty straightforward:
+----+-------------+-------+--------+-----------------------------------+-----------------------------------+---------+--------------------------------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+-----------------------------------+-----------------------------------+---------+--------------------------------+-------+-------------+
| 1 | SIMPLE | r | ref | index_recipients_on_id | index_recipients_on_id | 768 | const | 11908 | Using where |
| 1 | SIMPLE | m | eq_ref | PRIMARY | PRIMARY | 4 | db.r.message_id | 1 | Using index |
| 1 | SIMPLE | mr | ALL | NULL | NULL | NULL | NULL | 2498 | Using where |
+----+-------------+-------+--------+-----------------------------------+-----------------------------------+---------+--------------------------------+-------+-------------+
There IS an index on message_readers.read_by_id, but I guess it can't really use it because of the IS NULL condition.
I'm using all default settings except for the following:
key_buffer=4G
query_cache_limit = 256M
query_cache_size = 1G
innodb_buffer_pool_size=12G
Thanks!
Assuming that message_readers is a subset of recipients, I recommend making the following changes:
Get rid of the message_readers table and replace it with a flag on the recipients table. This will eliminate the null check and remove a join.
It probably already is, but make sure your clustered index for recipients is id, message_id rather than message_id, id, since nearly all searches for messages will be based on the recipients.
Here is the SELECT that results:
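The SQL that accompanied this answer is missing from the page. A minimal sketch of the flag-based design it describes might look like the following. The column name is_read is hypothetical (it does not appear in the original post), and $user_id is the application-side placeholder used in the question.

```sql
-- Hypothetical flag column; the name is_read is an assumption.
ALTER TABLE recipients
    ADD COLUMN is_read TINYINT(1) NOT NULL DEFAULT 0;

-- One-time backfill from the existing message_readers rows,
-- assuming read_by_id and recipients.id identify the same user.
UPDATE recipients r
INNER JOIN message_readers mr
        ON mr.message_id = r.message_id
       AND mr.read_by_id = r.id
SET r.is_read = 1;

-- Unread messages for one user: no outer join and no NULL check.
SELECT m.*
FROM recipients r
INNER JOIN messages m ON m.id = r.message_id
WHERE r.id = $user_id
  AND r.is_read = 0;
```

With this layout, marking a message as read becomes an UPDATE on recipients rather than an INSERT into message_readers.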
UPDATE
Here is the correct version of your query using the existing scheme:
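The query from this answer did not survive on the page either. One way to express "unread by this user" against the existing three tables, offered as a sketch and not as the answerer's exact SQL, is a NOT EXISTS anti-join. Note that the WHERE clause in the question keeps a message whenever some other user has read it, even if $user_id has read it too, because the row produced by the other reader passes the <> test.

```sql
SELECT m.*
FROM recipients r
INNER JOIN messages m ON m.id = r.message_id
WHERE r.id = $user_id
  -- Keep the message only if this user has no row in message_readers for it.
  AND NOT EXISTS (
        SELECT 1
        FROM message_readers mr
        WHERE mr.message_id = r.message_id
          AND mr.read_by_id = $user_id
      );
```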
This assumes that your clustered indexes are what would be expected:
You can get rid of the IS NULL condition when you rewrite your query like this:
Basically this reads like: get all messages for recipient where not in message_readers, which also describes the problem more simply.
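The rewritten query is missing from the page. The description "get all messages for recipient where not in message_readers" suggests something along these lines; this is a sketch that assumes message_readers.message_id is never NULL, since NOT IN returns no rows at all when the subquery yields a NULL.

```sql
SELECT m.*
FROM messages m
INNER JOIN recipients r ON r.message_id = m.id
WHERE r.id = $user_id
  -- Exclude every message this user has already read.
  AND m.id NOT IN (
        SELECT mr.message_id
        FROM message_readers mr
        WHERE mr.read_by_id = $user_id
      );
```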
Assuming you just want the count (as shown in your query), what happens if you change the joins like so?
I use MSSQL and this has the potential to speed it up. I've never used MySQL, but it should work, shouldn't it?
EDIT: What about this for a mad idea? I thought you could split out the OR into two separate left joins and then take the record if either of those returns something.
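The joins this answer proposed are not preserved. A common way to change the join for this kind of count, shown here as a guess at the general idea and not as the answerer's SQL, is to move the user condition into the ON clause and keep only the rows where the outer join found nothing. The sketch assumes every recipients.message_id refers to an existing message, so the messages table can be left out of the count.

```sql
SELECT COUNT(*)
FROM recipients r
LEFT JOIN message_readers mr
       ON mr.message_id = r.message_id
      AND mr.read_by_id = $user_id   -- match only this user's read rows
WHERE r.id = $user_id
  AND mr.message_id IS NULL;          -- no match means the message is unread
```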
Unless I am missing something, you don't appear to need the messages table at all. What you really want is the number of message ids that appear for this user in recipients, and do not appear for this user in message_readers.
If I'm right above, you can accomplish what you want with a MINUS:
This avoids joins entirely. Now if you do indeed need data from the messages table for your production query, you can join the messages table to this subquery (or stick it in an IN clause).
It's possible that I'm off base here as my experience is in Oracle-land but MySQL supports MINUS so this is probably worth a shot.
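The set-difference query itself is missing from the page. One caveat on the syntax: as far as I know, MySQL does not accept the MINUS keyword, and the equivalent EXCEPT operator is only available from MySQL 8.0.31 onward. A sketch of the idea for a recent MySQL:

```sql
-- Requires MySQL 8.0.31 or later; on Oracle the keyword would be MINUS.
SELECT COUNT(*)
FROM (
    SELECT r.message_id
    FROM recipients r
    WHERE r.id = $user_id
    EXCEPT
    SELECT mr.message_id
    FROM message_readers mr
    WHERE mr.read_by_id = $user_id
) AS unread;
```

EXCEPT removes duplicates by default, so each unread message ID is counted once. On older MySQL versions the same result can be obtained with a NOT EXISTS or NOT IN subquery.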
What's the query time for
Note: The "is null" logic should be caught by this since null isn't equal to anything
If this is fast then try this:
Original answer didn't work: Try including message_id and id in a covering index on recipients and see what happens.
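The covering index suggested above could be created roughly as follows. Both index names are hypothetical. The second statement is an addition that is not part of the answer: the EXPLAIN output in the question shows message_readers being read with type ALL and no possible keys, which usually means the join column message_id has no usable index.

```sql
-- Covering index for the lookup by recipient, as the answer suggests.
CREATE INDEX idx_recipients_id_message_id
    ON recipients (id, message_id);

-- Assumption: lets the join on message_id use an index instead of a full scan.
CREATE INDEX idx_message_readers_message_id_read_by_id
    ON message_readers (message_id, read_by_id);
```

Re-running EXPLAIN afterwards should show whether the mr row still reports type ALL.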
A comment:
count(m.id) counts non-null values, but m.id is never null, so it is redundant. Try it with this:
One doubt, although it may be correct in your business logic: why can every user read incoming messages (mr.read_by_id IS NULL), and why can a message be read by other users or by no specific recipient (mr.read_by_id <> $user_id)?
It's a pool, I guess.
A better approach is to replace the inner join with an EXISTS subquery. Note that "mr.read_by_id IS NULL" is then not necessary: if mr.read_by_id is null, "mr.read_by_id = $user_id" is not true.
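The query this answer refers to is missing. A sketch of the EXISTS form it describes, with the inner join to recipients turned into an EXISTS subquery and the read check into NOT EXISTS, could be written as below. This is one interpretation of the suggestion and not the answerer's original SQL.

```sql
SELECT m.*
FROM messages m
WHERE EXISTS (
        -- The user is a recipient of this message.
        SELECT 1
        FROM recipients r
        WHERE r.message_id = m.id
          AND r.id = $user_id
      )
  AND NOT EXISTS (
        -- The user has not read it yet; no IS NULL test is needed.
        SELECT 1
        FROM message_readers mr
        WHERE mr.message_id = m.id
          AND mr.read_by_id = $user_id
      );
```

Because recipients appears only inside EXISTS, a message is returned at most once even if the recipients table holds duplicate rows for the same user and message.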