Is MySQL inherently slow at this kind of query, or have I misconfigured it?

Posted 2024-11-17 10:12:30

The following query is intended to receive a list of unread messages by user. It involves 3 tables: recipients contains a relation of users to message IDs, messages contains the messages themselves, and message_readers contains a list of which users have read which messages.

The query reliably takes 4.9 seconds. This is seriously hurting our performance, and it is especially worrisome because we hope the database will eventually be several orders of magnitude larger. Granted, it's an inherently heavy query, but the data set is tiny, and intuitively it seems that it should be much faster. The server has enough memory (32 GB) that the entire database should stay in RAM at all times, and there's nothing else running on the box.

The tables are all tiny:

recipients: 23581
messages: 9679
message_readers: 2685

The query itself:

SELECT 
    m.*
FROM 
    messages m
INNER JOIN recipients r ON r.message_id = m.id
LEFT JOIN message_readers mr ON mr.message_id = m.id
WHERE
    r.id = $user_id
    AND (mr.read_by_id IS NULL OR mr.read_by_id <> $user_id)

The explain plan is pretty straightforward:

+----+-------------+-------+--------+-----------------------------------+-----------------------------------+---------+--------------------------------+-------+-------------+
| id | select_type | table | type   | possible_keys                     | key                               | key_len | ref                            | rows  | Extra       |
+----+-------------+-------+--------+-----------------------------------+-----------------------------------+---------+--------------------------------+-------+-------------+
|  1 | SIMPLE      | r     | ref    | index_recipients_on_id            | index_recipients_on_id            | 768     | const                          | 11908 | Using where |
|  1 | SIMPLE      | m     | eq_ref | PRIMARY                           | PRIMARY                           | 4       | db.r.message_id                |     1 | Using index |
|  1 | SIMPLE      | mr    | ALL    | NULL                              | NULL                              | NULL    | NULL                           |  2498 | Using where |
+----+-------------+-------+--------+-----------------------------------+-----------------------------------+---------+--------------------------------+-------+-------------+

There IS an index on message_readers.read_by_id, but I guess it can't really use it because of the IS NULL condition.
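The plan above lists mr with type ALL and no possible_keys, and the join condition is on mr.message_id rather than on read_by_id. Whether an index that leads with the join column changes the plan can be checked in miniature. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for MySQL, assumes integer ids and a two-column table, and the index name idx_mr_message_reader is hypothetical. On the real server the equivalent check would be to rerun EXPLAIN after creating the index.

```python
import sqlite3

# Assumption: integer ids and this two-column layout; the real schema is not shown.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE message_readers (message_id INTEGER, read_by_id INTEGER)")


def plan(conn):
    """Return SQLite's query plan text for a lookup by message_id."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT read_by_id FROM message_readers WHERE message_id = ?",
        (1,),
    ).fetchall()
    # The human-readable description is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


plan_before = plan(conn)  # no index covers message_id, so the table is scanned

# Hypothetical index name; the join column comes first in the column order.
conn.execute(
    "CREATE INDEX idx_mr_message_reader "
    "ON message_readers (message_id, read_by_id)"
)
plan_after = plan(conn)  # the lookup now goes through the new index

print(plan_before)
print(plan_after)
```

SQLite's planner is not MySQL's, so this only shows the general mechanism: a join or lookup on a column with no usable index falls back to scanning the table.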

I'm using all default settings except for the following:

key_buffer=4G
query_cache_limit = 256M
query_cache_size = 1G
innodb_buffer_pool_size=12G

Thanks!


Comments (6)

堇色安年 2024-11-24 10:12:30

Assuming that message_readers is a subset of recipients, I recommend making the following changes:

  1. Get rid of the message_readers table and replace it with a flag on the recipients table. This will eliminate the null check and remove a join.

  2. It probably already is, but make sure your clustered index for recipients is id, message_id rather than message_id, id, since nearly all searches for messages will be based on the recipients.

Here is the SELECT that results:

SELECT
    r.whatever,
    m.whatever,
    -- ...
FROM
    recipients r
    INNER JOIN messages m ON m.id = r.message_id
WHERE
    r.id = $user_id
    AND r.read_flag = 'N'
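A minimal sketch of the flag approach, run against an in-memory SQLite database through Python's sqlite3 module rather than MySQL. The integer ids, the body column, the 'Y' value, and the helper names mark_read and unread_ids are assumptions made for illustration; only read_flag = 'N' comes from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE messages (id INTEGER PRIMARY KEY, body TEXT);
    -- The composite key mirrors the suggested (id, message_id) ordering.
    CREATE TABLE recipients (
        id         INTEGER NOT NULL,              -- the user id
        message_id INTEGER NOT NULL REFERENCES messages(id),
        read_flag  TEXT    NOT NULL DEFAULT 'N',  -- 'N' = unread, 'Y' = read
        PRIMARY KEY (id, message_id)
    );
    INSERT INTO messages VALUES (1, 'first'), (2, 'second'), (3, 'third');
    INSERT INTO recipients (id, message_id) VALUES (7, 1), (7, 2), (7, 3), (8, 1);
""")


def mark_read(user_id, message_id):
    """Flip the flag instead of inserting a row into message_readers."""
    conn.execute(
        "UPDATE recipients SET read_flag = 'Y' WHERE id = ? AND message_id = ?",
        (user_id, message_id),
    )


def unread_ids(user_id):
    """Unread message ids for one user: a single join, no NULL check."""
    rows = conn.execute(
        """
        SELECT m.id
        FROM recipients r
        INNER JOIN messages m ON m.id = r.message_id
        WHERE r.id = ? AND r.read_flag = 'N'
        ORDER BY m.id
        """,
        (user_id,),
    )
    return [row[0] for row in rows]


mark_read(7, 1)
print(unread_ids(7))
```

The user id is passed as a bound parameter here instead of being interpolated as $user_id.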

UPDATE

Here is the correct version of your query using the existing schema:

SELECT
    r.whatever,
    m.whatever,
    -- ...
FROM
    recipients r
    INNER JOIN messages m ON r.message_id = m.id
    LEFT JOIN message_readers mr ON mr.read_by_id = r.id 
                                 AND mr.message_id = m.id
WHERE
    r.id = $user_id
    AND mr.read_by_id IS NULL

This assumes that your clustered indexes are what would be expected:

recipients: id, message_id
messages: id
message_readers: read_by_id, message_id
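To see how the two join conditions differ, here is a small sketch on made-up data. It runs on SQLite through Python's sqlite3 module, not MySQL, and the integer ids and sample rows are assumptions. It compares the query from the question with the LEFT JOIN ... IS NULL form above for user 7, who received messages 1 to 3 and has read only message 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE messages (id INTEGER PRIMARY KEY, body TEXT);
    CREATE TABLE recipients (id INTEGER, message_id INTEGER,
                             PRIMARY KEY (id, message_id));
    CREATE TABLE message_readers (read_by_id INTEGER, message_id INTEGER,
                                  PRIMARY KEY (read_by_id, message_id));
    INSERT INTO messages VALUES (1, 'first'), (2, 'second'), (3, 'third');
    INSERT INTO recipients VALUES (7, 1), (7, 2), (7, 3), (8, 1), (8, 2), (9, 2);
    -- Rows are (read_by_id, message_id): user 7 has read only message 1.
    INSERT INTO message_readers VALUES (7, 1), (8, 1), (8, 2), (9, 2);
""")

# The question's query: message_readers is joined on message_id only.
ORIGINAL = """
    SELECT m.id
    FROM messages m
    INNER JOIN recipients r ON r.message_id = m.id
    LEFT JOIN message_readers mr ON mr.message_id = m.id
    WHERE r.id = :uid
      AND (mr.read_by_id IS NULL OR mr.read_by_id <> :uid)
    ORDER BY m.id
"""

# The rewritten query: the join also matches on the reader, then keeps misses.
ANTI_JOIN = """
    SELECT m.id
    FROM recipients r
    INNER JOIN messages m ON r.message_id = m.id
    LEFT JOIN message_readers mr ON mr.read_by_id = r.id
                                AND mr.message_id = m.id
    WHERE r.id = :uid
      AND mr.read_by_id IS NULL
    ORDER BY m.id
"""

original_ids = [row[0] for row in conn.execute(ORIGINAL, {"uid": 7})]
anti_join_ids = [row[0] for row in conn.execute(ANTI_JOIN, {"uid": 7})]

print(original_ids)   # one row per other reader, plus never-read messages
print(anti_join_ids)  # only messages with no reader row for this user
```

On this data the original form returns message 1 even though user 7 has read it, because another user's reader row satisfies the <> test, and it returns message 2 once per other reader.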
恰似旧人归 2024-11-24 10:12:30

You can get rid of the IS NULL-condition when you rewrite your query like this:

SELECT 
    count(m.id)
FROM 
    messages m
INNER JOIN recipients r ON r.message_id = m.id
WHERE r.id = $user_id
  AND NOT EXISTS
         (SELECT mr.id 
            FROM message_readers mr 
           WHERE mr.message_id = m.id
             AND mr.read_by_id = $user_id)

Basically this reads as: get all messages for the recipient that are not in message_readers for that user, which describes the problem more simply.
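The NOT EXISTS form can be tried on a toy data set. This is a sketch only: it uses Python's sqlite3 module instead of MySQL, and the integer ids, sample rows, and the helper name count_unread are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE messages (id INTEGER PRIMARY KEY, body TEXT);
    CREATE TABLE recipients (id INTEGER, message_id INTEGER);
    CREATE TABLE message_readers (read_by_id INTEGER, message_id INTEGER);
    INSERT INTO messages VALUES (1, 'first'), (2, 'second'), (3, 'third');
    INSERT INTO recipients VALUES (7, 1), (7, 2), (7, 3), (8, 1), (8, 2);
    -- Rows are (read_by_id, message_id).
    INSERT INTO message_readers VALUES (7, 1), (8, 1), (8, 2);
""")


def count_unread(user_id):
    """Count messages sent to user_id that have no reader row for that user."""
    row = conn.execute(
        """
        SELECT count(m.id)
        FROM messages m
        INNER JOIN recipients r ON r.message_id = m.id
        WHERE r.id = :uid
          AND NOT EXISTS (SELECT 1
                          FROM message_readers mr
                          WHERE mr.message_id = m.id
                            AND mr.read_by_id = :uid)
        """,
        {"uid": user_id},
    ).fetchone()
    return row[0]


print(count_unread(7))  # user 7 received 1, 2, 3 and has read 1
print(count_unread(8))  # user 8 has read everything they received
```

SELECT 1 inside the subquery is used because EXISTS only tests whether a row is found, so the selected column does not matter.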

眼趣 2024-11-24 10:12:30

Assuming you just want the count (as shown in your query), what happens if you change the joins like so?

I use MSSQL and this has the potential to speed it up. I've never used MySQL, but it should work, shouldn't it?

SELECT     count(m.id)
FROM       messages m
INNER JOIN recipients r ON r.message_id = m.id AND r.id = $user_id
LEFT JOIN  message_readers mr ON mr.message_id = m.id AND (mr.read_by_id IS NULL OR mr.read_by_id <> $user_id)

EDIT: What about this for a mad idea? I thought you could split out the OR into two separate left joins and then take the record if either of those returns something.

SELECT     count(m.id)
FROM       messages m
LEFT JOIN  recipients r ON r.message_id = m.id AND r.id = $user_id
LEFT JOIN  message_readers mr ON mr.message_id = m.id AND mr.read_by_id IS NULL
LEFT JOIN  message_readers mr2 ON mr2.message_id = m.id AND mr2.read_by_id <> $user_id
WHERE      COALESCE(mr.message_id, mr2.message_id) IS NOT NULL
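Before timing the first rewrite it may be worth checking what it counts, since a condition placed in the ON clause of a LEFT JOIN limits which right-hand rows match but never removes left-hand rows. The sketch below is an illustration on assumed data, using Python's sqlite3 module rather than MySQL or MSSQL. It compares that query with a NOT EXISTS count for user 7, who has two unread messages.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE messages (id INTEGER PRIMARY KEY, body TEXT);
    CREATE TABLE recipients (id INTEGER, message_id INTEGER);
    CREATE TABLE message_readers (read_by_id INTEGER, message_id INTEGER);
    INSERT INTO messages VALUES (1, 'first'), (2, 'second'), (3, 'third');
    INSERT INTO recipients VALUES (7, 1), (7, 2), (7, 3), (8, 1), (8, 2), (9, 2);
    -- Rows are (read_by_id, message_id): user 7 has read only message 1.
    INSERT INTO message_readers VALUES (7, 1), (8, 1), (8, 2), (9, 2);
""")

# Filter moved into the ON clauses, as in the first query of this answer.
ON_CLAUSE = """
    SELECT count(m.id)
    FROM messages m
    INNER JOIN recipients r ON r.message_id = m.id AND r.id = :uid
    LEFT JOIN message_readers mr
           ON mr.message_id = m.id
          AND (mr.read_by_id IS NULL OR mr.read_by_id <> :uid)
"""

# Reference count: received messages with no reader row for this user.
NOT_EXISTS = """
    SELECT count(*)
    FROM recipients r
    WHERE r.id = :uid
      AND NOT EXISTS (SELECT 1 FROM message_readers mr
                      WHERE mr.message_id = r.message_id
                        AND mr.read_by_id = :uid)
"""

on_clause_count = conn.execute(ON_CLAUSE, {"uid": 7}).fetchone()[0]
unread_count = conn.execute(NOT_EXISTS, {"uid": 7}).fetchone()[0]

print(on_clause_count)  # every received message survives, some more than once
print(unread_count)
```

On this data the ON-clause version counts 4 rows: message 1 once, message 2 twice (two other readers), and message 3 once.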
魂牵梦绕锁你心扉 2024-11-24 10:12:30

Unless I am missing something, you don't appear to need the messages table at all. What you really want is the number of message ids that appear for this user in recipients, and do not appear for this user in message_readers.

If I'm right above, you can accomplish what you want with a MINUS:

SELECT count(message_id)
  FROM (
        SELECT r.message_id  
          FROM recipients r 
         WHERE r.id = $user_id
        MINUS
        SELECT mr.message_id
          FROM message_readers mr
         WHERE mr.read_by_id = $user_id
       )

This avoids joins entirely. Now if you do indeed need data from the messages table for your production query, you can join the messages table to this subquery (or stick it in an IN clause).

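It's possible that I'm off base here, as my experience is in Oracle-land. MySQL itself does not accept the MINUS keyword; MySQL 8.0.31 and later provide the equivalent EXCEPT operator, so the idea is probably still worth a shot there.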
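The set-difference idea can be sketched with EXCEPT, the standard SQL spelling of Oracle's MINUS. The example runs on SQLite through Python's sqlite3 module, not on MySQL or Oracle. The integer ids, sample rows, and the derived-table alias unread are assumptions; the alias is added because MySQL requires one on a derived table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE recipients (id INTEGER, message_id INTEGER);
    CREATE TABLE message_readers (read_by_id INTEGER, message_id INTEGER);
    INSERT INTO recipients VALUES (7, 1), (7, 2), (7, 3), (8, 1), (8, 2);
    -- Rows are (read_by_id, message_id).
    INSERT INTO message_readers VALUES (7, 1), (8, 1), (8, 2);
""")

# Received message ids minus read message ids, for the same user.
SET_DIFFERENCE = """
    SELECT count(message_id)
    FROM (
        SELECT r.message_id
        FROM recipients r
        WHERE r.id = :uid
        EXCEPT
        SELECT mr.message_id
        FROM message_readers mr
        WHERE mr.read_by_id = :uid
    ) AS unread
"""

unread_for_7 = conn.execute(SET_DIFFERENCE, {"uid": 7}).fetchone()[0]
unread_for_8 = conn.execute(SET_DIFFERENCE, {"uid": 8}).fetchone()[0]

print(unread_for_7, unread_for_8)
```

EXCEPT uses set semantics, so a message id listed twice for the same recipient would be counted once.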

网白 2024-11-24 10:12:30

What's the query time for

select distinct message_id
  from message_readers
 where read_by_id <> $user_id

Note: rows where read_by_id is NULL are not returned by this, because NULL <> $user_id evaluates to NULL rather than true.
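How <> treats NULL is quick to check. The sketch below uses Python's sqlite3 module as a stand-in for MySQL; both follow standard SQL three-valued logic for this comparison. The table contents are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE message_readers (message_id INTEGER, read_by_id INTEGER);
    INSERT INTO message_readers VALUES (1, 7), (2, 8), (3, NULL);
""")

# Comparing NULL with anything yields NULL, which Python receives as None.
null_compare = conn.execute("SELECT NULL <> 7").fetchone()[0]

# A WHERE clause keeps a row only when the condition is true, not NULL.
not_equal_only = [
    row[0]
    for row in conn.execute(
        "SELECT message_id FROM message_readers "
        "WHERE read_by_id <> ? ORDER BY message_id",
        (7,),
    )
]

# The NULL row appears only when it is asked for explicitly.
with_is_null = [
    row[0]
    for row in conn.execute(
        "SELECT message_id FROM message_readers "
        "WHERE read_by_id <> ? OR read_by_id IS NULL ORDER BY message_id",
        (7,),
    )
]

print(null_compare, not_equal_only, with_is_null)
```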

If this is fast then try this:

SELECT count(m.id)
FROM messages m
INNER JOIN recipients r ON r.message_id = m.id
where r.id = $user_id
and m.id in (
    select distinct message_id
      from message_readers
     where read_by_id <> $user_id)

Original answer didn't work: Try including message_id and id in a covering index on recipients and see what happens.

江南月 2024-11-24 10:12:30

A comment: count(m.id) counts the non-NULL values, but m.id is never NULL, so naming the column is redundant. Try this instead:

SELECT count(*)
FROM 
messages m
INNER JOIN recipients r ON r.message_id = m.id  
left join 
(
    select m.id
    from messages m
    INNER JOIN message_readers mr 
    ON mr.message_id = m.id     
    and (mr.read_by_id <> $user_id or mr.read_by_id IS NULL)        
)as sub 
on sub.id = m.id        
WHERE r.id = $user_id

One doubt, although it may be correct for your business logic: why can every user read incoming messages (mr.read_by_id IS NULL), and why can a message be read by other users or have no specific receiver (mr.read_by_id <> $user_id)?

It's a pool, I guess.

A better approach is to replace the inner join in the subquery with NOT EXISTS. Note that "mr.read_by_id IS NULL" is not necessary: if mr.read_by_id is NULL, then "mr.read_by_id = $user_id" is not true.

SELECT count(*)
FROM 
messages m
INNER JOIN recipients r ON r.message_id = m.id  
left join 
(
    select m.id
    from messages m
            where not exists(select * from message_readers mr 
    where mr.message_id = m.id      
    and mr.read_by_id = $user_id)
)as sub 
on sub.id = m.id        
WHERE r.id = $user_id
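Returning to the remark that opens this answer, the difference between count(*) and count(column) is easy to confirm. The sketch uses Python's sqlite3 module rather than MySQL, and the table t and its rows are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER PRIMARY KEY, read_by_id INTEGER);
    INSERT INTO t VALUES (1, 7), (2, NULL), (3, 8);
""")

# count(*) counts rows; count(col) counts rows where col is not NULL.
count_star, count_id, count_read_by = conn.execute(
    "SELECT count(*), count(id), count(read_by_id) FROM t"
).fetchone()

print(count_star, count_id, count_read_by)
```

For a primary key column the two forms agree, which is why count(*) and count(m.id) are interchangeable in the queries above.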