mysql:选择最后 10 条消息,并为每条消息选择最后 3 条回复

发布于 2024-10-19 06:40:58 字数 3097 浏览 2 评论 0原文

为了简单起见,让我们将消息表精简到最小,使用一些示例数据,

message_id  reply_to    createdate
1           0           123
2           0           124
3           0           123
4           1           154
5           1           165

reply_to 是消息的回复的 message_id,

所以我正在寻找一个 sql 语句/过程/函数/其他表设计,让我选择最后一个10条消息,对于最后3条回复中的每一条,我不介意更改表结构,甚至不介意为最后3条回复保留某种记录,

只需选择最后10条消息

SELECT * FROM message ORDER BY createdate LIMIT 10;

,对于每条消息,回复都是

SELECT * FROM message WHERE reply_to = :message_id: ORDER BY createdate LIMIT 3;

我的尝试到目前为止:

  • 消息表上的三重外连接作为回复
  • 普通连接,但 mysql 不允许
  • 使用 HAVING COUNT(DISTINCTreply_to) <= 3 进行连接限制,但当然 HAVING 最后被评估,

我无法得到其中任何

一个最后一个选项 atm 是有一个单独的表来跟踪每条消息的最后 3 个回复

message_reply: 消息 ID、r_1、r_2、r_3

然后使用触发器更新该表 因此,消息表中的新行(即回复)更新了 message_reply 表

UPDATE message_reply SET r_3 = r_2, r_2 = r_1, r_1 = NEW.reply_to WHERE message_id = NEW.message_id

然后我可以查询这些记录的消息表

有人有更好的建议,甚至是有效的 SQL 语句吗?

谢谢

编辑:

添加了解释结果

id  select_type     table   type    possible_keys   key     key_len     ref     rows    Extra
1   PRIMARY     <derived4>  ALL     NULL    NULL    NULL    NULL    3    
1   PRIMARY     <derived2>  ALL     NULL    NULL    NULL    NULL    10  Using where; Using join buffer
1   PRIMARY     r   eq_ref  PRIMARY,message_id,message_id_2     PRIMARY     4   func    1    
4   DERIVED     NULL    NULL    NULL    NULL    NULL    NULL    NULL    No tables used
5   UNION   NULL    NULL    NULL    NULL    NULL    NULL    NULL    No tables used
6   UNION   NULL    NULL    NULL    NULL    NULL    NULL    NULL    No tables used
NULL    UNION RESULT    <union4,5,6>    ALL     NULL    NULL    NULL    NULL    NULL     
2   DERIVED     m   ALL     NULL    NULL    NULL    NULL    299727   
3   DEPENDENT SUBQUERY  r   ref     reply_to,reply_to_2     reply_to_2  4   testv4.m.message_id     29973    

编辑2:

好吧,我尝试了message_reply表方法,这也是我构建

表的方法:

message_reply: message_id, r_1, r_2, r_3

构建触发器:

DELIMITER |
CREATE TRIGGER i_message AFTER INSERT ON message
  FOR EACH ROW BEGIN
    IF NEW.reply_to THEN
        INSERT INTO message_replies (message_id, r_1) VALUES (NEW.reply_to, NEW.message_id)
        ON DUPLICATE KEY UPDATE r_3 = r_2, r_2 = r_1, r_1 = NEW.message_id;
    ELSE
        INSERT INTO message_replies (message_id) VALUES (NEW.message_id);
    END IF;
  END;
|
DELIMITER ;

并选择消息:

SELECT m.*,r1.*,r2.*,r3.* FROM message_replies mr
LEFT JOIN message m ON m.message_id = mr.message_id
LEFT JOIN message r1 ON r1.message_id = mr.r_1
LEFT JOIN message r2 ON r2.message_id = mr.r_2
LEFT JOIN message r3 ON r3.message_id = mr.r_3

当然通过触发器对我进行预处理,这是最快的方法。

使用多组 100k 插入件进行测试,以查看触发器的性能影响 与没有 trigger 的情况下处理 100k 行相比,花费了 0.4 秒的时间 插入的总时间约为 12 秒(在 myIsam 表上)

For simplicity lets strip down the messages table to its minimum, with some sample data

message_id  reply_to    createdate
1           0           123
2           0           124
3           0           123
4           1           154
5           1           165

the reply_to is the message_id wich the message is a reply to

so im looking for a sql-statement/procedure/function/other table design that lets me select the last 10 messages and for each of those the last 3 replies, i dont mind changing the table structure or even keeping some sort of a record for the last 3 replies

just selecting the last 10 messages is

SELECT * FROM message ORDER BY createdate LIMIT 10;

and for each of those messages the replies are

SELECT * FROM message WHERE reply_to = :message_id: ORDER BY createdate LIMIT 3;

my attempts so far are:

  • a triple outer join over the message table as replies
  • a plain join but mysql doesnt allow limits in joins
  • useing HAVING COUNT(DISTINCT reply_to) <= 3, but ofcourse HAVING is evaluated last

i couldnt get either of those working

my last option atm is to have a separate table to track the last 3 replies per message

message_reply:
message_id, r_1, r_2, r_3

and then updateing that table useing triggers
so a new row in the message table wich is a reply updates the message_reply table

UPDATE message_reply SET r_3 = r_2, r_2 = r_1, r_1 = NEW.reply_to WHERE message_id = NEW.message_id

then i could just query the message table for those records

anyone have a better suggestion or even a working SQL statement?

thanks

EDIT:

added EXPLAIN results

id  select_type     table   type    possible_keys   key     key_len     ref     rows    Extra
1   PRIMARY     <derived4>  ALL     NULL    NULL    NULL    NULL    3    
1   PRIMARY     <derived2>  ALL     NULL    NULL    NULL    NULL    10  Using where; Using join buffer
1   PRIMARY     r   eq_ref  PRIMARY,message_id,message_id_2     PRIMARY     4   func    1    
4   DERIVED     NULL    NULL    NULL    NULL    NULL    NULL    NULL    No tables used
5   UNION   NULL    NULL    NULL    NULL    NULL    NULL    NULL    No tables used
6   UNION   NULL    NULL    NULL    NULL    NULL    NULL    NULL    No tables used
NULL    UNION RESULT    <union4,5,6>    ALL     NULL    NULL    NULL    NULL    NULL     
2   DERIVED     m   ALL     NULL    NULL    NULL    NULL    299727   
3   DEPENDENT SUBQUERY  r   ref     reply_to,reply_to_2     reply_to_2  4   testv4.m.message_id     29973    

EDIT 2:

Well i tried the message_reply table method also this is what i did

build the table:

message_reply: message_id, r_1, r_2, r_3

build the trigger:

DELIMITER |
CREATE TRIGGER i_message AFTER INSERT ON message
  FOR EACH ROW BEGIN
    IF NEW.reply_to THEN
        INSERT INTO message_replies (message_id, r_1) VALUES (NEW.reply_to, NEW.message_id)
        ON DUPLICATE KEY UPDATE r_3 = r_2, r_2 = r_1, r_1 = NEW.message_id;
    ELSE
        INSERT INTO message_replies (message_id) VALUES (NEW.message_id);
    END IF;
  END;
|
DELIMITER ;

and select the messages:

SELECT m.*,r1.*,r2.*,r3.* FROM message_replies mr
LEFT JOIN message m ON m.message_id = mr.message_id
LEFT JOIN message r1 ON r1.message_id = mr.r_1
LEFT JOIN message r2 ON r2.message_id = mr.r_2
LEFT JOIN message r3 ON r3.message_id = mr.r_3

Ofcourse with the trigger preprocessing it for me this is the fastest way.

tested with a few more sets of 100k inserts to see the performance hit for the trigger
it took a .4 sec longer to process the 100k rows as it did without the tirgger
total time to insert was about 12 sec (on myIsam tables)

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

梦言归人 2024-10-26 06:40:59

一个工作示例:

编辑 - (请参阅早期查询的修订版)

完整表创建和解释计划
注意:“datetable”表仅包含大约 10 年的所有日期。它仅用于生成行。

drop table if exists messages;
create table messages (
   message_id int primary key, reply_to int, createdate datetime, index(reply_to));

insert into messages 
select @n:=@n+1, floor((100000 - @n) / 10), a.thedate
from (select @n:=0) n
cross join datetable a
cross join datetable b
limit 1000000;

上面生成了 1m 条消息,以及一些有效的回复。查询:

select m1.message_id, m1.reply_to, m1.createdate, N.N, r.*
from
(
    select m.*, (
         select group_concat(r.message_id order by createdate)
          from messages r
        where r.reply_to = m.message_id) replies
     from messages m
     order by m.message_id
    limit 10
) m1
inner join ( # this union-all query controls how many replies per message
    select 1 N union all
     select 2 union all
     select 3) N
  on (m1.replies is null and N=1) or (N <= length(m1.replies)-length(replace(m1.replies,',','')))
left join messages r
  on r.message_id = substring_index(substring_index(m1.replies, ',', N), ',', -1)

时间:0.078秒

解释计划

id     select_type         table        type      possible_keys    key      key_len ref                rows    Extra
1      PRIMARY             <derived4>   ALL      (NULL)            (NULL)   (NULL)  (NULL)             3    
1      PRIMARY             <derived2>   ALL      (NULL)            (NULL)   (NULL)  (NULL)             10      Using where
1      PRIMARY             r            eq_ref   PRIMARY           PRIMARY  4       func               1    
4      DERIVED             (NULL)       (NULL)   (NULL)            (NULL)   (NULL)  (NULL)             (NULL)  No tables used
5      UNION               (NULL)       (NULL)   (NULL)            (NULL)   (NULL)  (NULL)             (NULL)  No tables used
6      UNION               (NULL)       (NULL)   (NULL)            (NULL)   (NULL)  (NULL)             (NULL)  No tables used
(NULL) UNION RESULT        <union4,5,6> ALL      (NULL)            (NULL)   (NULL)  (NULL)             (NULL)    
2      DERIVED             m            index    (NULL)            PRIMARY  4       (NULL)             1000301    
3      DEPENDENT SUBQUERY  r            ref      reply_to          reply_to 5       test.m.message_id  5       Using where

A working example:

EDIT - (see revision for earlier query)

Full table creation and explain plan
Note: The table "datetable" just contains all dates for about 10 years. It is used just to generate rows.

drop table if exists messages;
create table messages (
   message_id int primary key, reply_to int, createdate datetime, index(reply_to));

insert into messages 
select @n:=@n+1, floor((100000 - @n) / 10), a.thedate
from (select @n:=0) n
cross join datetable a
cross join datetable b
limit 1000000;

The above generates 1m messages, and some valid replies. The query:

select m1.message_id, m1.reply_to, m1.createdate, N.N, r.*
from
(
    select m.*, (
         select group_concat(r.message_id order by createdate)
          from messages r
        where r.reply_to = m.message_id) replies
     from messages m
     order by m.message_id
    limit 10
) m1
inner join ( # this union-all query controls how many replies per message
    select 1 N union all
     select 2 union all
     select 3) N
  on (m1.replies is null and N=1) or (N <= length(m1.replies)-length(replace(m1.replies,',','')))
left join messages r
  on r.message_id = substring_index(substring_index(m1.replies, ',', N), ',', -1)

Time: 0.078 sec

Explain plan

id     select_type         table        type      possible_keys    key      key_len ref                rows    Extra
1      PRIMARY             <derived4>   ALL      (NULL)            (NULL)   (NULL)  (NULL)             3    
1      PRIMARY             <derived2>   ALL      (NULL)            (NULL)   (NULL)  (NULL)             10      Using where
1      PRIMARY             r            eq_ref   PRIMARY           PRIMARY  4       func               1    
4      DERIVED             (NULL)       (NULL)   (NULL)            (NULL)   (NULL)  (NULL)             (NULL)  No tables used
5      UNION               (NULL)       (NULL)   (NULL)            (NULL)   (NULL)  (NULL)             (NULL)  No tables used
6      UNION               (NULL)       (NULL)   (NULL)            (NULL)   (NULL)  (NULL)             (NULL)  No tables used
(NULL) UNION RESULT        <union4,5,6> ALL      (NULL)            (NULL)   (NULL)  (NULL)             (NULL)    
2      DERIVED             m            index    (NULL)            PRIMARY  4       (NULL)             1000301    
3      DEPENDENT SUBQUERY  r            ref      reply_to          reply_to 5       test.m.message_id  5       Using where
笑饮青盏花 2024-10-26 06:40:59

我建议您构建额外的表,并使其根据需要执行尽可能多的步骤。有时为了形象化答案,您需要额外的步骤。最后,您可以将 SQL 编译为一个嵌套语句。

I would suggest you build your extra table, and make it work with as many steps as necessary. Sometimes to visualize the answer you need extra steps. At the end, you can compile the SQL into one nested statement.

ゃ懵逼小萝莉 2024-10-26 06:40:59

注意:这个回答为OMG的评论提供了比较有用的信息,所以即使需要删除,也请暂时保留。

OMG:检查 mysql 和“greatest-n-per-group”标签的配对——该请求很常见。
OMG:然后访问问题,如果没有回答请礼貌地告知。

我按照你的指示进行操作,天啊,这就是我的想法
https://stackoverflow.com/questions/tagged/greatest-n-per-group+mysql< /a>

  1. SQL - 每种类型仅给我 3 次点击< /a>
  2. mySQL 返回每个类别的前 5 名
  3. < a href="https://stackoverflow.com/questions/4688664/mysql-select-n-records-base-on-group-by">MySQL 基于 GROUP BY 的 SELECT n 记录

你可能误解了由于第一页结果中看起来最相似的 3 个问题(其中 2 个是我的答案),这些问题涉及整个表格的单个维度(每个类别前 n 个维度)。解决方案总是在表中提供 row_number ALL 条记录,按类别排序。

将其与针对问题域 top-n-category -> 的此问题提供的优化答案进行比较top-m-per-category,您会意识到这个问题是不同的。

无需访问问题并礼貌地告知(如果没有回答),因为

  1. 这些问题的答案是有效的
  2. 该问题的答案是有效的

Note: This answer provides useful information for comparison for OMG's comments, so even if it needs to be deleted, please leave it up for a while.

OMG: Check the pairing of mysql and "greatest-n-per-group" tags -- the request is very common.
OMG: Then visit the questions and courteously inform if not answer.

I followed your instructions OMG, and this is what I came up with from
https://stackoverflow.com/questions/tagged/greatest-n-per-group+mysql

  1. SQL - Give me 3 hits for each type only
  2. mySQL Returning the top 5 of each category
  3. MySQL SELECT n records base on GROUP BY

You may have misunderstood the question because of the 3 that looked most similar form the first page of results (2 of which are my answers), the questions deal with a single dimension (top n per category) for the entire table. The solutions offered invariably row_number ALL records in the table ordered by category.

Compare that to the optimized answer provided for this question for the problem domain top-n-category -> top-m-per-category and you will realize that this question is a different one.

There is no need to visit the questions and courteously inform if not answer because

  1. The answer to those questions are valid
  2. The answer to this question is valid
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文