Get the last distinct set of records

Posted on 2024-10-29 18:56:07

I have a database table containing the following columns:

id   code   value   datetime   timestamp

In this table, the only unique values are in id, i.e. the primary key.

I want to retrieve the last distinct set of records in this table based on the datetime value. For example, suppose this is my table:

id   code   value   datetime               timestamp
1    1023   23.56   2011-04-05 14:54:52    1234223421
2    1024   23.56   2011-04-05 14:55:52    1234223423
3    1025   23.56   2011-04-05 14:56:52    1234223424
4    1023   23.56   2011-04-05 14:57:52    1234223425
5    1025   23.56   2011-04-05 14:58:52    1234223426
6    1025   23.56   2011-04-05 14:59:52    1234223427
7    1024   23.56   2011-04-05 15:00:12    1234223428
8    1026   23.56   2011-04-05 15:01:14    1234223429
9    1025   23.56   2011-04-05 15:02:22    1234223430

I want to retrieve the records with IDs 4, 7, 8, and 9, i.e. the most recent record for each distinct code (based on the datetime value). This is simply an example of what I'm trying to achieve, as the table will eventually contain millions of records and hundreds of individual code values.

What SQL statement can I use to achieve this? I can't seem to get it done with a single SQL statement. My database is MySQL 5.

摇划花蜜的午后 2024-11-05 18:56:07

This should work for you.

 SELECT * 
 FROM [tableName] 
 WHERE id IN (SELECT MAX(id) FROM [tableName] GROUP BY code)

If id is AUTO_INCREMENT and rows are inserted in chronological order, there's no need to involve datetime, which is far more expensive to compare: the row with the most recent datetime will also have the highest id.

Update: From a performance standpoint, make sure the id and code columns are indexed when dealing with a large number of records. If id is the primary key, its index is built in, but you may need to add a secondary (non-clustered) index covering code and id.
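
A minimal sketch of the query above, run against the sample rows from the question. It makes several assumptions that are not in the original answer: Python's built-in sqlite3 module stands in for MySQL, the table is given the hypothetical name readings, and the composite index is given the hypothetical name idx_code_id.

```python
import sqlite3

# Sample rows from the question: (id, code, value, datetime, timestamp).
ROWS = [
    (1, 1023, 23.56, "2011-04-05 14:54:52", 1234223421),
    (2, 1024, 23.56, "2011-04-05 14:55:52", 1234223423),
    (3, 1025, 23.56, "2011-04-05 14:56:52", 1234223424),
    (4, 1023, 23.56, "2011-04-05 14:57:52", 1234223425),
    (5, 1025, 23.56, "2011-04-05 14:58:52", 1234223426),
    (6, 1025, 23.56, "2011-04-05 14:59:52", 1234223427),
    (7, 1024, 23.56, "2011-04-05 15:00:12", 1234223428),
    (8, 1026, 23.56, "2011-04-05 15:01:14", 1234223429),
    (9, 1025, 23.56, "2011-04-05 15:02:22", 1234223430),
]


def latest_per_code(conn):
    """Return the row with the highest id for each code, ordered by id."""
    return conn.execute(
        "SELECT * FROM readings "
        "WHERE id IN (SELECT MAX(id) FROM readings GROUP BY code) "
        "ORDER BY id"
    ).fetchall()


# In-memory SQLite database (an assumption; the question uses MySQL 5).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings ("
    "id INTEGER PRIMARY KEY, code INTEGER, value REAL, "
    "datetime TEXT, timestamp INTEGER)"
)
# Composite index on (code, id), as suggested in the update above.
conn.execute("CREATE INDEX idx_code_id ON readings (code, id)")
conn.executemany("INSERT INTO readings VALUES (?, ?, ?, ?, ?)", ROWS)

latest_ids = [row[0] for row in latest_per_code(conn)]
print(latest_ids)  # [4, 7, 8, 9]
```

The result matches the IDs the question asks for. Whether the index is actually used depends on the database engine and its query planner, so it is worth checking with EXPLAIN on the real table.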

‘画卷フ 2024-11-05 18:56:07

Try this:

SELECT * 
  FROM <YOUR_TABLE>
 WHERE (code, datetime, timestamp) IN
 (
   SELECT code, MAX(datetime), MAX(timestamp)
     FROM <YOUR_TABLE>
    GROUP BY code
 )

攒一口袋星星 2024-11-05 18:56:07

It's an old post, but when I tested @smdrager's answer on large tables it was very slow. My fix was to use "inner join" instead of "where in".

SELECT * 
 FROM [tableName] as t1
 INNER JOIN (SELECT MAX(id) as id FROM [tableName] GROUP BY code) as t2
 ON t1.id = t2.id

This worked really fast.
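
As a rough check that the two forms select the same rows, the sketch below runs both against the question's sample data. It assumes Python's built-in sqlite3 module in place of MySQL and a hypothetical table name readings, and it keeps only the id, code and datetime columns. It says nothing about speed, which depends on the engine and the table size.

```python
import sqlite3

# (id, code, datetime) from the question's sample table; other columns omitted.
ROWS = [
    (1, 1023, "2011-04-05 14:54:52"),
    (2, 1024, "2011-04-05 14:55:52"),
    (3, 1025, "2011-04-05 14:56:52"),
    (4, 1023, "2011-04-05 14:57:52"),
    (5, 1025, "2011-04-05 14:58:52"),
    (6, 1025, "2011-04-05 14:59:52"),
    (7, 1024, "2011-04-05 15:00:12"),
    (8, 1026, "2011-04-05 15:01:14"),
    (9, 1025, "2011-04-05 15:02:22"),
]

# "WHERE id IN (...)" form.
IN_QUERY = """
SELECT t1.id FROM readings AS t1
WHERE t1.id IN (SELECT MAX(id) FROM readings GROUP BY code)
ORDER BY t1.id
"""

# "INNER JOIN" form: join against the derived table of per-code maximum ids.
JOIN_QUERY = """
SELECT t1.id FROM readings AS t1
INNER JOIN (SELECT MAX(id) AS id FROM readings GROUP BY code) AS t2
ON t1.id = t2.id
ORDER BY t1.id
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (id INTEGER PRIMARY KEY, code INTEGER, datetime TEXT)"
)
conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", ROWS)

in_ids = [row[0] for row in conn.execute(IN_QUERY)]
join_ids = [row[0] for row in conn.execute(JOIN_QUERY)]
print(in_ids, join_ids)  # [4, 7, 8, 9] [4, 7, 8, 9]
```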

所有深爱都是秘密 2024-11-05 18:56:07

I would try something like this:

-- keep each row whose datetime is the latest one for its code
select * from tablename t
where t.datetime = (
    select max(t2.datetime)
    from tablename t2
    where t2.code = t.code
)

(disclaimer: this is not tested)

If the row with the later datetime always has the larger id as well, the solution proposed by smdrager is quicker.
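
To illustrate why that condition matters, here is a small sketch in which ids do not follow datetime order. Everything in it is an assumption made for illustration: the four rows are hypothetical (they are not the question's data), the table name readings is hypothetical, and Python's built-in sqlite3 module stands in for MySQL. It compares the MAX(id) approach with a join on the per-code MAX(datetime).

```python
import sqlite3

# Hypothetical rows (id, code, datetime) where the newer row has the SMALLER id.
ROWS = [
    (1, 1023, "2011-04-05 14:57:52"),
    (2, 1023, "2011-04-05 14:54:52"),
    (3, 1024, "2011-04-05 15:00:12"),
    (4, 1024, "2011-04-05 14:55:52"),
]

# Picks the highest id per code, regardless of datetime.
BY_ID = """
SELECT id FROM readings
WHERE id IN (SELECT MAX(id) FROM readings GROUP BY code)
ORDER BY id
"""

# Picks the row(s) whose datetime equals the latest datetime for that code.
BY_DATETIME = """
SELECT t.id FROM readings AS t
INNER JOIN (SELECT code, MAX(datetime) AS latest
            FROM readings GROUP BY code) AS m
ON m.code = t.code AND m.latest = t.datetime
ORDER BY t.id
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (id INTEGER PRIMARY KEY, code INTEGER, datetime TEXT)"
)
conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", ROWS)

by_id = [row[0] for row in conn.execute(BY_ID)]
by_datetime = [row[0] for row in conn.execute(BY_DATETIME)]
print(by_id)        # [2, 4]
print(by_datetime)  # [1, 3]
```

With these rows the two approaches disagree, and only the datetime-based join returns the most recent record per code. Note that the datetime-based join returns more than one row for a code if several rows share the same latest datetime.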

無心 2024-11-05 18:56:07

It looks like all the existing answers suggest running GROUP BY code on the whole table. While that is logically correct, in practice the query will scan the whole(!) table (use EXPLAIN to check). In my case, the table has fewer than 500k rows and executing ...GROUP BY code takes 0.3 seconds, which is absolutely not acceptable.

However, I can use what I know about my data here (my use case is "show the latest comments for posts"):

I need to select just the top 20 records
The number of records with the same code among the last X records is relatively small (roughly uniform distribution of comments across posts; there is no "viral" post that got all the recent comments)
Total number of records >> number of distinct code values >> number of "top" records you want to get

By experimenting with the numbers, I found that I can always find 20 distinct code values if I select just the last 50 records. In that case the following query works (keeping in mind @smdrager's comment that id can very likely be used instead of datetime):

SELECT id, code
FROM tablename
ORDER BY id DESC 
LIMIT 50

Selecting just the last 50 entries is very quick, because it doesn't need to check the whole table. All that remains is to select the top 20 entries with distinct code values out of those 50.

Obviously, queries on a set of 50 (100, 500) elements are significantly faster than queries on the whole table with hundreds of thousands of entries.

Raw SQL "Postprocessing"

SELECT MAX(id) as id, code FROM 
    (SELECT id, code
     FROM tablename
     ORDER BY id DESC 
     LIMIT 50) AS nested 
GROUP BY code
ORDER BY id DESC 
LIMIT 20

This gives you the list of ids really quickly. If you want to perform additional JOINs, wrap this query in yet another nested query and perform all the joins on it.

Backend-side "Postprocessing"

Alternatively, run the simple LIMIT 50 query and then process the rows in your programming language, so that the final set includes only records with distinct code values.

Some kind of Python pseudocode:

# select_simple_top_records is a placeholder for running the LIMIT 50 query
records = select_simple_top_records(50)
added_codes = set()  # a set gives O(1) membership tests and inserts
top_records = []
for record in records:
    # Skip the record if its code was already seen
    if record['code'] in added_codes:
        continue
    # Save record
    top_records.append(record)
    added_codes.add(record['code'])
    # Stop once all 20 required records are found
    if len(top_records) >= 20:
        break
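
The pseudocode can be turned into a small runnable sketch. The function name top_distinct_by_code and its parameters are hypothetical, and the input list below is simply the question's sample table (id and code only) sorted by id descending, standing in for the result of the LIMIT query.

```python
def top_distinct_by_code(records, limit):
    """Return the first `limit` records with distinct 'code' values.

    `records` must already be ordered newest first (e.g. ORDER BY id DESC),
    so the first record seen for each code is the most recent one.
    """
    seen_codes = set()
    top_records = []
    for record in records:
        if record["code"] in seen_codes:
            continue  # an earlier (newer) record already covers this code
        seen_codes.add(record["code"])
        top_records.append(record)
        if len(top_records) >= limit:
            break
    return top_records


# The question's sample rows, newest first (id and code only).
recent = [
    {"id": 9, "code": 1025},
    {"id": 8, "code": 1026},
    {"id": 7, "code": 1024},
    {"id": 6, "code": 1025},
    {"id": 5, "code": 1025},
    {"id": 4, "code": 1023},
    {"id": 3, "code": 1025},
    {"id": 2, "code": 1024},
    {"id": 1, "code": 1023},
]

top3_ids = [r["id"] for r in top_distinct_by_code(recent, 3)]
all_ids = [r["id"] for r in top_distinct_by_code(recent, 20)]
print(top3_ids)  # [9, 8, 7]
print(all_ids)   # [9, 8, 7, 4]
```

If the input window contains fewer distinct codes than the requested limit, the function returns fewer records, which is the signal that the LIMIT on the inner query was too small for the data.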
