Get the last distinct set of records
I have a database table containing the following columns:
id code value datetime timestamp
In this table, the only unique values are in id, i.e. the primary key.
I want to retrieve the last distinct set of records in this table based on the datetime value. For example, say this is my table:
id  code  value  datetime             timestamp
1   1023  23.56  2011-04-05 14:54:52  1234223421
2   1024  23.56  2011-04-05 14:55:52  1234223423
3   1025  23.56  2011-04-05 14:56:52  1234223424
4   1023  23.56  2011-04-05 14:57:52  1234223425
5   1025  23.56  2011-04-05 14:58:52  1234223426
6   1025  23.56  2011-04-05 14:59:52  1234223427
7   1024  23.56  2011-04-05 15:00:12  1234223428
8   1026  23.56  2011-04-05 15:01:14  1234223429
9   1025  23.56  2011-04-05 15:02:22  1234223430
I want to retrieve the records with IDs 4, 7, 8, and 9, i.e. the last record for each distinct code (based on the datetime value). What I have shown is simply an example of what I'm trying to achieve, as this table will eventually contain millions of records and hundreds of individual code values.
What SQL statement can I use to achieve this? I can't seem to get it done with a single SQL statement. My database is MySQL 5.
This should work for you.
If id is AUTO_INCREMENT, there's no need to worry about the datetime, which is far more expensive to compute, as the most recent datetime will also have the highest id (as long as rows are inserted in chronological order).
Update: From a performance standpoint, make sure the id and code columns are indexed when dealing with a large number of records. If id is the primary key, this is built in, but you may need to add a non-clustered index covering code and id.
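The query this answer refers to is not shown above. Going by the later replies (which describe it as GROUP BY code combined with "where in"), a minimal sketch of the idea might look like the following. The table name readings and the index name are hypothetical, and an in-memory SQLite database stands in for MySQL so the example can be run as-is; the SELECT itself uses only standard SQL.

```python
import sqlite3

# Sample rows from the question: (id, code, value, datetime, timestamp).
ROWS = [
    (1, 1023, 23.56, "2011-04-05 14:54:52", 1234223421),
    (2, 1024, 23.56, "2011-04-05 14:55:52", 1234223423),
    (3, 1025, 23.56, "2011-04-05 14:56:52", 1234223424),
    (4, 1023, 23.56, "2011-04-05 14:57:52", 1234223425),
    (5, 1025, 23.56, "2011-04-05 14:58:52", 1234223426),
    (6, 1025, 23.56, "2011-04-05 14:59:52", 1234223427),
    (7, 1024, 23.56, "2011-04-05 15:00:12", 1234223428),
    (8, 1026, 23.56, "2011-04-05 15:01:14", 1234223429),
    (9, 1025, 23.56, "2011-04-05 15:02:22", 1234223430),
]

# Assumption: SQLite in memory stands in for MySQL; "readings" is a made-up name.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings ("
    " id INTEGER PRIMARY KEY, code INTEGER, value REAL,"
    " datetime TEXT, timestamp INTEGER)"
)
conn.executemany("INSERT INTO readings VALUES (?, ?, ?, ?, ?)", ROWS)

# Secondary index covering code and id, as the update suggests.
conn.execute("CREATE INDEX idx_readings_code_id ON readings (code, id)")

# Highest id per code, then fetch the full rows for those ids.
query = """
    SELECT *
    FROM readings
    WHERE id IN (SELECT MAX(id) FROM readings GROUP BY code)
    ORDER BY id
"""
latest_ids = [row[0] for row in conn.execute(query)]
print(latest_ids)  # [4, 7, 8, 9]
```

This relies on the assumption stated in the answer: a higher id always means a later datetime.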
Try this:
It's an old post, but testing @smdrager's answer with large tables was very slow. My fix was to use an inner join instead of "where in".
This worked really fast.
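The rewritten query is not shown in this reply, so the following is only a sketch of what replacing "where in" with an inner join could look like. The table name readings and the aliases r, latest and max_id are hypothetical, and SQLite in memory stands in for MySQL.

```python
import sqlite3

# Sample rows from the question: (id, code, value, datetime, timestamp).
ROWS = [
    (1, 1023, 23.56, "2011-04-05 14:54:52", 1234223421),
    (2, 1024, 23.56, "2011-04-05 14:55:52", 1234223423),
    (3, 1025, 23.56, "2011-04-05 14:56:52", 1234223424),
    (4, 1023, 23.56, "2011-04-05 14:57:52", 1234223425),
    (5, 1025, 23.56, "2011-04-05 14:58:52", 1234223426),
    (6, 1025, 23.56, "2011-04-05 14:59:52", 1234223427),
    (7, 1024, 23.56, "2011-04-05 15:00:12", 1234223428),
    (8, 1026, 23.56, "2011-04-05 15:01:14", 1234223429),
    (9, 1025, 23.56, "2011-04-05 15:02:22", 1234223430),
]

# Assumption: SQLite in memory stands in for MySQL; "readings" is a made-up name.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings ("
    " id INTEGER PRIMARY KEY, code INTEGER, value REAL,"
    " datetime TEXT, timestamp INTEGER)"
)
conn.executemany("INSERT INTO readings VALUES (?, ?, ?, ?, ?)", ROWS)

# Join each row against the derived table of per-code maximum ids
# instead of filtering with WHERE id IN (...).
join_query = """
    SELECT r.*
    FROM readings AS r
    INNER JOIN (
        SELECT MAX(id) AS max_id
        FROM readings
        GROUP BY code
    ) AS latest ON r.id = latest.max_id
    ORDER BY r.id
"""
joined_ids = [row[0] for row in conn.execute(join_query)]
print(joined_ids)  # [4, 7, 8, 9]
```

Both forms return the same rows; whether the join is actually faster depends on the MySQL version and its optimizer, so it is worth comparing the two with EXPLAIN on real data.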
I would try something like this:
(disclaimer: this is not tested)
If the row with the larger datetime also has the larger id, the solution proposed by smdrager is quicker.
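The untested query from this reply is not shown above. As an illustration only (not necessarily the author's query), one common way to key on datetime rather than id is to join against the per-code maximum datetime. The table name readings and the aliases are hypothetical, and SQLite in memory stands in for MySQL.

```python
import sqlite3

# Sample rows from the question: (id, code, value, datetime, timestamp).
ROWS = [
    (1, 1023, 23.56, "2011-04-05 14:54:52", 1234223421),
    (2, 1024, 23.56, "2011-04-05 14:55:52", 1234223423),
    (3, 1025, 23.56, "2011-04-05 14:56:52", 1234223424),
    (4, 1023, 23.56, "2011-04-05 14:57:52", 1234223425),
    (5, 1025, 23.56, "2011-04-05 14:58:52", 1234223426),
    (6, 1025, 23.56, "2011-04-05 14:59:52", 1234223427),
    (7, 1024, 23.56, "2011-04-05 15:00:12", 1234223428),
    (8, 1026, 23.56, "2011-04-05 15:01:14", 1234223429),
    (9, 1025, 23.56, "2011-04-05 15:02:22", 1234223430),
]

# Assumption: SQLite in memory stands in for MySQL; "readings" is a made-up name.
# Datetimes are stored as ISO-8601 text, which sorts chronologically.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings ("
    " id INTEGER PRIMARY KEY, code INTEGER, value REAL,"
    " datetime TEXT, timestamp INTEGER)"
)
conn.executemany("INSERT INTO readings VALUES (?, ?, ?, ?, ?)", ROWS)

# For each code, find its latest datetime, then fetch the matching rows.
datetime_query = """
    SELECT r.*
    FROM readings AS r
    INNER JOIN (
        SELECT code, MAX(datetime) AS max_dt
        FROM readings
        GROUP BY code
    ) AS latest ON r.code = latest.code AND r.datetime = latest.max_dt
    ORDER BY r.id
"""
by_datetime_ids = [row[0] for row in conn.execute(datetime_query)]
print(by_datetime_ids)  # [4, 7, 8, 9]
```

Unlike the id-based version, this returns more than one row for a code if two of its rows share the same latest datetime.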
Looks like all existing answers suggest doing GROUP BY code on the whole table. While that is logically correct, in practice this query will go through the whole(!) table (use EXPLAIN to make sure). In my case, I have fewer than 500k rows in the table and executing ...GROUP BY code takes 0.3 seconds, which is absolutely not acceptable.
However, I can use knowledge of my data here (read as "show last comments for posts"): the number of distinct code values >> the number of "top" records you want to get.
By experimenting with numbers I found out that I can always find 20 different code values if I select just the last 50 records. And in this case the following query works (keeping in mind @smdrager's comment about the high probability of being able to use id instead of datetime).
Selecting just the last 50 entries is super quick, because it doesn't need to check the whole table. And the rest is to select the top 20 with distinct code out of those 50 entries.
Obviously, queries on a set of 50 (100, 500) elements are significantly faster than on the whole table with hundreds of thousands of entries.
Raw SQL "Postprocessing"
This will give you a list of ids really quickly, and if you want to perform additional JOINs, put this query inside yet another nested query and perform all the joins on it.
Backend-side "Postprocessing"
After that, you need to process the data in your programming language to include in the final set only the records with distinct code. Some kind of Python pseudocode:
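Neither the queries nor the pseudocode for this answer are shown above, so the following is only a sketch of the two variants it describes: deduplicating the recent window in SQL, and deduplicating it in application code. The table name readings and the function names are hypothetical, SQLite in memory stands in for MySQL, and id order is assumed to match datetime order.

```python
import sqlite3

# Sample rows from the question: (id, code, value, datetime, timestamp).
ROWS = [
    (1, 1023, 23.56, "2011-04-05 14:54:52", 1234223421),
    (2, 1024, 23.56, "2011-04-05 14:55:52", 1234223423),
    (3, 1025, 23.56, "2011-04-05 14:56:52", 1234223424),
    (4, 1023, 23.56, "2011-04-05 14:57:52", 1234223425),
    (5, 1025, 23.56, "2011-04-05 14:58:52", 1234223426),
    (6, 1025, 23.56, "2011-04-05 14:59:52", 1234223427),
    (7, 1024, 23.56, "2011-04-05 15:00:12", 1234223428),
    (8, 1026, 23.56, "2011-04-05 15:01:14", 1234223429),
    (9, 1025, 23.56, "2011-04-05 15:02:22", 1234223430),
]

# Assumption: SQLite in memory stands in for MySQL; "readings" is a made-up name.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings ("
    " id INTEGER PRIMARY KEY, code INTEGER, value REAL,"
    " datetime TEXT, timestamp INTEGER)"
)
conn.executemany("INSERT INTO readings VALUES (?, ?, ?, ?, ?)", ROWS)


def latest_ids_sql(conn, window=50, top=20):
    """Raw SQL variant: GROUP BY only over the last `window` rows."""
    query = """
        SELECT MAX(id) AS latest_id
        FROM (
            SELECT id, code FROM readings ORDER BY id DESC LIMIT ?
        ) AS recent
        GROUP BY code
        ORDER BY latest_id DESC
        LIMIT ?
    """
    return [row[0] for row in conn.execute(query, (window, top))]


def latest_ids_backend(conn, window=50, top=20):
    """Backend variant: fetch the last `window` rows, dedupe by code in Python."""
    cursor = conn.execute(
        "SELECT id, code FROM readings ORDER BY id DESC LIMIT ?", (window,)
    )
    seen_codes = set()
    result = []
    for row_id, code in cursor:
        if code in seen_codes:
            continue  # a newer row for this code was already kept
        seen_codes.add(code)
        result.append(row_id)
        if len(result) == top:
            break
    return result


# A window of 5 rows and a top of 3 mimic the 50/20 setup on the tiny sample.
sql_result = latest_ids_sql(conn, window=5, top=3)
backend_result = latest_ids_backend(conn, window=5, top=3)
print(sql_result, backend_result)  # [9, 8, 7] [9, 8, 7]
```

The window size is a trade-off: if it is too small for the data, fewer than the requested number of distinct codes come back, so it has to be tuned by experiment as described above.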