SQL select only rows with max value on a column

Posted 2024-12-10 03:07:52 · 593 characters · 5 views · 0 comments


I have this table for documents (simplified version here):

id	rev	content
1	1	...
2	1	...
1	2	...
1	3	...

How do I select one row per id and only the greatest rev?
With the above data, the result should contain two rows: [1, 3, ...] and [2, 1, ..]. I'm using MySQL.

Currently I use checks in a while loop to detect and overwrite old revs in the result set. But is this the only way to achieve the result? Isn't there a SQL solution?


跨年 2024-12-17 03:07:52


At first glance...

All you need is a GROUP BY clause with the MAX aggregate function:

SELECT id, MAX(rev)
FROM YourTable
GROUP BY id

It's never that simple, is it?

I just noticed you need the content column as well.
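To see concretely what the plain GROUP BY returns, here is a minimal sketch that runs it against the question's sample data. It uses Python's built-in sqlite3 module with an in-memory database purely for convenience (an assumption: the question is about MySQL, and the content strings are made-up placeholders for the elided "..." values).

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER, rev INTEGER, content TEXT)")
conn.executemany(
    "INSERT INTO YourTable (id, rev, content) VALUES (?, ?, ?)",
    [
        (1, 1, "doc 1 rev 1"),  # placeholder content, not from the question
        (2, 1, "doc 2 rev 1"),
        (1, 2, "doc 1 rev 2"),
        (1, 3, "doc 1 rev 3"),
    ],
)

# The aggregate alone yields one (id, max rev) pair per id, but no content.
max_revs = conn.execute(
    "SELECT id, MAX(rev) FROM YourTable GROUP BY id ORDER BY id"
).fetchall()
print(max_revs)  # [(1, 3), (2, 1)]
```

The pairs are right, but the content belonging to each winning row is still missing, which is what the two approaches below recover.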

This is a very common question in SQL: find the whole row that holds the max value of some column within each group. I have heard it a lot during my career. Actually, it was one of the questions I answered in my current job's technical interview.

It is, in fact, so common that the Stack Overflow community has created a tag just for questions like this: greatest-n-per-group.

Basically, you have two approaches to solve that problem:

Joining with simple `group-identifier, max-value-in-group` Sub-query

In this approach, you first find each `group-identifier, max-value-in-group` pair in a sub-query (already solved above). Then you join your table to the sub-query with equality on both group-identifier and max-value-in-group:

SELECT a.id, a.rev, a.content
FROM YourTable a
INNER JOIN (
    SELECT id, MAX(rev) rev
    FROM YourTable
    GROUP BY id
) b ON a.id = b.id AND a.rev = b.rev
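As a quick check, the join above can be run against the question's sample data. This is a sketch under the same assumptions as before: SQLite via Python's sqlite3 instead of MySQL, placeholder content strings, and an ORDER BY added only to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER, rev INTEGER, content TEXT)")
conn.executemany(
    "INSERT INTO YourTable (id, rev, content) VALUES (?, ?, ?)",
    [
        (1, 1, "doc 1 rev 1"),  # placeholder content, not from the question
        (2, 1, "doc 2 rev 1"),
        (1, 2, "doc 1 rev 2"),
        (1, 3, "doc 1 rev 3"),
    ],
)

# Sub-query b finds (id, max rev); joining back on both columns
# picks up the full row, including content.
join_query = """
SELECT a.id, a.rev, a.content
FROM YourTable a
INNER JOIN (
    SELECT id, MAX(rev) AS rev
    FROM YourTable
    GROUP BY id
) b ON a.id = b.id AND a.rev = b.rev
ORDER BY a.id
"""
join_rows = conn.execute(join_query).fetchall()
for row in join_rows:
    print(row)
# (1, 3, 'doc 1 rev 3')
# (2, 1, 'doc 2 rev 1')
```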

Left Joining with self, tweaking join conditions and filters

In this approach, you left join the table with itself, with equality on the group-identifier. Then come two smart moves:

1. The second join condition requires the left-side value to be less than the right-side value.
2. After step 1, the row(s) that actually have the max value have no matching row on the right side, so the right-side columns are NULL (it's a LEFT JOIN, remember?). We then filter the joined result, keeping only the rows where the right side is NULL.

So you end up with:

SELECT a.*
FROM YourTable a
LEFT OUTER JOIN YourTable b
    ON a.id = b.id AND a.rev < b.rev
WHERE b.id IS NULL;
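The same kind of check works for the self-join. Again a sketch with assumed tooling (SQLite through Python's sqlite3, placeholder content strings, an added ORDER BY for stable output):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER, rev INTEGER, content TEXT)")
conn.executemany(
    "INSERT INTO YourTable (id, rev, content) VALUES (?, ?, ?)",
    [
        (1, 1, "doc 1 rev 1"),  # placeholder content, not from the question
        (2, 1, "doc 2 rev 1"),
        (1, 2, "doc 1 rev 2"),
        (1, 3, "doc 1 rev 3"),
    ],
)

# A row in a survives only if no row b with the same id has a higher rev,
# i.e. the LEFT JOIN found nothing and b's columns are NULL.
left_join_query = """
SELECT a.*
FROM YourTable a
LEFT OUTER JOIN YourTable b
    ON a.id = b.id AND a.rev < b.rev
WHERE b.id IS NULL
ORDER BY a.id
"""
left_join_rows = conn.execute(left_join_query).fetchall()
for row in left_join_rows:
    print(row)
# (1, 3, 'doc 1 rev 3')
# (2, 1, 'doc 2 rev 1')
```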

Conclusion

Both approaches return exactly the same result.

If two rows in the same group tie for max-value-in-group, both rows will appear in the result with either approach.
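The tie behaviour is easy to confirm. The sketch below assumes a table with no uniqueness constraint on (id, rev), adds a hypothetical second row with id 1 and rev 3, and runs both queries in SQLite through Python's sqlite3 (not MySQL); the content strings are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER, rev INTEGER, content TEXT)")
conn.executemany(
    "INSERT INTO YourTable (id, rev, content) VALUES (?, ?, ?)",
    [
        (1, 1, "doc 1 rev 1"),
        (2, 1, "doc 2 rev 1"),
        (1, 3, "doc 1 rev 3, first copy"),
        (1, 3, "doc 1 rev 3, second copy"),  # hypothetical tie on the max rev
    ],
)

join_query = """
SELECT a.id, a.rev, a.content
FROM YourTable a
INNER JOIN (
    SELECT id, MAX(rev) AS rev FROM YourTable GROUP BY id
) b ON a.id = b.id AND a.rev = b.rev
ORDER BY a.id, a.content
"""
left_join_query = """
SELECT a.id, a.rev, a.content
FROM YourTable a
LEFT OUTER JOIN YourTable b ON a.id = b.id AND a.rev < b.rev
WHERE b.id IS NULL
ORDER BY a.id, a.content
"""

tie_join_rows = conn.execute(join_query).fetchall()
tie_left_rows = conn.execute(left_join_query).fetchall()

# Both tied rows for id 1 come back from both queries.
print(tie_join_rows)
print(tie_join_rows == tie_left_rows)  # True
```

If exactly one row per id is required even when ties exist, an extra tie-breaking rule is needed; neither query supplies one on its own.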

Both approaches are ANSI SQL compatible and so will work with your favorite RDBMS, regardless of its "flavor".

Both approaches are also performance friendly, but your mileage may vary (RDBMS, DB structure, indexes, etc.). So before you pick one approach over the other, benchmark. And make sure you pick the one that makes the most sense to you.
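One possible shape for such a benchmark is sketched below. Everything in it is an assumption made for illustration: the synthetic data (1,000 ids with 5 revs each), the index name idx_id_rev, the helper time_query, and the use of SQLite through Python's sqlite3. Timings from SQLite say nothing about MySQL, so treat this only as a template to rerun against your own database and data.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER, rev INTEGER, content TEXT)")
# Synthetic data: 1,000 documents with revisions 1..5 each (assumption).
conn.executemany(
    "INSERT INTO YourTable (id, rev, content) VALUES (?, ?, ?)",
    [
        (doc_id, rev, f"doc {doc_id} rev {rev}")
        for doc_id in range(1, 1001)
        for rev in range(1, 6)
    ],
)
# Hypothetical index; try the benchmark with and without it.
conn.execute("CREATE INDEX idx_id_rev ON YourTable (id, rev)")

queries = {
    "join with sub-query": """
        SELECT a.id, a.rev, a.content
        FROM YourTable a
        INNER JOIN (
            SELECT id, MAX(rev) AS rev FROM YourTable GROUP BY id
        ) b ON a.id = b.id AND a.rev = b.rev
        ORDER BY a.id
    """,
    "left join with self": """
        SELECT a.id, a.rev, a.content
        FROM YourTable a
        LEFT OUTER JOIN YourTable b ON a.id = b.id AND a.rev < b.rev
        WHERE b.id IS NULL
        ORDER BY a.id
    """,
}


def time_query(sql, repeats=5):
    """Run sql several times; return its rows and the best wall-clock time."""
    best = float("inf")
    rows = []
    for _ in range(repeats):
        start = time.perf_counter()
        rows = conn.execute(sql).fetchall()
        best = min(best, time.perf_counter() - start)
    return rows, best


results = {}
for name, sql in queries.items():
    rows, seconds = time_query(sql)
    results[name] = rows
    print(f"{name}: {len(rows)} rows, best of 5 runs: {seconds * 1000:.2f} ms")
```

Checking that both queries return the same rows before comparing timings guards against benchmarking two queries that do not actually answer the same question.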
