SQL select only rows with max value on a column

Posted 2024-12-10 03:07:52, 593 words, 0 views, 0 comments

I have this table for documents (simplified version here):

id  rev  content
1   1    ...
2   1    ...
1   2    ...
1   3    ...

How do I select one row per id and only the greatest rev?
With the above data, the result should contain two rows: [1, 3, ...] and [2, 1, ...]. I'm using MySQL.

Currently I use checks in the while loop to detect and over-write old revs from the resultset. But is this the only method to achieve the result? Isn't there a SQL solution?
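For reference, the client-side workaround described above (scan the result set and overwrite older revs) amounts to something like the following Python sketch. It uses an in-memory SQLite copy of the example table. The table name YourTable and the content values 'a' to 'd' are assumptions, since the question elides them.

```python
import sqlite3

# In-memory copy of the example table (content values are made up)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER, rev INTEGER, content TEXT)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [(1, 1, "a"), (2, 1, "b"), (1, 2, "c"), (1, 3, "d")],
)

# Client-side: remember one row per id, replacing it whenever a higher rev shows up
latest = {}
for row_id, rev, content in conn.execute("SELECT id, rev, content FROM YourTable"):
    if row_id not in latest or rev > latest[row_id][1]:
        latest[row_id] = (row_id, rev, content)

result = sorted(latest.values())
print(result)  # [(1, 3, 'd'), (2, 1, 'b')]
```

Every row still has to travel to the client, which is what the SQL-only answers avoid.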

Comments (27)

跨年 2024-12-17 03:07:52

At first glance...

All you need is a GROUP BY clause with the MAX aggregate function:

SELECT id, MAX(rev)
FROM YourTable
GROUP BY id

It's never that simple, is it?

I just noticed you need the content column as well.

This is a very common question in SQL: find the whole row that has the max value in a column per group identifier. I heard that a lot during my career. Actually, it was one of the questions I answered in my current job's technical interview.

It is, actually, so common that the Stack Overflow community has created a single tag just to deal with questions like that.

Basically, you have two approaches to solve that problem:

Joining with simple group-identifier, max-value-in-group Sub-query

In this approach, you first find the group-identifier, max-value-in-group (already solved above) in a sub-query. Then you join your table to the sub-query with equality on both group-identifier and max-value-in-group:

SELECT a.id, a.rev, a.content
FROM YourTable a
INNER JOIN (
    SELECT id, MAX(rev) rev
    FROM YourTable
    GROUP BY id
) b ON a.id = b.id AND a.rev = b.rev
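A runnable check of this sub-query join using Python's built-in sqlite3 module. The in-memory table mirrors the question's example, with made-up content values 'a' to 'd', and uses the question's column name content:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER, rev INTEGER, content TEXT)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [(1, 1, "a"), (2, 1, "b"), (1, 2, "c"), (1, 3, "d")],
)

# The sub-query finds (id, max rev); the join pulls back the full row for each pair
rows = conn.execute("""
SELECT a.id, a.rev, a.content
FROM YourTable a
INNER JOIN (
    SELECT id, MAX(rev) rev
    FROM YourTable
    GROUP BY id
) b ON a.id = b.id AND a.rev = b.rev
ORDER BY a.id
""").fetchall()
print(rows)  # [(1, 3, 'd'), (2, 1, 'b')]
```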

Left Joining with self, tweaking join conditions and filters

In this approach, you left join the table with itself. Equality goes on the group-identifier. Then, two smart moves:

  1. The second join condition requires the left-side value to be less than the right-side value
  2. When you do step 1, the row(s) that actually have the max value will have NULL in the right side (it's a LEFT JOIN, remember?). Then, we filter the joined result, showing only the rows where the right side is NULL.

So you end up with:

SELECT a.*
FROM YourTable a
LEFT OUTER JOIN YourTable b
    ON a.id = b.id AND a.rev < b.rev
WHERE b.id IS NULL;
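To see the NULL trick at work, this sketch (same assumed in-memory table, Python's sqlite3) runs the self left join first without the WHERE filter and then with it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER, rev INTEGER, content TEXT)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [(1, 1, "a"), (2, 1, "b"), (1, 2, "c"), (1, 3, "d")],
)

join_sql = """
FROM YourTable a
LEFT OUTER JOIN YourTable b
    ON a.id = b.id AND a.rev < b.rev
"""

# Unfiltered: each row paired with every higher rev of the same id,
# or with NULL when no higher rev exists
pairs = conn.execute(
    "SELECT a.id, a.rev, b.rev " + join_sql + " ORDER BY a.id, a.rev, b.rev"
).fetchall()
print(pairs)  # [(1, 1, 2), (1, 1, 3), (1, 2, 3), (1, 3, None), (2, 1, None)]

# Filtered: only rows whose right side is NULL, i.e. no higher rev exists
latest = conn.execute(
    "SELECT a.* " + join_sql + " WHERE b.id IS NULL ORDER BY a.id"
).fetchall()
print(latest)  # [(1, 3, 'd'), (2, 1, 'b')]
```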

Conclusion

Both approaches return exactly the same result.

If two rows share the max-value-in-group for the same group-identifier, both rows will be in the result in both approaches.
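A quick way to confirm the tie behavior: add a second row with id 1 and rev 3 (an assumed extra row with content 'e') and run both queries against the same in-memory SQLite table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER, rev INTEGER, content TEXT)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    # (1, 3, 'e') ties with (1, 3, 'd') for the max rev of id 1
    [(1, 1, "a"), (2, 1, "b"), (1, 2, "c"), (1, 3, "d"), (1, 3, "e")],
)

subquery_join = conn.execute("""
SELECT a.id, a.rev, a.content
FROM YourTable a
INNER JOIN (SELECT id, MAX(rev) rev FROM YourTable GROUP BY id) b
    ON a.id = b.id AND a.rev = b.rev
ORDER BY a.id, a.content
""").fetchall()

left_join = conn.execute("""
SELECT a.*
FROM YourTable a
LEFT OUTER JOIN YourTable b ON a.id = b.id AND a.rev < b.rev
WHERE b.id IS NULL
ORDER BY a.id, a.content
""").fetchall()

print(subquery_join)               # [(1, 3, 'd'), (1, 3, 'e'), (2, 1, 'b')]
print(left_join == subquery_join)  # True
```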

Both approaches are ANSI SQL compatible and thus will work with your favorite RDBMS, regardless of its "flavor".

Both approaches are also performance friendly, but your mileage may vary (RDBMS, DB structure, indexes, etc.). So when you pick one approach over the other, benchmark, and make sure you pick the one that makes the most sense to you.

总以为 2024-12-17 03:07:52

My preference is to use as little code as possible...

You can do it using IN. Try this:

SELECT * 
FROM t1 WHERE (id,rev) IN 
( SELECT id, MAX(rev)
  FROM t1
  GROUP BY id
)

To my mind it is less complicated and easier to read and maintain.
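The row-value IN form runs as-is on SQLite 3.15 or later and on MySQL, but not every database accepts (id, rev) IN (...), so check yours. A sketch with Python's sqlite3 and the question's example rows (content values made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (id INTEGER, rev INTEGER, content TEXT)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?)",
    [(1, 1, "a"), (2, 1, "b"), (1, 2, "c"), (1, 3, "d")],
)

# Row-value IN: keep rows whose (id, rev) pair is one of the per-id maxima
rows = conn.execute("""
SELECT *
FROM t1 WHERE (id, rev) IN
( SELECT id, MAX(rev)
  FROM t1
  GROUP BY id
)
ORDER BY id
""").fetchall()
print(rows)  # [(1, 3, 'd'), (2, 1, 'b')]
```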

赠佳期 2024-12-17 03:07:52

I am flabbergasted that no answer offered a SQL window function solution:

SELECT a.id, a.rev, a.content
  FROM (SELECT id, rev, content,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY rev DESC) ranked_order
          FROM YourTable) a
 WHERE a.ranked_order = 1 

Window (or windowing) functions were added in the ANSI/ISO SQL:2003 standard, later extended in SQL:2008, and are now available with all major vendors. Other ranking functions are available to deal with ties: RANK, DENSE_RANK, PERCENT_RANK.
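The difference between ROW_NUMBER and RANK only shows up on ties. In this sketch (Python's sqlite3; window functions need SQLite 3.25 or later), an assumed extra row (1, 3, 'e') ties with (1, 3, 'd'): ROW_NUMBER keeps exactly one of them, with no guarantee which, while RANK keeps both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER, rev INTEGER, content TEXT)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [(1, 1, "a"), (2, 1, "b"), (1, 2, "c"), (1, 3, "d"), (1, 3, "e")],
)

def top_per_id(rank_fn):
    """Return the rows ranked 1 within each id by the given window function name."""
    # rank_fn is only ever one of the constant names below, never user input
    sql = f"""
    SELECT a.id, a.rev, a.content
      FROM (SELECT id, rev, content,
                   {rank_fn}() OVER (PARTITION BY id ORDER BY rev DESC) ranked_order
              FROM YourTable) a
     WHERE a.ranked_order = 1
     ORDER BY a.id, a.content
    """
    return conn.execute(sql).fetchall()

by_row_number = top_per_id("ROW_NUMBER")  # one row per id; which tied row wins is unspecified
by_rank = top_per_id("RANK")              # every row tied for the top rev
print(by_rank)  # [(1, 3, 'd'), (1, 3, 'e'), (2, 1, 'b')]
```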

遗弃M 2024-12-17 03:07:52

Yet another solution is to use a correlated subquery:

select yt.id, yt.rev, yt.contents
    from YourTable yt
    where rev = 
        (select max(rev) from YourTable st where yt.id=st.id)

Having an index on (id,rev) renders the subquery almost as a simple lookup...
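A sketch of the correlated sub-query plus the (id, rev) index, using Python's sqlite3 and the question's example rows. The index name idx_id_rev and the content values are made up, and whether the per-row MAX(rev) really becomes a simple index lookup depends on the engine's planner.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER, rev INTEGER, content TEXT)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [(1, 1, "a"), (2, 1, "b"), (1, 2, "c"), (1, 3, "d")],
)
# Composite index: MAX(rev) for a given id can be read from the end of that id's index range
conn.execute("CREATE INDEX idx_id_rev ON YourTable (id, rev)")

rows = conn.execute("""
SELECT yt.id, yt.rev, yt.content
  FROM YourTable yt
 WHERE rev = (SELECT MAX(rev) FROM YourTable st WHERE yt.id = st.id)
 ORDER BY yt.id
""").fetchall()
print(rows)  # [(1, 3, 'd'), (2, 1, 'b')]
```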

The following are comparisons with the solutions in @AdrianCarneiro's answer (subquery, leftjoin), based on MySQL measurements on an InnoDB table of ~1 million records, with group sizes of 1-3.

While for full table scans the subquery/leftjoin/correlated timings relate to each other as 6/8/9, for direct lookups or batches (id in (1,2,3)) the subquery is much slower than the others (due to rerunning the subquery). However, I couldn't differentiate between the leftjoin and correlated solutions in speed.

One final note: as leftjoin creates n*(n+1)/2 joins in groups, its performance can be heavily affected by the size of groups...

诗化ㄋ丶相逢 2024-12-17 03:07:52

I can't vouch for the performance, but here's a trick inspired by the limitations of Microsoft Excel. It has some good features:

GOOD STUFF

  • It should force return of only one "max record" even if there is a tie (sometimes useful)
  • It doesn't require a join

APPROACH

It is a little bit ugly and requires that you know something about the range of valid values of the rev column. Let us assume that we know the rev column is a number between 0.00 and 999 including decimals but that there will only ever be two digits to the right of the decimal point (e.g. 34.17 would be a valid value).

The gist of the thing is that you create a single synthetic column by string concatenating/packing the primary comparison field along with the data you want. In this way, you can force SQL's MAX() aggregate function to return all of the data (because it has been packed into a single column). Then you have to unpack the data.

Here's how it looks with the above example, written in SQL

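SELECT id, 
       CAST(SUBSTRING(max(packed_col) FROM 2 FOR 6) AS float) as max_rev,
       -- the packed rev is 8 characters and '---' is 3, so content starts at position 12
       SUBSTRING(max(packed_col) FROM 12) AS content_for_max_rev 
FROM  (SELECT id, 
       -- VARCHAR, not CHAR: in standard SQL a bare CHAR means CHAR(1) and would truncate
       -- (1000 is an arbitrary upper bound for content; adjust it)
       CAST(1000 + rev + .001 as VARCHAR(8)) || '---' || CAST(content AS VARCHAR(1000)) AS packed_col
       FROM yourtable
      ) AS packed
GROUP BY id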

The packing begins by forcing the rev column to be a number of known character length (eight characters here), regardless of the value of rev, so that, for example:

  • 3.2 becomes 1003.201
  • 57 becomes 1057.001
  • 923.88 becomes 1923.881

If you do it right, string comparison of two numbers should yield the same "max" as numeric comparison of the two numbers and it's easy to convert back to the original number using the substring function (which is available in one form or another pretty much everywhere).
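A runnable sketch of the packing idea on SQLite via Python's sqlite3. It swaps the 1000 + rev + .001 trick for SQLite's printf zero-padding to get a fixed-width rev prefix. The table, rows, and the 7-character width (enough for revs up to 9999.99) are assumptions. The id 2 rows show why fixed width matters: as unpadded strings, '9.5' sorts above '10.25'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (id INTEGER, rev REAL, content TEXT)")
conn.executemany("INSERT INTO yourtable VALUES (?, ?, ?)", [
    (1, 3.2, "a"), (1, 57, "b"), (1, 923.88, "c"),
    (2, 9.5, "x"), (2, 10.25, "y"),
])

# printf('%07.2f') pads every rev to 7 characters (3.2 -> '0003.20'),
# so text comparison of packed_col agrees with numeric comparison of rev.
# Layout: characters 1-7 = rev, 8-10 = '---', 11 onward = content
rows = conn.execute("""
SELECT id,
       CAST(substr(MAX(packed_col), 1, 7) AS REAL) AS max_rev,
       substr(MAX(packed_col), 11) AS content_for_max_rev
FROM (SELECT id, printf('%07.2f', rev) || '---' || content AS packed_col
      FROM yourtable) AS packed
GROUP BY id
ORDER BY id
""").fetchall()
print(rows)
```

As in the original, only one row per id comes back even on ties, and the content can be cut back out because the prefix has a known length.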

執念 2024-12-17 03:07:52

Unique Identifiers? Yes! Unique identifiers!

One of the best ways to develop a MySQL DB is to have each id AUTOINCREMENT (Source MySQL.com). This allows a variety of advantages, too many to cover here. The problem with the question is that its example has duplicate ids. This disregards these tremendous advantages of unique identifiers, and at the same time, is confusing to those familiar with this already.

Cleanest Solution

DB Fiddle

Newer versions of MySQL come with ONLY_FULL_GROUP_BY enabled by default, and many of the solutions here will fail in testing with this condition.

Even so, we can simply select DISTINCT someuniquefield, MAX(whateverotherfieldtoselect), (*somethirdfield), etc., and have no worries understanding the result or how the query works:

SELECT DISTINCT t1.id, MAX(t1.rev), MAX(t2.content)
FROM Table1 AS t1
JOIN Table1 AS t2 ON t2.id = t1.id AND t2.rev = (
    SELECT MAX(rev) FROM Table1 t3 WHERE t3.id = t1.id
)
GROUP BY t1.id;
  • SELECT DISTINCT t1.id, MAX(t1.rev), MAX(t2.content) : Return DISTINCT somefield and MAX() of some other field; the last MAX() is redundant, because I know it's just one row, but the query requires it.
  • FROM Table1 AS t1 : Table searched on.
  • JOIN Table1 AS t2 ON t2.id = t1.id AND t2.rev = (SELECT MAX(rev) ...) : Join the second table on the first, because we need to get the content of the max(t1.rev) row.
  • GROUP BY t1.id : Return one row per id.

Note that since "content" was "..." in OP's question, there's no way to test that this works. So, I changed that to "..a", "..b", so, we can actually now see that the results are correct:

id  max(Table1.rev) max(Table2.content)
1   3   ..d
2   1   ..b

Why is it clean? DISTINCT(), MAX(), etc., all make wonderful use of MySQL indices. This will be faster. Or, it will be much faster, if you have indexing, and you compare it to a query that looks at all rows.
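The query can be checked with Python's sqlite3. The rows below are an assumption reconstructed to match the result table (the question's id/rev pairs with contents '..a' to '..d'). Note that if two rows tie on the max rev, MAX(t2.content) silently picks the larger content instead of returning both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (id INTEGER, rev INTEGER, content TEXT)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?)",
    # assumed sample rows: the question's id/rev pairs with contents '..a' to '..d'
    [(1, 1, "..a"), (2, 1, "..b"), (1, 2, "..c"), (1, 3, "..d")],
)

rows = conn.execute("""
SELECT DISTINCT t1.id, MAX(t1.rev), MAX(t2.content)
FROM Table1 AS t1
JOIN Table1 AS t2 ON t2.id = t1.id AND t2.rev = (
    SELECT MAX(rev) FROM Table1 t3 WHERE t3.id = t1.id
)
GROUP BY t1.id
ORDER BY t1.id
""").fetchall()
print(rows)  # [(1, 3, '..d'), (2, 1, '..b')]
```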

Original Solution

With ONLY_FULL_GROUP_BY disabled, we can still use GROUP BY, but then we are only using it on the Salary, and not the id:

SELECT *
FROM
    (SELECT *
    FROM Employee
    ORDER BY Salary DESC)
AS employeesub
GROUP BY employeesub.Salary;
  • SELECT * : Return all fields.
  • FROM Employee : Table searched on.
  • (SELECT *...) subquery : Return all people, sorted by Salary.
  • GROUP BY employeesub.Salary: Force the top-sorted, Salary row of each employee to be the returned result.

Unique-Row Solution

Note the definition of a relational database: "Each row in a table has its own unique key." This would mean that, in the question's example, id would have to be unique, and in that case, we can just do:

SELECT *
FROM Employee
WHERE Employee.id = 12345
ORDER BY Employee.Salary DESC
LIMIT 1

Hopefully this is a solution that solves the problem and helps everyone better understand what's happening in the DB.

迷爱 2024-12-17 03:07:52

Another way to do the job is to use the MAX() analytic function with an OVER (PARTITION BY ...) clause:

SELECT t.*
  FROM
    (
    SELECT id
          ,rev
          ,content
          ,MAX(rev) OVER (PARTITION BY id) as max_rev
      FROM YourTable
    ) t
  WHERE t.rev = t.max_rev 

The other ROW_NUMBER() OVER (PARTITION BY ...) solution already documented in this post is:

SELECT t.*
  FROM
    (
    SELECT id
          ,rev
          ,content
          ,ROW_NUMBER() OVER (PARTITION BY id ORDER BY rev DESC) rn
      FROM YourTable
    ) t
  WHERE t.rn = 1 

Both SELECTs work well on Oracle 10g.

The MAX() solution is likely to run faster than the ROW_NUMBER() solution: MAX() only needs one pass over each partition, while ROW_NUMBER() must also sort each partition by rev, which is at least O(n log n), where n is the number of records in the table.
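A sketch of the MAX() OVER (PARTITION BY id) version on SQLite 3.25 or later via Python's sqlite3, with an assumed tied row (1, 3, 'e'). Unlike the ROW_NUMBER() version, it keeps every row tied for the max rev:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER, rev INTEGER, content TEXT)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [(1, 1, "a"), (2, 1, "b"), (1, 2, "c"), (1, 3, "d"), (1, 3, "e")],
)

# Each row carries its group's max rev; keep rows that equal it
rows = conn.execute("""
SELECT t.id, t.rev, t.content
  FROM (SELECT id, rev, content,
               MAX(rev) OVER (PARTITION BY id) AS max_rev
          FROM YourTable) t
 WHERE t.rev = t.max_rev
 ORDER BY t.id, t.content
""").fetchall()
print(rows)  # [(1, 3, 'd'), (1, 3, 'e'), (2, 1, 'b')]
```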

血之狂魔 2024-12-17 03:07:52

Something like this?

SELECT yourtable.id, rev, content
FROM yourtable
INNER JOIN (
    SELECT id, max(rev) as maxrev
    FROM yourtable
    GROUP BY id
) AS child ON (yourtable.id = child.id) AND (yourtable.rev = maxrev)
总攻大人 2024-12-17 03:07:52

I like to use a NOT EXISTS-based solution for this problem:

SELECT 
  id, 
  rev
  -- you can select other columns here
FROM YourTable t
WHERE NOT EXISTS (
   SELECT * FROM YourTable t2 WHERE t2.id = t.id AND t2.rev > t.rev
)

This will select all records with max value within the group and allows you to select other columns.
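A check of the NOT EXISTS query with Python's sqlite3 and the example rows (content values made up). It also shows why the inner table needs its own alias: if the sub-query reuses t, every column inside it refers to the inner row, the rev > t.rev test can never be true, and NOT EXISTS keeps every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER, rev INTEGER, content TEXT)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [(1, 1, "a"), (2, 1, "b"), (1, 2, "c"), (1, 3, "d")],
)

# Distinct inner alias t2, so t2.rev > t.rev compares against the outer row
correct = conn.execute("""
SELECT id, rev, content
FROM YourTable t
WHERE NOT EXISTS (
   SELECT * FROM YourTable t2 WHERE t2.id = t.id AND t2.rev > t.rev
)
ORDER BY id
""").fetchall()

# Inner alias shadows the outer t: the condition compares the inner row with itself
shadowed = conn.execute("""
SELECT id, rev, content
FROM YourTable t
WHERE NOT EXISTS (
   SELECT * FROM YourTable t WHERE t.id = id AND rev > t.rev
)
""").fetchall()

print(correct)        # [(1, 3, 'd'), (2, 1, 'b')]
print(len(shadowed))  # 4
```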

好倦 2024-12-17 03:07:52
SELECT *
FROM Employee
where Employee.Salary in (select max(salary) from Employee group by Employe_id)
ORDER BY Employee.Salary
梦里兽 2024-12-17 03:07:52

Note: I probably wouldn't recommend this anymore in MySQL 8+ days. Haven't used it in years.

A third solution I hardly ever see mentioned is MySQL specific and looks like this:

SELECT id, MAX(rev) AS rev
 , 0+SUBSTRING_INDEX(GROUP_CONCAT(numeric_content ORDER BY rev DESC), ',', 1) AS numeric_content
FROM t1
GROUP BY id

Yes it looks awful (converting to string and back etc.) but in my experience it's usually faster than the other solutions. Maybe that's just for my use cases, but I have used it on tables with millions of records and many unique ids. Maybe it's because MySQL is pretty bad at optimizing the other solutions (at least in the 5.0 days when I came up with this solution).

One important thing is that GROUP_CONCAT has a maximum length for the string it can build up. You probably want to raise this limit by setting the group_concat_max_len variable. And keep in mind that this will be a limit on scaling if you have a large number of rows.

Anyway, the above doesn't directly work if your content field is already text. In that case you probably want to use a different separator, like \0 maybe. You'll also run into the group_concat_max_len limit quicker.

绻影浮沉 2024-12-17 03:07:52

Not MySQL, but for other people finding this question and using SQL Server, another way to resolve the problem is using CROSS APPLY in MS SQL:

WITH DocIds AS (SELECT DISTINCT id FROM docs)

SELECT d2.id, d2.rev, d2.content
FROM DocIds d1
CROSS APPLY (
  SELECT TOP 1 * FROM docs d
  WHERE d.id = d1.id
  ORDER BY rev DESC
) d2

Here's an example in SqlFiddle.

青柠芒果 2024-12-17 03:07:52

I think you want this:

select * from docs where (id, rev) IN (select id, max(rev) as rev from docs group by id order by id)  

SQL Fiddle: check here

我不会写诗 2024-12-17 03:07:52

I would use this:

select t.*
from test as t
join
   (select id, max(rev) as rev
    from test
    group by id) as o
on o.id = t.id and o.rev = t.rev

A subquery SELECT may not be very efficient, but in a JOIN clause it seems usable. I'm not an expert in optimizing queries, but I've tried it on MySQL, PostgreSQL and Firebird, and it works very well.

You can use this pattern in multiple joins and with a WHERE clause. Here is my working example (solving the same problem as yours, with the table "firmy"):

select *
from platnosci as p
join firmy as f
on p.id_rel_firmy = f.id_rel
join (select max(id_obj) as id_obj
      from firmy
      group by id_rel) as o
on o.id_obj = f.id_obj and p.od > '2014-03-01'

It is run on tables with tens of thousands of records, and it takes less than 0.01 seconds on a not very powerful machine.

I wouldn't use an IN clause (as mentioned somewhere above). IN is meant for short lists of constants, not as a query filter built on a subquery, because (at least in older MySQL versions) a subquery in IN can be executed for every scanned record, which can make the query take a very long time.

§普罗旺斯的薰衣草 2024-12-17 03:07:52

Since this is the most popular question about this problem, I'll re-post another answer to it here as well:

It looks like there is a simpler way to do this (but only in MySQL):

select *
from (select * from mytable order by id, rev desc ) x
group by id

Credit goes to the answer of user Bohemian in this question for providing such a concise and elegant answer to this problem.

Edit: though this solution works for many people it may not be stable in the long run, since MySQL doesn't guarantee that GROUP BY statement will return meaningful values for columns not in GROUP BY list. So use this solution at your own risk!

栖竹 2024-12-17 03:07:52

If you have many fields in the select statement and you want the latest value for all of those fields through optimized code:

select * from
(select * from table_name
order by id,rev desc) temp
group by id 
我做我的改变 2024-12-17 03:07:52

How about this:

SELECT all_fields.*  
FROM (SELECT id, MAX(rev) AS max_rev FROM yourtable GROUP BY id) AS max_recs  
LEFT OUTER JOIN yourtable AS all_fields 
ON max_recs.id = all_fields.id AND max_recs.max_rev = all_fields.rev

This solution makes only one selection from YourTable, therefore it's faster. According to tests on sqlfiddle.com, it works only for MySQL and SQLite (for SQLite remove DESC). Maybe it can be tweaked to work on other databases I am not familiar with.

SELECT *
FROM ( SELECT *
       FROM ( SELECT 1 as id, 1 as rev, 'content1' as content
              UNION
              SELECT 2, 1, 'content2'
              UNION
              SELECT 1, 2, 'content3'
              UNION
              SELECT 1, 3, 'content4'
            ) as YourTable
       ORDER BY id, rev DESC
   ) as YourTable
GROUP BY id
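On SQLite specifically, there is a documented alternative that does not rely on the row order of a subquery: when an aggregate query uses exactly one MAX(), bare columns in the select list take their values from the row that supplied the maximum. This is SQLite-only behavior (MySQL makes no such promise). A sketch with Python's sqlite3 and made-up content values:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER, rev INTEGER, content TEXT)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [(1, 1, "a"), (2, 1, "b"), (1, 2, "c"), (1, 3, "d")],
)

# content is a bare column: SQLite takes it from the row holding MAX(rev)
rows = conn.execute("""
SELECT id, MAX(rev) AS rev, content
FROM YourTable
GROUP BY id
ORDER BY id
""").fetchall()
print(rows)  # [(1, 3, 'd'), (2, 1, 'b')]
```

On ties, SQLite picks one of the tied rows, so only one row per id comes back.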
缪败 2024-12-17 03:07:52

Here is a nice way of doing that.

Use the following code:

with temp as  ( 
select count(field1) as summ , field1
from table_name
group by field1 )
select * from temp where summ = (select max(summ) from temp)
瑶笙 2024-12-17 03:07:52

I like to do this by ranking the records by some column. In this case, rank rev values grouped by id. Those with higher rev will have lower rankings. So highest rev will have ranking of 1.

select id, rev, content
from
 (select
    @rowNum := if(@prevValue = id, @rowNum+1, 1) as row_num,
    id, rev, content,
    @prevValue := id
  from
   (select id, rev, content from YOURTABLE order by id asc, rev desc) TEMP,
   (select @rowNum := 1 from DUAL) X,
   (select @prevValue := -1 from DUAL) Y) TEMP
where row_num = 1;

Not sure if introducing variables makes the whole thing slower. But at least I'm not querying YOURTABLE twice.

御守 2024-12-17 03:07:52

Here is another solution; I hope it helps someone.

Select a.id , a.rev, a.content from Table1 a
inner join 
(SELECT id, max(rev) rev FROM Table1 GROUP BY id) x on x.id =a.id and x.rev =a.rev
烛影斜 2024-12-17 03:07:52

None of these answers have worked for me.

This is what worked for me.

with score as (select max(score_up) from history)
select history.* from score, history where history.score_up = score.max
旧时光的容颜 2024-12-17 03:07:52

Here's another solution to retrieving the records only with a field that has the maximum value for that field. This works for SQL400 which is the platform I work on. In this example, the records with the maximum value in field FIELD5 will be retrieved by the following SQL statement.

SELECT A.KEYFIELD1, A.KEYFIELD2, A.FIELD3, A.FIELD4, A.FIELD5
  FROM MYFILE A
 WHERE RRN(A) IN
   (SELECT RRN(B) 
      FROM MYFILE B
     WHERE B.KEYFIELD1 = A.KEYFIELD1 AND B.KEYFIELD2 = A.KEYFIELD2
     ORDER BY B.FIELD5 DESC
     FETCH FIRST ROW ONLY)
那些过往 2024-12-17 03:07:52

I sorted the rev field in reverse order and then grouped by id, which gave the first row of each group, i.e. the one with the highest rev value.

SELECT * FROM (SELECT * FROM table1 ORDER BY id, rev DESC) X GROUP BY X.id;

Tested in http://sqlfiddle.com/ with the following data

CREATE TABLE table1
    (`id` int, `rev` int, `content` varchar(11));

INSERT INTO table1
    (`id`, `rev`, `content`)
VALUES
    (1, 1, 'One-One'),
    (1, 2, 'One-Two'),
    (2, 1, 'Two-One'),
    (2, 2, 'Two-Two'),
    (3, 2, 'Three-Two'),
    (3, 1, 'Three-One'),
    (3, 3, 'Three-Three')
;

This gave the following result in MySQL 5.5 and 5.6:

id  rev content
1   2   One-Two
2   2   Two-Two
3   3   Three-Two
笨死的猪 2024-12-17 03:07:52

You can make the select without a join when you combine the rev and id into one maxRevId value for MAX() and then split it back to original values:

SELECT maxRevId & ((1 << 32) - 1) as id, maxRevId >> 32 AS rev
FROM (SELECT MAX(((rev << 32) | id)) AS maxRevId
      FROM YourTable
      GROUP BY id) x;

This is especially fast when there is a complex join instead of a single table. With the traditional approaches the complex join would be done twice.

The above combination is simple with bit functions when rev and id are INT UNSIGNED (32 bit) and the combined value fits into BIGINT UNSIGNED (64 bit). When id and rev are larger than 32-bit values or made of multiple columns, you need to combine the value into e.g. a binary value with suitable padding for MAX().
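The bit arithmetic can be tried on SQLite, whose integers are 64-bit signed, through Python's sqlite3. This sketch assumes non-negative id values below 2^32 and rev values below 2^31, so (rev << 32) | id stays positive; the table and rows mirror the question's example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER, rev INTEGER, content TEXT)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [(1, 1, "a"), (2, 1, "b"), (1, 2, "c"), (1, 3, "d")],
)

# rev goes in the high 32 bits so MAX() orders by rev; the low 32 bits carry id
rows = conn.execute("""
SELECT maxRevId & ((1 << 32) - 1) AS id, maxRevId >> 32 AS rev
FROM (SELECT MAX(((rev << 32) | id)) AS maxRevId
      FROM YourTable
      GROUP BY id) x
ORDER BY id
""").fetchall()
print(rows)  # [(1, 3), (2, 1)]
```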

暖阳 2024-12-17 03:07:52

Explanation

This is not pure SQL. This will use the SQLAlchemy ORM.

I came here looking for SQLAlchemy help, so I will duplicate Adrian Carneiro's answer with the Python/SQLAlchemy version, specifically the outer join part.

This query answers the question of:

"Can you return me the records in this group of records (based on same id) that have the highest version number".

This allows me to duplicate the record, update it, increment its version number, and have the copy of the old version in such a way that I can show change over time.

Code

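from sqlalchemy import and_
from sqlalchemy.orm import aliased, join

# MyTable (the mapped model) and appdb.session come from the application
MyTableAlias = aliased(MyTable)
newest_records = appdb.session.query(MyTable).select_from(join(
    MyTable, 
    MyTableAlias, 
    onclause=and_(
        MyTable.id == MyTableAlias.id,
        MyTable.version_int < MyTableAlias.version_int
    ),
    isouter=True
    )
).filter(
    MyTableAlias.id.is_(None),  # no newer version exists for this row
).all()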

Tested on a PostgreSQL database.

短叹 2024-12-17 03:07:52

I used the below to solve a problem of my own. I first created a temp table and inserted the max rev value per unique id.

CREATE TABLE #temp1
(
    id varchar(20)
    , rev int
)
INSERT INTO #temp1
SELECT a.id, MAX(a.rev) as rev
FROM 
    (
        SELECT id, content, SUM(rev) as rev
        FROM YourTable
        GROUP BY id, content
    ) as a 
GROUP BY a.id
ORDER BY a.id

I then joined these max values (#temp1) to all of the possible id/content combinations. By doing this, I naturally filter out the non-maximum id/content combinations and am left with only the max rev value for each.

SELECT a.id, a.rev, content
FROM #temp1 as a
LEFT JOIN
    (
        SELECT id, content, SUM(rev) as rev
        FROM YourTable
        GROUP BY id, content
    ) as b on a.id = b.id and a.rev = b.rev
GROUP BY a.id, a.rev, b.content
ORDER BY a.id