获取每组分组结果的前n条记录

发布于 2025-01-17 04:48:14 字数 913 浏览 7 评论 0原文

以下是最简单的示例,尽管任何解决方案都应该能够扩展到所需的 n 个顶级结果:

给定一个如下表,其中包含人员、组和年龄列,您如何获得 2每组中最年长的人?(组内的平局不应产生更多结果,而是按字母顺序给出前 2 个结果)

+--------+-------+-----+
| Person | Group | Age |
+--------+-------+-----+
| Bob    | 1     | 32  |
| Jill   | 1     | 34  |
| Shawn  | 1     | 42  |
| Jake   | 2     | 29  |
| Paul   | 2     | 36  |
| Laura  | 2     | 39  |
+--------+-------+-----+

所需的结果集:

+--------+-------+-----+
| Shawn  | 1     | 42  |
| Jill   | 1     | 34  |
| Laura  | 2     | 39  |
| Paul   | 2     | 36  |
+--------+-------+-----+

注意:此问题建立在上一个问题的基础上 - <一href="https://stackoverflow.com/q/12102200/165673">获取每组分组 SQL 结果的最大值记录 - 用于从每个组中获取单个顶行,并且收到了很好的结果@Bohemian 的 MySQL 特定答案:

select * 
from (select * from mytable order by `Group`, Age desc, Person) x
group by `Group`

希望能够以此为基础,尽管我不知道如何实现。

The following is the simplest possible example, though any solution should be able to scale to however many n top results are needed:

Given a table like that below, with person, group, and age columns, how would you get the 2 oldest people in each group? (Ties within groups should not yield more results, but give the first 2 in alphabetical order)

+--------+-------+-----+
| Person | Group | Age |
+--------+-------+-----+
| Bob    | 1     | 32  |
| Jill   | 1     | 34  |
| Shawn  | 1     | 42  |
| Jake   | 2     | 29  |
| Paul   | 2     | 36  |
| Laura  | 2     | 39  |
+--------+-------+-----+

Desired result set:

+--------+-------+-----+
| Shawn  | 1     | 42  |
| Jill   | 1     | 34  |
| Laura  | 2     | 39  |
| Paul   | 2     | 36  |
+--------+-------+-----+

NOTE: This question builds on a previous one- Get records with max value for each group of grouped SQL results - for getting a single top row from each group, and which received a great MySQL-specific answer from @Bohemian:

select * 
from (select * from mytable order by `Group`, Age desc, Person) x
group by `Group`

Would love to be able to build off this, though I don't see how.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(12

糖粟与秋泊 2025-01-24 04:48:14

这是一种方法,使用 UNION ALL (请参阅 SQL Fiddle with演示)。这适用于两个组,如果您有两个以上的组,则需要指定编号并为每个添加查询:

(
  select *
  from mytable 
  where `group` = 1
  order by age desc
  LIMIT 2
)
UNION ALL
(
  select *
  from mytable 
  where `group` = 2
  order by age desc
  LIMIT 2
)

有多种方法为此,请参阅本文以确定适合您情况的最佳路线:

http://www.xaprb.com/blog/2006/12/07/how-to-select-the-firstleastmax-row-per-group-in-sql/

编辑:

这可能也适合您,它为每条记录生成一个行号。使用上面链接中的示例,这将仅返回行数小于或等于 2 的记录:

select person, `group`, age
from 
(
   select person, `group`, age,
      (@num:=if(@group = `group`, @num +1, if(@group := `group`, 1, 1))) row_number 
  from test t
  CROSS JOIN (select @num:=0, @group:=null) c
  order by `Group`, Age desc, person
) as x 
where x.row_number <= 2;

请参阅演示

Here is one way to do this, using UNION ALL (See SQL Fiddle with Demo). This works with two groups, if you have more than two groups, then you would need to specify the group number and add queries for each group:

(
  select *
  from mytable 
  where `group` = 1
  order by age desc
  LIMIT 2
)
UNION ALL
(
  select *
  from mytable 
  where `group` = 2
  order by age desc
  LIMIT 2
)

There are a variety of ways to do this, see this article to determine the best route for your situation:

http://www.xaprb.com/blog/2006/12/07/how-to-select-the-firstleastmax-row-per-group-in-sql/

Edit:

This might work for you too, it generates a row number for each record. Using an example from the link above this will return only those records with a row number of less than or equal to 2:

select person, `group`, age
from 
(
   select person, `group`, age,
      (@num:=if(@group = `group`, @num +1, if(@group := `group`, 1, 1))) row_number 
  from test t
  CROSS JOIN (select @num:=0, @group:=null) c
  order by `Group`, Age desc, person
) as x 
where x.row_number <= 2;

See Demo

金兰素衣 2025-01-24 04:48:14

在其他数据库中,您可以使用 ROW_NUMBER 执行此操作。 MySQL 不支持 ROW_NUMBER,但您可以使用变量来模拟它:

SELECT
    person,
    groupname,
    age
FROM
(
    SELECT
        person,
        groupname,
        age,
        @rn := IF(@prev = groupname, @rn + 1, 1) AS rn,
        @prev := groupname
    FROM mytable
    JOIN (SELECT @prev := NULL, @rn := 0) AS vars
    ORDER BY groupname, age DESC, person
) AS T1
WHERE rn <= 2

查看它在线工作:sqlfiddle


编辑 我刚刚注意到 bluefeet 发布了一个非常相似的答案:+1 给他。然而,这个答案有两个小优点:

  1. 它是单个查询。变量在 SELECT 语句内初始化。
  2. 它按照问题中的描述处理关系(按名称字母顺序排列)。

所以我会把它留在这里,以防它可以帮助别人。

In other databases you can do this using ROW_NUMBER. MySQL doesn't support ROW_NUMBER but you can use variables to emulate it:

SELECT
    person,
    groupname,
    age
FROM
(
    SELECT
        person,
        groupname,
        age,
        @rn := IF(@prev = groupname, @rn + 1, 1) AS rn,
        @prev := groupname
    FROM mytable
    JOIN (SELECT @prev := NULL, @rn := 0) AS vars
    ORDER BY groupname, age DESC, person
) AS T1
WHERE rn <= 2

See it working online: sqlfiddle


Edit I just noticed that bluefeet posted a very similar answer: +1 to him. However this answer has two small advantages:

  1. It it is a single query. The variables are initialized inside the SELECT statement.
  2. It handles ties as described in the question (alphabetical order by name).

So I'll leave it here in case it can help someone.

这样的小城市 2025-01-24 04:48:14

试试这个:

SELECT a.person, a.group, a.age FROM person AS a WHERE 
(SELECT COUNT(*) FROM person AS b 
WHERE b.group = a.group AND b.age >= a.age) <= 2 
ORDER BY a.group ASC, a.age DESC

演​​示

Try this:

SELECT a.person, a.group, a.age FROM person AS a WHERE 
(SELECT COUNT(*) FROM person AS b 
WHERE b.group = a.group AND b.age >= a.age) <= 2 
ORDER BY a.group ASC, a.age DESC

DEMO

ぶ宁プ宁ぶ 2025-01-24 04:48:14

使用自连接怎么样:

CREATE TABLE mytable (person, groupname, age);
INSERT INTO mytable VALUES('Bob',1,32);
INSERT INTO mytable VALUES('Jill',1,34);
INSERT INTO mytable VALUES('Shawn',1,42);
INSERT INTO mytable VALUES('Jake',2,29);
INSERT INTO mytable VALUES('Paul',2,36);
INSERT INTO mytable VALUES('Laura',2,39);

SELECT a.* FROM mytable AS a
  LEFT JOIN mytable AS a2 
    ON a.groupname = a2.groupname AND a.age <= a2.age
GROUP BY a.person
HAVING COUNT(*) <= 2
ORDER BY a.groupname, a.age DESC;

给我:

a.person    a.groupname  a.age     
----------  -----------  ----------
Shawn       1            42        
Jill        1            34        
Laura       2            39        
Paul        2            36      

我受到 Bill Karwin 的回答的强烈启发 为每个类别选择前 10 条记录

另外,我正在使用 SQLite,但这应该适用于 MySQL。

另一件事:在上面,为了方便起见,我用 groupname 列替换了 group 列。

编辑

根据OP关于缺少平局结果的评论,我增加了snuffin的答案以显示所有平局。这意味着如果最后一个是平局,则可以返回多于 2 行,如下所示:

.headers on
.mode column

CREATE TABLE foo (person, groupname, age);
INSERT INTO foo VALUES('Paul',2,36);
INSERT INTO foo VALUES('Laura',2,39);
INSERT INTO foo VALUES('Joe',2,36);
INSERT INTO foo VALUES('Bob',1,32);
INSERT INTO foo VALUES('Jill',1,34);
INSERT INTO foo VALUES('Shawn',1,42);
INSERT INTO foo VALUES('Jake',2,29);
INSERT INTO foo VALUES('James',2,15);
INSERT INTO foo VALUES('Fred',1,12);
INSERT INTO foo VALUES('Chuck',3,112);


SELECT a.person, a.groupname, a.age 
FROM foo AS a 
WHERE a.age >= (SELECT MIN(b.age)
                FROM foo AS b 
                WHERE (SELECT COUNT(*)
                       FROM foo AS c
                       WHERE c.groupname = b.groupname AND c.age >= b.age) <= 2
                GROUP BY b.groupname)
ORDER BY a.groupname ASC, a.age DESC;

给我:

person      groupname   age       
----------  ----------  ----------
Shawn       1           42        
Jill        1           34        
Laura       2           39        
Paul        2           36        
Joe         2           36        
Chuck       3           112      

How about using self-joining:

CREATE TABLE mytable (person, groupname, age);
INSERT INTO mytable VALUES('Bob',1,32);
INSERT INTO mytable VALUES('Jill',1,34);
INSERT INTO mytable VALUES('Shawn',1,42);
INSERT INTO mytable VALUES('Jake',2,29);
INSERT INTO mytable VALUES('Paul',2,36);
INSERT INTO mytable VALUES('Laura',2,39);

SELECT a.* FROM mytable AS a
  LEFT JOIN mytable AS a2 
    ON a.groupname = a2.groupname AND a.age <= a2.age
GROUP BY a.person
HAVING COUNT(*) <= 2
ORDER BY a.groupname, a.age DESC;

gives me:

a.person    a.groupname  a.age     
----------  -----------  ----------
Shawn       1            42        
Jill        1            34        
Laura       2            39        
Paul        2            36      

I was strongly inspired by the answer from Bill Karwin to Select top 10 records for each category

Also, I'm using SQLite, but this should work on MySQL.

Another thing: in the above, I replaced the group column with a groupname column for convenience.

Edit:

Following-up on the OP's comment regarding missing tie results, I incremented on snuffin's answer to show all the ties. This means that if the last ones are ties, more than 2 rows can be returned, as shown below:

.headers on
.mode column

CREATE TABLE foo (person, groupname, age);
INSERT INTO foo VALUES('Paul',2,36);
INSERT INTO foo VALUES('Laura',2,39);
INSERT INTO foo VALUES('Joe',2,36);
INSERT INTO foo VALUES('Bob',1,32);
INSERT INTO foo VALUES('Jill',1,34);
INSERT INTO foo VALUES('Shawn',1,42);
INSERT INTO foo VALUES('Jake',2,29);
INSERT INTO foo VALUES('James',2,15);
INSERT INTO foo VALUES('Fred',1,12);
INSERT INTO foo VALUES('Chuck',3,112);


SELECT a.person, a.groupname, a.age 
FROM foo AS a 
WHERE a.age >= (SELECT MIN(b.age)
                FROM foo AS b 
                WHERE (SELECT COUNT(*)
                       FROM foo AS c
                       WHERE c.groupname = b.groupname AND c.age >= b.age) <= 2
                GROUP BY b.groupname)
ORDER BY a.groupname ASC, a.age DESC;

gives me:

person      groupname   age       
----------  ----------  ----------
Shawn       1           42        
Jill        1           34        
Laura       2           39        
Paul        2           36        
Joe         2           36        
Chuck       3           112      
等往事风中吹 2025-01-24 04:48:14

当您有大量行并且 Mark Byers/Rick James 和 Bluefeet 解决方案在我的环境(MySQL 5.6)上不起作用时,Snuffin 解决方案执行起来似乎相当慢,因为 order by 是在执行 select 后应用的,所以这里是一个变体Marc Byers/Rick James 解决此问题的解决方案(使用额外的叠瓦式选择):

select person, groupname, age
from
(
    select person, groupname, age,
    (@rn:=if(@prev = groupname, @rn +1, 1)) as rownumb,
    @prev:= groupname 
    from 
    (
        select person, groupname, age
        from persons 
        order by groupname ,  age desc, person
    )   as sortedlist
    JOIN (select @prev:=NULL, @rn :=0) as vars
) as groupedlist 
where rownumb<=2
order by groupname ,  age desc, person;

我在具有 500 万行的表上尝试了类似的查询,它在不到 3 秒的时间内返回结果

Snuffin solution seems quite slow to execute when you've got plenty of rows and Mark Byers/Rick James and Bluefeet solutions doesn't work on my environnement (MySQL 5.6) because order by is applied after execution of select, so here is a variant of Marc Byers/Rick James solutions to fix this issue (with an extra imbricated select):

select person, groupname, age
from
(
    select person, groupname, age,
    (@rn:=if(@prev = groupname, @rn +1, 1)) as rownumb,
    @prev:= groupname 
    from 
    (
        select person, groupname, age
        from persons 
        order by groupname ,  age desc, person
    )   as sortedlist
    JOIN (select @prev:=NULL, @rn :=0) as vars
) as groupedlist 
where rownumb<=2
order by groupname ,  age desc, person;

I tried similar query on a table having 5 millions rows and it returns result in less than 3 seconds

昵称有卵用 2025-01-24 04:48:14

如果其他答案不够快,请尝试 此代码

SELECT
        province, n, city, population
    FROM
      ( SELECT  @prev := '', @n := 0 ) init
    JOIN
      ( SELECT  @n := if(province != @prev, 1, @n + 1) AS n,
                @prev := province,
                province, city, population
            FROM  Canada
            ORDER BY
                province   ASC,
                population DESC
      ) x
    WHERE  n <= 3
    ORDER BY  province, n;

输出:

+---------------------------+------+------------------+------------+
| province                  | n    | city             | population |
+---------------------------+------+------------------+------------+
| Alberta                   |    1 | Calgary          |     968475 |
| Alberta                   |    2 | Edmonton         |     822319 |
| Alberta                   |    3 | Red Deer         |      73595 |
| British Columbia          |    1 | Vancouver        |    1837970 |
| British Columbia          |    2 | Victoria         |     289625 |
| British Columbia          |    3 | Abbotsford       |     151685 |
| Manitoba                  |    1 | ...

If the other answers are not fast enough Give this code a try:

SELECT
        province, n, city, population
    FROM
      ( SELECT  @prev := '', @n := 0 ) init
    JOIN
      ( SELECT  @n := if(province != @prev, 1, @n + 1) AS n,
                @prev := province,
                province, city, population
            FROM  Canada
            ORDER BY
                province   ASC,
                population DESC
      ) x
    WHERE  n <= 3
    ORDER BY  province, n;

Output:

+---------------------------+------+------------------+------------+
| province                  | n    | city             | population |
+---------------------------+------+------------------+------------+
| Alberta                   |    1 | Calgary          |     968475 |
| Alberta                   |    2 | Edmonton         |     822319 |
| Alberta                   |    3 | Red Deer         |      73595 |
| British Columbia          |    1 | Vancouver        |    1837970 |
| British Columbia          |    2 | Victoria         |     289625 |
| British Columbia          |    3 | Abbotsford       |     151685 |
| Manitoba                  |    1 | ...
南冥有猫 2025-01-24 04:48:14

看看这个:

SELECT
  p.Person,
  p.`Group`,
  p.Age
FROM
  people p
  INNER JOIN
  (
    SELECT MAX(Age) AS Age, `Group` FROM people GROUP BY `Group`
    UNION
    SELECT MAX(p3.Age) AS Age, p3.`Group` FROM people p3 INNER JOIN (SELECT MAX(Age) AS Age, `Group` FROM people GROUP BY `Group`) p4 ON p3.Age < p4.Age AND p3.`Group` = p4.`Group` GROUP BY `Group`
  ) p2 ON p.Age = p2.Age AND p.`Group` = p2.`Group`
ORDER BY
  `Group`,
  Age DESC,
  Person;

SQL Fiddle:http://sqlfiddle.com/#!2/cdbb6/15< /a>

Check this out:

SELECT
  p.Person,
  p.`Group`,
  p.Age
FROM
  people p
  INNER JOIN
  (
    SELECT MAX(Age) AS Age, `Group` FROM people GROUP BY `Group`
    UNION
    SELECT MAX(p3.Age) AS Age, p3.`Group` FROM people p3 INNER JOIN (SELECT MAX(Age) AS Age, `Group` FROM people GROUP BY `Group`) p4 ON p3.Age < p4.Age AND p3.`Group` = p4.`Group` GROUP BY `Group`
  ) p2 ON p.Age = p2.Age AND p.`Group` = p2.`Group`
ORDER BY
  `Group`,
  Age DESC,
  Person;

SQL Fiddle: http://sqlfiddle.com/#!2/cdbb6/15

凌乱心跳 2025-01-24 04:48:14
WITH cte_window AS (
SELECT movie_name,director_id,release_date,
ROW_NUMBER() OVER( PARTITION BY director_id ORDER BY release_date DESC) r
FROM movies
)   
SELECT * FROM cte_window WHERE r <= <n>;

上面的查询将返回每个导演的最新 n 部电影。

WITH cte_window AS (
SELECT movie_name,director_id,release_date,
ROW_NUMBER() OVER( PARTITION BY director_id ORDER BY release_date DESC) r
FROM movies
)   
SELECT * FROM cte_window WHERE r <= <n>;

Above query will returns latest n movies for each directors.

凹づ凸ル 2025-01-24 04:48:14

我想分享这个,因为我花了很长时间寻找一种简单的方法来在我正在开发的 java 程序中实现这个。这并不能完全给出您正在寻找的输出,但它很接近。 mysql 中名为 GROUP_CONCAT() 的函数非常适合指定每个组中返回多少结果。使用 LIMIT 或任何其他尝试使用 COUNT 执行此操作的奇特方法对我来说不起作用。因此,如果您愿意接受修改后的输出,那么这是一个很好的解决方案。假设我有一个名为“学生”的表,其中包含学生 ID、性别和 GPA。假设我希望每个性别的 gpa 都达到前 5 名。然后我可以像这样编写查询

SELECT sex, SUBSTRING_INDEX(GROUP_CONCAT(cast(gpa AS char ) ORDER BY gpa desc), ',',5) 
AS subcategories FROM student GROUP BY sex;

请注意,参数“5”告诉它每行要连接多少个条目

输出看起来像这样

+--------+----------------+
| Male   | 4,4,4,4,3.9    |
| Female | 4,4,3.9,3.9,3.8|
+--------+----------------+

您还可以更改 ORDER BY 变量并对它们进行排序另一种方式。因此,如果我有学生的年龄,我可以将“gpa desc”替换为“age desc”,这样就可以了!您还可以向 group by 语句添加变量以在输出中获取更多列。所以这只是我发现的一种非常灵活的方法,如果您只列出结果就可以的话,效果很好。

I wanted to share this because I spent a long time searching for an easy way to implement this in a java program I'm working on. This doesn't quite give the output you're looking for but its close. The function in mysql called GROUP_CONCAT() worked really well for specifying how many results to return in each group. Using LIMIT or any of the other fancy ways of trying to do this with COUNT didn't work for me. So if you're willing to accept a modified output, its a great solution. Lets say I have a table called 'student' with student ids, their gender, and gpa. Lets say I want to top 5 gpas for each gender. Then I can write the query like this

SELECT sex, SUBSTRING_INDEX(GROUP_CONCAT(cast(gpa AS char ) ORDER BY gpa desc), ',',5) 
AS subcategories FROM student GROUP BY sex;

Note that the parameter '5' tells it how many entries to concatenate into each row

And the output would look something like

+--------+----------------+
| Male   | 4,4,4,4,3.9    |
| Female | 4,4,3.9,3.9,3.8|
+--------+----------------+

You can also change the ORDER BY variable and order them a different way. So if I had the student's age I could replace the 'gpa desc' with 'age desc' and it will work! You can also add variables to the group by statement to get more columns in the output. So this is just a way I found that is pretty flexible and works good if you are ok with just listing results.

转身以后 2025-01-24 04:48:14

在SQL Server中,row_number()是一个强大的函数,可以轻松获得结果,如下所示

select Person,[group],age
from
(
select * ,row_number() over(partition by [group] order by age desc) rn
from mytable
) t
where rn <= 2

In SQL Server row_number() is a powerful function that can get result easily as below

select Person,[group],age
from
(
select * ,row_number() over(partition by [group] order by age desc) rn
from mytable
) t
where rn <= 2
舂唻埖巳落 2025-01-24 04:48:14

对于这个问题有一个非常好的答案 MySQL - How To Get Top N Rows per Each组

根据引用链接中的解决方案,您的查询将类似于:

SELECT Person, Group, Age
   FROM
     (SELECT Person, Group, Age, 
                  @group_rank := IF(@group = Group, @group_rank + 1, 1) AS group_rank,
                  @current_group := Group 
       FROM `your_table`
       ORDER BY Group, Age DESC
     ) ranked
   WHERE group_rank <= `n`
   ORDER BY Group, Age DESC;

其中 ntop nyour_table 是您的表的名称。

我认为参考文献中的解释非常清楚。为了快速参考,我将其复制并粘贴到此处:

目前MySQL不支持可以赋值的ROW_NUMBER()函数
组内的序列号,但作为解决方法,我们可以使用 MySQL
会话变量。

这些变量不需要声明,并且可以在查询中使用
进行计算并存储中间结果。

@current_country := Country 此代码针对每一行执行,并且
将国家/地区列的值存储到@current_country变量。

@country_rank := IF(@current_country = 国家/地区, @country_rank + 1, 1)
在此代码中,如果 @current_country 相同,我们会增加排名,
否则将其设置为 1。对于第一行 @current_country 为 NULL,因此
排名也设置为 1。

为了正确排名,我们需要按国家/地区、人口 DESC 排序

There is a really nice answer to this problem at MySQL - How To Get Top N Rows per Each Group

Based on the solution in the referenced link, your query would be like:

SELECT Person, Group, Age
   FROM
     (SELECT Person, Group, Age, 
                  @group_rank := IF(@group = Group, @group_rank + 1, 1) AS group_rank,
                  @current_group := Group 
       FROM `your_table`
       ORDER BY Group, Age DESC
     ) ranked
   WHERE group_rank <= `n`
   ORDER BY Group, Age DESC;

where n is the top n and your_table is the name of your table.

I think the explanation in the reference is really clear. For quick reference I will copy and paste it here:

Currently MySQL does not support ROW_NUMBER() function that can assign
a sequence number within a group, but as a workaround we can use MySQL
session variables.

These variables do not require declaration, and can be used in a query
to do calculations and to store intermediate results.

@current_country := country This code is executed for each row and
stores the value of country column to @current_country variable.

@country_rank := IF(@current_country = country, @country_rank + 1, 1)
In this code, if @current_country is the same we increment rank,
otherwise set it to 1. For the first row @current_country is NULL, so
rank is also set to 1.

For correct ranking, we need to have ORDER BY country, population DESC

睡美人的小仙女 2025-01-24 04:48:14
SELECT
p1.Person,
p1.`GROUP`,
p1.Age  
   FROM
person AS p1 
 WHERE
(
SELECT
    COUNT( DISTINCT ( p2.age ) ) 
FROM
    person AS p2 
WHERE
    p2.`GROUP` = p1.`GROUP` 
    AND p2.Age >= p1.Age 
) < 2 
ORDER BY
p1.`GROUP` ASC,
p1.age DESC

参考leetcode

SELECT
p1.Person,
p1.`GROUP`,
p1.Age  
   FROM
person AS p1 
 WHERE
(
SELECT
    COUNT( DISTINCT ( p2.age ) ) 
FROM
    person AS p2 
WHERE
    p2.`GROUP` = p1.`GROUP` 
    AND p2.Age >= p1.Age 
) < 2 
ORDER BY
p1.`GROUP` ASC,
p1.age DESC

reference leetcode

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文