为表中的每个组选择前 N 行

发布于 2024-09-29 03:17:49 字数 792 浏览 0 评论 0原文

我面临一个非常常见的问题,即“为表中的每个组选择前 N 行”。

考虑一个包含 id、name、hair_colour、score 列的表。

我想要一个结果集,对于每种头发颜色,都能得到前 3 名得分手的名字。

为了解决这个问题,我得到了我所需要的 Rick Osborne 的博文“sql-getting-top-n-rows-for-a-grouped-query”

当我的分数相同时,该解决方案无法按预期工作。

在上面的例子中,结果如下。

 id  name  hair  score  ranknum
---------------------------------
 12  Kit    Blonde  10  1
  9  Becca  Blonde  9  2
  8  Katie  Blonde  8  3
  3  Sarah  Brunette 10  1    
  4  Deborah Brunette 9  2 - ------- - - > if
  1  Kim  Brunette 8  3

考虑行 4 Deborah Brunette 9 2。如果这也与 Sarah 具有相同的分数 (10),那么对于“黑发”类型的头发,ranknum 将是 2,2,3。

解决这个问题的办法是什么?

I am facing a very common issue regarding "Selecting top N rows for each group in a table".

Consider a table with id, name, hair_colour, score columns.

I want a resultset such that, for each hair colour, get me top 3 scorer names.

To solve this i got exactly what i need on Rick Osborne's blogpost "sql-getting-top-n-rows-for-a-grouped-query"

That solution doesn't work as expected when my scores are equal.

In above example the result as follow.

 id  name  hair  score  ranknum
---------------------------------
 12  Kit    Blonde  10  1
  9  Becca  Blonde  9  2
  8  Katie  Blonde  8  3
  3  Sarah  Brunette 10  1    
  4  Deborah Brunette 9  2 - ------- - - > if
  1  Kim  Brunette 8  3

Consider the row 4 Deborah Brunette 9 2. If this also has same score (10) same as Sarah, then ranknum will be 2,2,3 for "Brunette" type of hair.

What's the solution to this?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

生生漫 2024-10-06 03:17:49

如果您使用的是 SQL Server 2005 或更高版本,则可以使用排名函数和 CTE 来实现此目的:

;WITH HairColors AS
(SELECT id, name, hair, score, 
        ROW_NUMBER() OVER(PARTITION BY hair ORDER BY score DESC) as 'RowNum'
)
SELECT id, name, hair, score
FROM HairColors
WHERE RowNum <= 3

此 CTE 将按 hair 列的值“分区”您的数据,并且每个分区然后按分数排序(降序)并获取行号;每个分区的最高分数是 1,然后是 2,依此类推。

因此,如果您想要每个组的前 3 名,请仅从 CTE 中选择那些 RowNum 等于或小于 3 的行(1、 2, 3) -->就这样吧!

If you're using SQL Server 2005 or newer, you can use the ranking functions and a CTE to achieve this:

;WITH HairColors AS
(SELECT id, name, hair, score, 
        ROW_NUMBER() OVER(PARTITION BY hair ORDER BY score DESC) as 'RowNum'
)
SELECT id, name, hair, score
FROM HairColors
WHERE RowNum <= 3

This CTE will "partition" your data by the value of the hair column, and each partition is then order by score (descending) and gets a row number; the highest score for each partition is 1, then 2 etc.

So if you want to the TOP 3 of each group, select only those rows from the CTE that have a RowNum of 3 or less (1, 2, 3) --> there you go!

暮凉 2024-10-06 03:17:49

该算法得出排名的方式是计算叉积中分数等于或大于相关女孩的行数,以生成排名。因此,在您讨论的问题案例中,莎拉的网格看起来

a.name | a.score | b.name  | b.score
-------+---------+---------+--------
Sarah  | 9       | Sarah   | 9
Sarah  | 9       | Deborah | 9

与黛博拉的网格相似,这就是为什么两个女孩在这里都获得 2 的排名。

问题是,当出现平局时,由于此计数,所有女孩都会采用平局范围内的最低值,而您希望她们取最高值。我认为一个简单的改变可以解决这个问题:

不要使用大于或等于比较,而是使用严格的大于比较来计算严格更好的女孩的数量。然后,再加一,你就得到了你的排名(这将酌情处理平局)。因此,内部选择将是:

SELECT a.id, COUNT(*) + 1 AS ranknum
FROM girl AS a
  INNER JOIN girl AS b ON (a.hair = b.hair) AND (a.score < b.score)
GROUP BY a.id
HAVING COUNT(*) <= 3

任何人都可以看到这种方法中我没有注意到的任何问题吗?

The way the algorithm comes up with the rank, is to count the number of rows in the cross-product with a score equal to or greater than the girl in question, in order to generate rank. Hence in the problem case you're talking about, Sarah's grid would look like

a.name | a.score | b.name  | b.score
-------+---------+---------+--------
Sarah  | 9       | Sarah   | 9
Sarah  | 9       | Deborah | 9

and similarly for Deborah, which is why both girls get a rank of 2 here.

The problem is that when there's a tie, all girls take the lowest value in the tied range due to this count, when you'd want them to take the highest value instead. I think a simple change can fix this:

Instead of a greater-than-or-equal comparison, use a strict greater-than comparison to count the number of girls who are strictly better. Then, add one to that and you have your rank (which will deal with ties as appropriate). So the inner select would be:

SELECT a.id, COUNT(*) + 1 AS ranknum
FROM girl AS a
  INNER JOIN girl AS b ON (a.hair = b.hair) AND (a.score < b.score)
GROUP BY a.id
HAVING COUNT(*) <= 3

Can anyone see any problems with this approach that have escaped my notice?

白芷 2024-10-06 03:17:49

使用此复合选择可以正确处理 OP 问题

SELECT g.* FROM girls as g
WHERE g.score > IFNULL( (SELECT g2.score FROM girls as g2
                WHERE g.hair=g2.hair ORDER BY g2.score DESC LIMIT 3,1), 0)

请注意,您需要在此处使用 IFNULL 来处理表 girls 对于某种类型的 hair 的行数较少的情况,然后我们希望看到在sql答案中(在OP情况下是3个项目)。

Use this compound select which handles OP problem properly

SELECT g.* FROM girls as g
WHERE g.score > IFNULL( (SELECT g2.score FROM girls as g2
                WHERE g.hair=g2.hair ORDER BY g2.score DESC LIMIT 3,1), 0)

Note that you need to use IFNULL here to handle case when table girls has less rows for some type of hair then we want to see in sql answer (in OP case it is 3 items).

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文