Selecting the SUM of the TOP 2 values in a table with multiple GROUPs in SQL

Posted 2024-09-25 14:19:33

I've been playing with sets in SQL Server 2000 and have the following table structure for one of my temp tables (#Periods):

    RestCTR     HoursCTR    Duration    Rest
    ----------------------------------------
    1           337         2           0
    2           337         46          1
    3           337         2           0
    4           337         46          1
    5           338         1           0
    6           338         46          1
    7           338         2           0
    8           338         46          1
    9           338         1           0
    10          339         46          1
    ...

What I'd like to do is to calculate the Sum of the 2 longest Rest periods for each HoursCTR, preferably using sets and temp tables (rather than cursors, or nested subqueries).

Here's the dream query that just won't work in SQL (no matter how many times I run it):

    SELECT HoursCTR, SUM ( TOP 2 Duration ) AS LongestBreaks
    FROM #Periods
    WHERE Rest = 1
    GROUP BY HoursCTR

The HoursCTR can have any number of Rest periods (including none).

My current solution is not very elegant and basically involves the following steps:

  1. Get the max rest duration, grouped by HoursCTR
  2. Select the first (min) RestCTR row that has this max duration for each HoursCTR
  3. Repeat step 1 (excluding the rows already collected in step 2)
  4. Repeat step 2 (again, excluding rows collected in step 2)
  5. Combine the RestCTR rows (from steps 2 and 4) into a single table
  6. Get the SUM of the Duration pointed to by the rows in step 5, grouped by HoursCTR

If there are any set functions that cut this process down, they would be very welcome.
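For reference, the six steps above might look roughly like this in T-SQL that avoids CTEs and window functions. This is only a sketch, not the asker's actual code: the temp table #Longest and the aliases are made-up names, and it assumes RestCTR is unique across #Periods, as the sample rows suggest.

```sql
-- Steps 1-2: longest rest per HoursCTR; ties broken by the lowest RestCTR.
SELECT p.HoursCTR, MIN(p.RestCTR) AS RestCTR
INTO #Longest                      -- hypothetical work table
FROM #Periods AS p
JOIN (SELECT HoursCTR, MAX(Duration) AS MaxDuration
      FROM #Periods
      WHERE Rest = 1
      GROUP BY HoursCTR) AS m
  ON m.HoursCTR = p.HoursCTR AND m.MaxDuration = p.Duration
WHERE p.Rest = 1
GROUP BY p.HoursCTR

-- Steps 3-4: the same again, skipping rows already collected.
INSERT INTO #Longest (HoursCTR, RestCTR)
SELECT p.HoursCTR, MIN(p.RestCTR)
FROM #Periods AS p
JOIN (SELECT r.HoursCTR, MAX(r.Duration) AS MaxDuration
      FROM #Periods AS r
      LEFT JOIN #Longest AS l ON l.RestCTR = r.RestCTR
      WHERE r.Rest = 1 AND l.RestCTR IS NULL
      GROUP BY r.HoursCTR) AS m
  ON m.HoursCTR = p.HoursCTR AND m.MaxDuration = p.Duration
LEFT JOIN #Longest AS x ON x.RestCTR = p.RestCTR
WHERE p.Rest = 1 AND x.RestCTR IS NULL
GROUP BY p.HoursCTR

-- Steps 5-6: sum the durations of the collected rows per HoursCTR.
SELECT p.HoursCTR, SUM(p.Duration) AS LongestBreaks
FROM #Periods AS p
JOIN #Longest AS l ON l.RestCTR = p.RestCTR
GROUP BY p.HoursCTR
```

For the ten rows shown in the question, this gives 92 for HoursCTR 337 and 338, and 46 for 339. An HoursCTR with no rest periods does not appear in the result.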

煞人兵器 2024-10-02 14:19:33

The best way to do this in SQL Server is with a common table expression, numbering the rows in each group with the windowing function ROW_NUMBER():

    WITH NumberedPeriods AS (
      SELECT HoursCTR, Duration, ROW_NUMBER()
        OVER (PARTITION BY HoursCTR ORDER BY Duration DESC) AS RN
      FROM #Periods
      WHERE Rest = 1
    )
    SELECT HoursCTR, SUM(Duration) AS LongestBreaks
    FROM NumberedPeriods
    WHERE RN <= 2
    GROUP BY HoursCTR

edit: I've added an ORDER BY clause in the partitioning, to get the two longest rests.
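Since the question says an HoursCTR can have no rest periods at all, note that the query above simply omits such hours. If a row with 0 is wanted instead, one possible variation (a sketch, assuming Rest only ever holds 0 or 1, and still requiring a version with CTEs and ROW_NUMBER()) is to number all rows with the rest rows first and sum only the rests:

```sql
WITH NumberedPeriods AS (
  SELECT HoursCTR, Duration, Rest,
         -- Rest rows sort ahead of non-rest rows, longest first.
         ROW_NUMBER() OVER (PARTITION BY HoursCTR
                            ORDER BY Rest DESC, Duration DESC) AS RN
  FROM #Periods
)
SELECT HoursCTR,
       -- Non-rest rows that slip into the top 2 contribute 0.
       SUM(CASE WHEN Rest = 1 THEN Duration ELSE 0 END) AS LongestBreaks
FROM NumberedPeriods
WHERE RN <= 2
GROUP BY HoursCTR;
```

Every HoursCTR present in #Periods then gets exactly one output row.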


Mea culpa, I did not notice that you need this to work in Microsoft SQL Server 2000. That version doesn't support CTEs or windowing functions. I'll leave the answer above in case it helps someone else.

In SQL Server 2000, the common advice is to use a correlated subquery:

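    SELECT DISTINCT p1.HoursCTR,
        (SELECT SUM(t.Duration) FROM
            (SELECT TOP 2 p2.Duration FROM #Periods AS p2
             WHERE p2.HoursCTR = p1.HoursCTR
               AND p2.Rest = 1  -- count rest periods only
             ORDER BY p2.Duration DESC) AS t) AS LongestBreaks
    FROM #Periods AS p1  -- DISTINCT collapses the output to one row per HoursCTR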

倾其所爱 2024-10-02 14:19:33

SQL Server 2000 has neither CTEs nor ROW_NUMBER().
A correlated subquery can need an extra step when it is combined with GROUP BY.

This should work for you:

    SELECT
        F.HoursCTR,
        MAX (F.LongestBreaks) AS LongestBreaks -- Dummy max() so that groupby can be used.
    FROM
        (
            SELECT
                Pm.HoursCTR,
                (
                    SELECT
                        COALESCE (SUM (S.Duration), 0)
                    FROM
                        (
                            SELECT TOP 2    T.Duration
                            FROM            #Periods    AS T
                            WHERE           T.HoursCTR  = Pm.HoursCTR
                            AND             T.Rest      = 1
                            ORDER BY        T.Duration  DESC
                        ) AS S
                 ) AS LongestBreaks
            FROM
                #Periods AS Pm
        ) AS F
    GROUP BY
        F.HoursCTR
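The dummy MAX() is there because the inner query produces one result row for every row of #Periods. A possible variation, offered only as a sketch, is to drive the subquery from a derived table of distinct HoursCTR values so that no outer GROUP BY is needed. The alias H is made up for illustration, and the query relies on the same kind of correlation into a nested derived table as the query above.

```sql
SELECT
    H.HoursCTR,
    (
        SELECT COALESCE (SUM (S.Duration), 0)
        FROM
            (
                SELECT TOP 2    T.Duration
                FROM            #Periods    AS T
                WHERE           T.HoursCTR  = H.HoursCTR
                AND             T.Rest      = 1
                ORDER BY        T.Duration  DESC
            ) AS S
    ) AS LongestBreaks
FROM
    -- One row per HoursCTR, including hours with no rest periods.
    (SELECT DISTINCT HoursCTR FROM #Periods) AS H
```

As in the original, an HoursCTR without any rest periods comes back with 0 rather than NULL.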

生死何惧 2024-10-02 14:19:33

Unfortunately for you, Alex, you've got the right solution: correlated subqueries, depending upon how they're structured, will end up firing multiple times, potentially giving you hundreds of individual query executions.

Put your current solution into the Query Analyzer, enable "Show Execution Plan" (Ctrl+K), and run it. You'll have an extra tab at the bottom which will show you how the engine went about the process of gathering your results. If you do the same with the correlated subquery, you'll see what that option does.

I believe that it's likely to hammer the #Periods table about as many times as you have individual rows in that table.
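Besides the graphical plan, one way to check how often #Periods is actually touched is SET STATISTICS IO, which reports a scan count and logical reads per table in the message output. A minimal sketch, where the SELECT is only a placeholder for whichever query is being measured:

```sql
SET STATISTICS IO ON

-- Placeholder: replace with the query under test.
SELECT HoursCTR, MAX(Duration) AS LongestBreak
FROM #Periods
WHERE Rest = 1
GROUP BY HoursCTR

SET STATISTICS IO OFF
```

Comparing the scan count for #Periods between the step-by-step solution and the correlated subquery gives a rough idea of how much extra work each one does.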

Also - something's off about the correlated subquery, seems to me. Since I avoid them like the plague, knowing that they're evil, I'm not sure how to go about fixing it up.
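For anyone who wants to avoid the correlated subquery entirely on a version without ROW_NUMBER(), one alternative is to rank rows with a self-join: a rest row is kept when fewer than two other rest rows in the same HoursCTR outrank it. This is a sketch rather than a tested drop-in; the aliases are made up, and it assumes RestCTR is unique within each HoursCTR so that ties on Duration can be broken.

```sql
SELECT p.HoursCTR, SUM(p.Duration) AS LongestBreaks
FROM (
    SELECT a.HoursCTR, a.RestCTR, a.Duration
    FROM #Periods AS a
    LEFT JOIN #Periods AS b
      ON  b.HoursCTR = a.HoursCTR
      AND b.Rest = 1
      -- b outranks a: longer, or equally long with a lower RestCTR.
      AND (b.Duration > a.Duration
           OR (b.Duration = a.Duration AND b.RestCTR < a.RestCTR))
    WHERE a.Rest = 1
    GROUP BY a.HoursCTR, a.RestCTR, a.Duration
    HAVING COUNT(b.RestCTR) < 2    -- keep only the top two rests per hour
) AS p
GROUP BY p.HoursCTR
```

The join still compares each rest row with the other rest rows of the same HoursCTR, so it is not free either, and an HoursCTR with no rest periods is left out of the result.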
