SQL Server: a grouping problem that has me stumped

Posted 2024-09-05 20:07:15


I've been working with SQL Server for the better part of a decade, and this grouping (or partitioning, or ranking...I'm not sure what the answer is!) one has me stumped. Feels like it should be an easy one, too. I'll generalize my problem:

Let's say I have 3 employees (don't worry about them quitting or anything...there's always 3), and I keep up with how I distribute their salaries on a monthly basis.

Month   Employee  PercentOfTotal
--------------------------------
1       Alice     25%
1       Barbara   65%
1       Claire    10%

2       Alice     25%
2       Barbara   50%
2       Claire    25%

3       Alice     25%
3       Barbara   65%
3       Claire    10%

As you can see, I've paid them the same percent in Months 1 and 3, but in Month 2, I've given Alice the same 25%, but Barbara got 50% and Claire got 25%.

What I want to know is all the distinct distributions I've ever given. In this case there would be two -- one for months 1 and 3, and one for month 2.

I'd expect the results to look something like this (NOTE: the ID, or sequencer, or whatever, doesn't matter)

ID      Employee  PercentOfTotal
--------------------------------
X       Alice     25%
X       Barbara   65%
X       Claire    10%

Y       Alice     25%
Y       Barbara   50%
Y       Claire    25%

Seems easy, right? I'm stumped! Anyone have an elegant solution? I just put together this solution while writing this question, which seems to work, but I'm wondering if there's a better way. Or maybe a different way from which I'll learn something.

WITH temp_ids (Month)
AS
(
  SELECT DISTINCT MIN(Month)
    FROM employees_paid
  GROUP BY PercentOfTotal
)
SELECT EMP.Month, EMP.Employee, EMP.PercentOfTotal
  FROM employees_paid EMP
         JOIN temp_ids IDS ON EMP.Month = IDS.Month
GROUP BY EMP.Month, EMP.Employee, EMP.PercentOfTotal

Thanks y'all!
-Ricky


淡水深流 2024-09-12 20:07:15

This gives you an answer in a slightly different format than you requested:

SELECT DISTINCT
    T1.PercentOfTotal AS Alice,
    T2.PercentOfTotal AS Barbara,
    T3.PercentOfTotal AS Claire
FROM employees_paid T1
JOIN employees_paid T2
  ON T1.Month = T2.Month AND T1.Employee = 'Alice' AND T2.Employee = 'Barbara'
JOIN employees_paid T3
  ON T2.Month = T3.Month AND T3.Employee = 'Claire'

Result:

Alice   Barbara  Claire
25%     50%      25%
25%     65%      10%

If you want to, you can use UNPIVOT to turn this result set into the form you asked for.

SELECT rn AS ID, Employee, PercentOfTotal
FROM (
    SELECT *, ROW_NUMBER() OVER (ORDER BY Alice, Barbara, Claire) AS rn
    FROM (
        SELECT DISTINCT
            T1.PercentOfTotal AS Alice,
            T2.PercentOfTotal AS Barbara,
            T3.PercentOfTotal AS Claire
        FROM employees_paid T1
        JOIN employees_paid T2 ON T1.Month = T2.Month AND T1.Employee = 'Alice'
                                                      AND T2.Employee = 'Barbara'
        JOIN employees_paid T3 ON T2.Month = T3.Month AND T3.Employee = 'Claire'
    ) T1
) p UNPIVOT (PercentOfTotal FOR Employee IN (Alice, Barbara, Claire)) AS unpvt

Result:

ID  Employee  PercentOfTotal  
1   Alice     25%
1   Barbara   50%      
1   Claire    25%             
2   Alice     25%             
2   Barbara   65%              
2   Claire    10%
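
For comparison, the same wide result can probably be produced without the two self-joins by using conditional aggregation. The following is only a sketch under assumptions: the sample rows are inlined through a VALUES list, the percentages are stored as decimals rather than as '25%' strings, and the three employee names are hard-coded just as in the query above.

```sql
-- Sample rows from the question, inlined so the query runs on its own.
WITH employees_paid (Month, Employee, PercentOfTotal) AS
(
    SELECT * FROM (VALUES
        (1, 'Alice', 0.25), (1, 'Barbara', 0.65), (1, 'Claire', 0.10),
        (2, 'Alice', 0.25), (2, 'Barbara', 0.50), (2, 'Claire', 0.25),
        (3, 'Alice', 0.25), (3, 'Barbara', 0.65), (3, 'Claire', 0.10)
    ) AS v (Month, Employee, PercentOfTotal)
)
-- One row per month with one column per employee; DISTINCT then
-- collapses months that share the same distribution.
SELECT DISTINCT
       MAX(CASE WHEN Employee = 'Alice'   THEN PercentOfTotal END) AS Alice,
       MAX(CASE WHEN Employee = 'Barbara' THEN PercentOfTotal END) AS Barbara,
       MAX(CASE WHEN Employee = 'Claire'  THEN PercentOfTotal END) AS Claire
FROM employees_paid
GROUP BY Month;
```

Like the self-join version, this only works when the set of employee names is known in advance.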


止于盛夏 2024-09-12 20:07:15

What you want is for each month's distribution to act as a signature, or pattern of values, which you can then look for in other months. What is not clear is whether the employee to whom the value went is as important as the breakdown of percentages. For example, would Alice=65%, Barbara=25%, Claire=10% be the same as Month 3 in your example? In my example, I presumed that it would not be the same. Similar to Martin Smith's solution, I build the signature by multiplying each percentage by a power of 10 chosen by the employee's rank within the month (10 for the first employee, 100 for the second, and so on) and summing the results. This presumes that all percentage values are less than one. If someone could have a percentage of 110%, for example, that would create problems for this solution.

With Employees As
    (
    Select 1 As Month, 'Alice' As Employee, .25 As PercentOfTotal
    Union All Select 1, 'Barbara', .65
    Union All Select 1, 'Claire', .10
    Union All Select 2, 'Alice', .25
    Union All Select 2, 'Barbara', .50
    Union All Select 2, 'Claire', .25
    Union All Select 3, 'Alice', .25
    Union All Select 3, 'Barbara', .65
    Union All Select 3, 'Claire', .10
    )
    , EmployeeRanks As
    (
    Select Month, Employee, PercentOfTotal
        , Row_Number() Over ( Partition By Month Order By Employee, PercentOfTotal ) As ItemRank
    From Employees
    )
    , Signatures As
    (
    Select Month
        , Sum( PercentOfTotal * Cast( Power( 10, ItemRank ) As bigint) ) As SignatureValue
    From EmployeeRanks
    Group By Month
    )
    , DistinctSignatures As
    (
    Select Min(Month) As MinMonth, SignatureValue
    From Signatures
    Group By SignatureValue
    )
Select E.Month, E.Employee, E.PercentOfTotal
From Employees As E
    Join DistinctSignatures As D
        On D.MinMonth = E.Month
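
One caveat worth checking before relying on a summed signature: with two-decimal percentages, different distributions can add up to the same value even when every percentage is below one. The sketch below uses hypothetical data (month 2 is invented for illustration and is not from the question) and reuses the rank and signature expressions from the query above.

```sql
-- Hypothetical data: two different distributions that each total 100%.
WITH Employees (Month, Employee, PercentOfTotal) AS
(
    SELECT * FROM (VALUES
        (1, 'Alice', 0.25), (1, 'Barbara', 0.65), (1, 'Claire', 0.10),
        (2, 'Alice', 0.35), (2, 'Barbara', 0.54), (2, 'Claire', 0.11)
    ) AS v (Month, Employee, PercentOfTotal)
),
EmployeeRanks AS
(
    SELECT Month, Employee, PercentOfTotal,
           ROW_NUMBER() OVER (PARTITION BY Month ORDER BY Employee) AS ItemRank
    FROM Employees
)
-- Month 1: 0.25*10 + 0.65*100 + 0.10*1000 = 167.5
-- Month 2: 0.35*10 + 0.54*100 + 0.11*1000 = 167.5
SELECT Month,
       SUM(PercentOfTotal * CAST(POWER(10, ItemRank) AS bigint)) AS SignatureValue
FROM EmployeeRanks
GROUP BY Month;
```

Both months come back with the same SignatureValue, so they would be treated as one distribution. A signature that keeps each value in its own position, such as a concatenated string, should avoid this kind of overlap.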


善良天后 2024-09-12 20:07:15

I'm assuming performance won't be great (because of the subquery).

SELECT * FROM employees_paid where Month not in (
     SELECT
          a.Month
     FROM
          employees_paid a
          INNER JOIN employees_paid b ON 
               (a.Employee = b.Employee AND 
               a.PercentOfTotal = b.PercentOfTotal AND 
               a.Month > b.Month)
     GROUP BY
          a.Month,
          b.Month
     HAVING
          Count(*) = (SELECT COUNT(*) FROM employees_paid c 
               where c.Month = a.Month)
     )

The inner SELECT does a self-join to identify matching employee and percentage combinations (except those for the same month).
The > in the JOIN ensures that only one set of matches is taken, i.e. if the Month 1 entries equal the Month 3 entries, we get only the Month3-Month1 combination instead of Month1-Month3, Month3-Month1 and Month3-Month3.
We then GROUP BY each month-month combination and COUNT the matched entries.
The HAVING then excludes combinations that don't have as many matches as the month has entries.
The outer SELECT gets all entries except those for the months returned by the inner query (the months that fully match an earlier month).


￡冰雨忧蓝° 2024-09-12 20:07:15

If I have understood you correctly then, for a general solution, I think you would need to concatenate the whole group together, e.g. to produce Alice:0.25, Barbara:0.50, Claire:0.25. Then select the distinct groups, so something like the following would do it (rather clunkily).

WITH EmpSalaries
AS
(

SELECT 1 AS Month, 'Alice' AS Employee, 0.25 AS PercentOfTotal UNION ALL
SELECT 1 AS Month, 'Barbara' AS Employee, 0.65 UNION ALL
SELECT 1 AS Month, 'Claire' AS Employee, 0.10 UNION ALL

SELECT 2 AS Month, 'Alice' AS Employee, 0.25 UNION ALL
SELECT 2 AS Month, 'Barbara' AS Employee, 0.50 UNION ALL
SELECT 2 AS Month, 'Claire' AS Employee, 0.25 UNION ALL

SELECT 3 AS Month,  'Alice' AS Employee, 0.25 UNION ALL
SELECT 3 AS Month,  'Barbara' AS Employee, 0.65 UNION ALL
SELECT 3 AS Month,  'Claire' AS Employee, 0.10 
),
Months AS 
(
SELECT DISTINCT Month FROM EmpSalaries
),
MonthlySummary AS
(
SELECT Month,
Stuff(
            (
            Select ', ' + S1.Employee + ':' + cast(PercentOfTotal as varchar(20))
            From EmpSalaries As S1
            Where S1.Month = Months.Month
            Order By S1.Employee
            For Xml Path('')
            ), 1, 2, '') As Summary
FROM Months
)
SELECT * FROM EmpSalaries
WHERE Month IN (SELECT MIN(Month)
                FROM MonthlySummary
                GROUP BY Summary)
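
On SQL Server 2017 or later, the same concatenation idea can probably be written more compactly with STRING_AGG instead of FOR XML PATH. This is a sketch rather than a tested replacement: the CTE names MonthlySummary and FirstMonths and the ROW_NUMBER-based ID are choices made for this illustration, and the sample rows are inlined through a VALUES list.

```sql
-- Sample rows from the question, inlined so the query runs on its own.
WITH employees_paid (Month, Employee, PercentOfTotal) AS
(
    SELECT * FROM (VALUES
        (1, 'Alice', 0.25), (1, 'Barbara', 0.65), (1, 'Claire', 0.10),
        (2, 'Alice', 0.25), (2, 'Barbara', 0.50), (2, 'Claire', 0.25),
        (3, 'Alice', 0.25), (3, 'Barbara', 0.65), (3, 'Claire', 0.10)
    ) AS v (Month, Employee, PercentOfTotal)
),
-- One string per month, e.g. 'Alice:0.25, Barbara:0.65, Claire:0.10'.
MonthlySummary AS
(
    SELECT Month,
           STRING_AGG(CONCAT(Employee, ':', PercentOfTotal), ', ')
               WITHIN GROUP (ORDER BY Employee) AS Summary
    FROM employees_paid
    GROUP BY Month
),
-- Keep the earliest month of each distinct summary and number it.
FirstMonths AS
(
    SELECT MIN(Month) AS Month,
           ROW_NUMBER() OVER (ORDER BY MIN(Month)) AS ID
    FROM MonthlySummary
    GROUP BY Summary
)
SELECT F.ID, E.Employee, E.PercentOfTotal
FROM employees_paid AS E
JOIN FirstMonths AS F ON F.Month = E.Month
ORDER BY F.ID, E.Employee;
```

For the sample data this should return six rows in two groups, with the ID column standing in for the X and Y in the question. As with the FOR XML PATH version, employee names that contain ':' or ', ' could make two different groups produce the same string.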


唔猫 2024-09-12 20:07:15

"I just put together this solution while writing this question, which seems to work"

I don't think it does work. Here I've added two further groups (Month = 4 and 5 respectively), which I would consider to be distinct, yet the result is the same, i.e. Month = 1 and 2 only:

WITH employees_paid (Month, Employee, PercentOfTotal)
AS 
(
 SELECT 1, 'Alice', 0.25
 UNION ALL
 SELECT 1, 'Barbara', 0.65
 UNION ALL
 SELECT 1, 'Claire', 0.1
 UNION ALL
 SELECT 2, 'Alice', 0.25
 UNION ALL
 SELECT 2, 'Barbara', 0.5
 UNION ALL
 SELECT 2, 'Claire', 0.25
 UNION ALL
 SELECT 3, 'Alice', 0.25
 UNION ALL
 SELECT 3, 'Barbara', 0.65
 UNION ALL
 SELECT 3, 'Claire', 0.1
 UNION ALL
 SELECT 4, 'Barbara', 0.25
 UNION ALL
 SELECT 4, 'Claire', 0.65
 UNION ALL
 SELECT 4, 'Alice', 0.1
 UNION ALL
 SELECT 5, 'Diana', 0.25
 UNION ALL
 SELECT 5, 'Emma', 0.65
 UNION ALL
 SELECT 5, 'Fiona', 0.1
), 
temp_ids (Month)
AS
(
 SELECT DISTINCT MIN(Month)
   FROM employees_paid
  GROUP 
     BY PercentOfTotal
)
SELECT EMP.Month, EMP.Employee, EMP.PercentOfTotal
  FROM employees_paid AS EMP
       INNER JOIN temp_ids AS IDS 
          ON EMP.Month = IDS.Month
 GROUP 
    BY EMP.Month, EMP.Employee, EMP.PercentOfTotal;
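
The likely reason is that temp_ids groups by PercentOfTotal alone, so it finds the first month in which each individual percentage appears, not the first month of each whole distribution. Running just that grouping on the five-month data above (inlined here through a VALUES list, with a FirstMonth alias added for readability) makes this visible:

```sql
-- The five months of sample data used in this answer.
WITH employees_paid (Month, Employee, PercentOfTotal) AS
(
    SELECT * FROM (VALUES
        (1, 'Alice', 0.25),   (1, 'Barbara', 0.65), (1, 'Claire', 0.10),
        (2, 'Alice', 0.25),   (2, 'Barbara', 0.50), (2, 'Claire', 0.25),
        (3, 'Alice', 0.25),   (3, 'Barbara', 0.65), (3, 'Claire', 0.10),
        (4, 'Barbara', 0.25), (4, 'Claire', 0.65),  (4, 'Alice', 0.10),
        (5, 'Diana', 0.25),   (5, 'Emma', 0.65),    (5, 'Fiona', 0.10)
    ) AS v (Month, Employee, PercentOfTotal)
)
-- First month in which each single percentage value occurs.
SELECT PercentOfTotal, MIN(Month) AS FirstMonth
FROM employees_paid
GROUP BY PercentOfTotal
ORDER BY PercentOfTotal;
```

Every percentage used in months 4 and 5 (0.10, 0.25, 0.65) already occurs in month 1, and 0.50 first occurs in month 2, so the only months that survive are 1 and 2, regardless of who received which share.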
