Group By + Joins

Posted on 2024-11-15 23:31:10

Hi, I am having a problem using GROUP BY with joins between three tables.

I have a project table with various fields, including a projectcode field. I also have an invoice table and an hours table, each of which can have multiple rows per project. Both of these tables have a projectcode column as well.

The two SUM values are not calculating correctly, and I am really struggling to see where the issue is.

Here is the SQL I am using:

SELECT  dbo.project.projectcode, 
        dbo.project.client, 
        dbo.project.project, 
        dbo.project.budget, 
        dbo.project.budget * 80 AS value, 
        SUM(dbo.harvest.hours) AS hourslogged, 
        SUM(dbo.salesforce.value) AS invoiced
FROM  dbo.salesforce 
    RIGHT OUTER JOIN dbo.project 
        ON dbo.salesforce.projectcode = dbo.project.projectcode 
    LEFT OUTER JOIN dbo.harvest 
        ON dbo.project.projectcode = dbo.harvest.projectcode
GROUP BY    dbo.project.projectcode, 
            dbo.salesforce.projectcode, 
            dbo.harvest.projectcode, 
            dbo.project.project, 
            dbo.project.client, 
            dbo.project.budget

Any help or tips on this would be much appreciated!

Comments (2)

女皇必胜 2024-11-22 23:31:10

Whenever both dbo.salesforce and dbo.harvest have more than one matching row for the same projectcode, a mini-Cartesian product happens. Here's a simple illustration. Suppose there are tables A and B, like this:

  • Table A:

    AID  AVALUE
    ---  -------
    1    ValueA1
    2    ValueA2
    
  • Table B:

    BID  BVALUE   AID
    ---  -------  ---
    1    ValueB1  1
    2    ValueB2  1
    3    ValueB3  2
    

Now if we performed this join:

SELECT * FROM A JOIN B ON A.AID = B.AID

the result would be:

AID  AVALUE   BID  BVALUE   AID
---  -------  ---  -------  ---
1    ValueA1  1    ValueB1  1
1    ValueA1  2    ValueB2  1
2    ValueA2  3    ValueB3  2

Enter table C:

CID  CVALUE   AID
---  -------  ---
1    ValueC1  1
2    ValueC2  1
3    ValueC3  1

And the join now is this:

SELECT * FROM A JOIN B ON A.AID = B.AID JOIN C ON A.AID = C.AID

What would be the result? Here:

AID  AVALUE   BID  BVALUE   AID  CID  CVALUE   AID
---  -------  ---  -------  ---  ---  -------  ---
1    ValueA1  1    ValueB1  1    1    ValueC1  1
1    ValueA1  1    ValueB1  1    2    ValueC2  1
1    ValueA1  1    ValueB1  1    3    ValueC3  1
1    ValueA1  2    ValueB2  1    1    ValueC1  1
1    ValueA1  2    ValueB2  1    2    ValueC2  1
1    ValueA1  2    ValueB2  1    3    ValueC3  1

As you can see, every match from B is repeated three times, once for each match in C. Similarly, every match from C is repeated twice, because that is how many matches there are in B. The 'luckiest', of course, is the row from A, which is repeated 2 × 3 = 6 times. That is a Cartesian join for you, and it is just what happens in your case too: every hours row is summed once per invoice row and vice versa, so both SUM values come out inflated.
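
The fan-out described above can be reproduced with a small script. This is only a sketch: it uses Python's built-in sqlite3 module with an in-memory database rather than SQL Server, and it selects just the key columns so the row count is easy to read.

```python
import sqlite3

# In-memory SQLite database holding the A, B and C tables from the illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (AID INTEGER, AVALUE TEXT);
    CREATE TABLE B (BID INTEGER, BVALUE TEXT, AID INTEGER);
    CREATE TABLE C (CID INTEGER, CVALUE TEXT, AID INTEGER);
    INSERT INTO A VALUES (1, 'ValueA1'), (2, 'ValueA2');
    INSERT INTO B VALUES (1, 'ValueB1', 1), (2, 'ValueB2', 1), (3, 'ValueB3', 2);
    INSERT INTO C VALUES (1, 'ValueC1', 1), (2, 'ValueC2', 1), (3, 'ValueC3', 1);
""")

# Join both child tables to A on the same key.
rows = conn.execute("""
    SELECT A.AID, B.BID, C.CID
    FROM A
    JOIN B ON A.AID = B.AID
    JOIN C ON A.AID = C.AID
    ORDER BY B.BID, C.CID
""").fetchall()

for row in rows:
    print(row)

# AID = 1 has 2 rows in B and 3 rows in C, so it appears 2 * 3 times.
print(len(rows), "rows")
```

The row for AID = 2 drops out here only because this is an inner join and C has no match for it; with outer joins it would survive as a single row.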

Not sure whether it is considered typical, but in such cases I would often group each table separately by the joining expression(s), then join the result sets. Your query would then look like this:

SELECT
  p.projectcode, 
  p.client, 
  p.project, 
  p.budget, 
  p.budget * 80 AS value, 
  h.hourslogged, 
  s.invoiced
FROM dbo.project p
  LEFT JOIN (
    SELECT
      projectcode,
      SUM(dbo.salesforce.value) AS invoiced
    FROM dbo.salesforce
    GROUP BY projectcode
  ) s ON p.projectcode = s.projectcode
  LEFT JOIN (
    SELECT
      projectcode,
      SUM(dbo.harvest.hours) AS hourslogged
    FROM dbo.harvest 
    GROUP BY projectcode
  ) h ON p.projectcode = h.projectcode
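
To see the difference on concrete numbers, here is a minimal sketch that runs both shapes of the query side by side. It assumes an in-memory SQLite database via Python's sqlite3 module instead of SQL Server, drops the dbo schema prefix, and uses made-up sample rows (projects 'P1' and 'P2'), so treat it as an illustration rather than a drop-in test.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: P1 has 2 hours rows and 3 invoice rows, P2 has none.
conn.executescript("""
    CREATE TABLE project (projectcode TEXT, budget REAL);
    CREATE TABLE harvest (projectcode TEXT, hours REAL);
    CREATE TABLE salesforce (projectcode TEXT, value REAL);
    INSERT INTO project VALUES ('P1', 100), ('P2', 50);
    INSERT INTO harvest VALUES ('P1', 2), ('P1', 3);
    INSERT INTO salesforce VALUES ('P1', 10), ('P1', 20), ('P1', 30);
""")

# Join first, aggregate afterwards: each hours row meets each invoice row.
naive = conn.execute("""
    SELECT p.projectcode, SUM(h.hours), SUM(s.value)
    FROM project p
    LEFT JOIN salesforce s ON s.projectcode = p.projectcode
    LEFT JOIN harvest h ON h.projectcode = p.projectcode
    GROUP BY p.projectcode
    ORDER BY p.projectcode
""").fetchall()

# Aggregate each child table first, then join one row per project.
fixed = conn.execute("""
    SELECT p.projectcode, h.hourslogged, s.invoiced
    FROM project p
    LEFT JOIN (SELECT projectcode, SUM(value) AS invoiced
               FROM salesforce GROUP BY projectcode) s
           ON s.projectcode = p.projectcode
    LEFT JOIN (SELECT projectcode, SUM(hours) AS hourslogged
               FROM harvest GROUP BY projectcode) h
           ON h.projectcode = p.projectcode
    ORDER BY p.projectcode
""").fetchall()

print("join then sum:", naive)  # hours counted 3 times, invoices counted twice
print("sum then join:", fixed)  # each child row counted once
```

With this data the first query reports 15 hours and 120 invoiced for P1, while the pre-aggregated version reports 5 and 60. Projects with no child rows, like P2 here, come back as NULL in both; wrapping the sums in COALESCE(..., 0) is one option if zeros are preferred.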

等数载,海棠开 2024-11-22 23:31:10

I'd suggest avoiding a mix of right and left outer joins.
Your central table is project, so start from it.

SELECT  dbo.project.projectcode, 
        dbo.project.client, 
        dbo.project.project, 
        dbo.project.budget, 
        dbo.project.budget * 80 AS value, 
        SUM(dbo.harvest.hours) AS hourslogged, 
        SUM(dbo.salesforce.value) AS invoiced
FROM    dbo.project      
            LEFT OUTER JOIN dbo.salesforce
                ON dbo.salesforce.projectcode = dbo.project.projectcode 
            LEFT OUTER JOIN dbo.harvest 
                ON dbo.project.projectcode = dbo.harvest.projectcode
GROUP BY    dbo.project.projectcode, 
            dbo.project.project, 
            dbo.project.client, 
            dbo.project.budget

But the error comes from the GROUP BY. You should not group by columns from the two tables you are aggregating, otherwise the aggregates will be wrong!
