如何使用 Distinct/group by 和 JOINS 来过滤行中的重复项

发布于 2024-10-12 03:32:38 字数 1442 浏览 3 评论 0原文

为简单起见,我将给出一个我想要实现的目标的快速示例:

表 1 - Members

  ID    |   Name
--------------------
  1     |   John    
  2     |   Mike    
  3     |   Sam  


表 1 - Member_Selections

  ID    |   planID
--------------------
  1     |   1    
  1     |   2    
  1     |   1    
  2     |   2    
  2     |   3    
  3     |   2    
  3     |   1    


表 3 - Selection_Details

planID  |   Cost
--------------------
  1     |   5    
  2     |   10    
  3     |   12  

当我运行查询时,我想返回所有成员的总和按成员分组的成员选择。然而,我面临的问题(例如表2数据)是某些成员可能错误地在系统内拥有重复的信息。虽然我们尽最大努力预先过滤这些数据,但有时它会被漏掉,因此当我对系统进行必要的调用以提取信息时,我也想过滤这些数据。

结果应该显示:

结果表,

ID  |    Name    | Total_Cost
-----------------------------
1   |    John    |   15
2   |    Mike    |   22
3   |    Sam     |   15

但约翰的值为 $20,因为他错误地插入了两次计划 ID #1。

我的查询当前是:

SELECT
    sq.ID, sq.name, SUM(sq.premium) AS total_cost
FROM
(
    SELECT
    m.id, m.name, g.premium
    FROM members m
    INNER JOIN member_selections s USING(ID)
    INNER JOIN selection_details g USING(planid)
) sq group by sq.agent

添加 DISTINCT s.planID 会错误地过滤结果,因为它只会显示已售出的单个 PlanID 1(即使成员 1 和 3 购买了它)。

任何帮助表示赞赏。

编辑

还有另一个表我忘了提及,它是代理表(将计划出售给会员的代理)。

最终的 group by 语句对按代理 ID 销售的所有商品进行分组(这会将最终结果转换为一行)。

For simplicity, I will give a quick example of what i am trying to achieve:

Table 1 - Members

  ID    |   Name
--------------------
  1     |   John    
  2     |   Mike    
  3     |   Sam  

Table 1 - Member_Selections

  ID    |   planID
--------------------
  1     |   1    
  1     |   2    
  1     |   1    
  2     |   2    
  2     |   3    
  3     |   2    
  3     |   1    

Table 3 - Selection_Details

planID  |   Cost
--------------------
  1     |   5    
  2     |   10    
  3     |   12  

When i run my query, I want to return the sum of the all member selections grouped by member. The issue I face however (e.g. table 2 data) is that some members may have duplicate information within the system by mistake. While we do our best to filter this data up front, sometimes it slips through the cracks so when I make the necessary calls to the system to pull information, I also want to filter this data.

the results SHOULD show:

Results Table

ID  |    Name    | Total_Cost
-----------------------------
1   |    John    |   15
2   |    Mike    |   22
3   |    Sam     |   15

but instead have John as $20 because he has plan ID #1 inserted twice by mistake.

My query is currently:

SELECT
    sq.ID, sq.name, SUM(sq.premium) AS total_cost
FROM
(
    SELECT
    m.id, m.name, g.premium
    FROM members m
    INNER JOIN member_selections s USING(ID)
    INNER JOIN selection_details g USING(planid)
) sq group by sq.agent

Adding DISTINCT s.planID filters the results incorrectly as it will only show a single PlanID 1 sold (even though members 1 and 3 bought it).

Any help is appreciated.

EDIT

There is also another table I forgot to mention which is the agent table (the agent who sold the plans to members).

the final group by statement groups ALL items sold by the agent ID (which turns the final results into a single row).

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

蓝眸 2024-10-19 03:32:38

也许最简单的解决方案是在 member_selections 表上放置一个唯一的组合键:

 alter table member_selections add unique key ms_key (ID, planID);

这将防止在表中其他位置已存在 ID/planID 的唯一组合的情况下添加任何记录。这将只允许单个 (1,1)

评论后续:

刚刚看到您关于“更改忽略...”的评论。这工作得很好,但表中仍然会留下错误的重复项。我建议执行唯一键,然后手动清理表。我在评论中输入的查询应该为您找到所有重复项,然后您可以手动删除它们。一旦表干净,就不需要查询的重复处理版本。

Perhaps the simplest solution is to put a unique composite key on the member_selections table:

 alter table member_selections add unique key ms_key (ID, planID);

which would prevent any records from being added where the unique combo of ID/planID already exist elsewhere in the table. That'd allow only a single (1,1)

comment followup:

just saw your comment about the 'alter ignore...'. That's work fine, but you'd still be left with the bad duplicates in the table. I'd suggest doing the unique key, then manually cleaning up the table. The query I put in the comments should find all the duplicates for you, which you can then weed out by hand. once the table's clean, there'll be no need for the duplicate-handling version of the query.

海之角 2024-10-19 03:32:38

使用 UNIQUE 键可防止意外重复输入。这将从源头上消除问题,而不是等到问题开始出现症状时才解决。它还使以后的查询变得更容易,因为您可以依靠拥有一致的数据库。

Use UNIQUE keys to prevent accidental duplicate entries. This will eliminate the problem at the source, instead of when it starts to show symptoms. It also makes later queries easier, because you can count on having a consistent database.

国产ˉ祖宗 2024-10-19 03:32:38

怎么样:

SELECT
    sq.ID, sq.name, SUM(sq.premium) AS total_cost
FROM
(
    SELECT
    m.id, m.name, g.premium
    FROM members m
    INNER JOIN 
         (select distinct ID, PlanID from member_selections) s
    USING(ID)
    INNER JOIN selection_details g USING(planid)
) sq group by sq.agent

顺便说一句,是否有一个原因导致您在 member_selections 上没有主键来防止这些重复发生?

What about:

SELECT
    sq.ID, sq.name, SUM(sq.premium) AS total_cost
FROM
(
    SELECT
    m.id, m.name, g.premium
    FROM members m
    INNER JOIN 
         (select distinct ID, PlanID from member_selections) s
    USING(ID)
    INNER JOIN selection_details g USING(planid)
) sq group by sq.agent

By the way, is there a reason you don't have a primary key on member_selections that will prevent these duplicates from happening in the first place?

纵性 2024-10-19 03:32:38

您可以在内部查询中添加一个 group by 子句,该子句按所有三列进行分组,基本上只返回唯一的行。 (我还将“溢价”更改为“成本”以匹配您的示例表,并删除了代理部分)

SELECT
    sq.ID, 
    sq.name, 
    SUM(sq.Cost) AS total_cost
FROM
(
    SELECT
            m.id, 
            m.name, 
            g.Cost
    FROM 
            members m
            INNER JOIN member_selections s USING(ID) 
            INNER JOIN selection_details g USING(planid)

        GROUP BY
            m.ID,
            m.NAME,
            g.Cost
) sq 
group by 
    sq.ID,
    sq.NAME

You can add a group by clause into the inner query, which groups by all three columns, basically returning only unique rows. (I also changed 'premium' to 'cost' to match your example tables, and dropped the agent part)

SELECT
    sq.ID, 
    sq.name, 
    SUM(sq.Cost) AS total_cost
FROM
(
    SELECT
            m.id, 
            m.name, 
            g.Cost
    FROM 
            members m
            INNER JOIN member_selections s USING(ID) 
            INNER JOIN selection_details g USING(planid)

        GROUP BY
            m.ID,
            m.NAME,
            g.Cost
) sq 
group by 
    sq.ID,
    sq.NAME
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文