如何使用 Distinct/group by 和 JOINS 来过滤行中的重复项
为简单起见,我将给出一个我想要实现的目标的快速示例:
表 1 - Members
ID | Name
--------------------
1 | John
2 | Mike
3 | Sam
表 1 - Member_Selections
ID | planID
--------------------
1 | 1
1 | 2
1 | 1
2 | 2
2 | 3
3 | 2
3 | 1
表 3 - Selection_Details
planID | Cost
--------------------
1 | 5
2 | 10
3 | 12
当我运行查询时,我想返回所有成员的总和按成员分组的成员选择。然而,我面临的问题(例如表2数据)是某些成员可能错误地在系统内拥有重复的信息。虽然我们尽最大努力预先过滤这些数据,但有时它会被漏掉,因此当我对系统进行必要的调用以提取信息时,我也想过滤这些数据。
结果应该显示:
结果表,
ID | Name | Total_Cost
-----------------------------
1 | John | 15
2 | Mike | 22
3 | Sam | 15
但约翰的值为 $20,因为他错误地插入了两次计划 ID #1。
我的查询当前是:
SELECT
sq.ID, sq.name, SUM(sq.premium) AS total_cost
FROM
(
SELECT
m.id, m.name, g.premium
FROM members m
INNER JOIN member_selections s USING(ID)
INNER JOIN selection_details g USING(planid)
) sq group by sq.agent
添加 DISTINCT s.planID 会错误地过滤结果,因为它只会显示已售出的单个 PlanID 1(即使成员 1 和 3 购买了它)。
任何帮助表示赞赏。
编辑
还有另一个表我忘了提及,它是代理表(将计划出售给会员的代理)。
最终的 group by 语句对按代理 ID 销售的所有商品进行分组(这会将最终结果转换为一行)。
For simplicity, I will give a quick example of what i am trying to achieve:
Table 1 - Members
ID | Name
--------------------
1 | John
2 | Mike
3 | Sam
Table 1 - Member_Selections
ID | planID
--------------------
1 | 1
1 | 2
1 | 1
2 | 2
2 | 3
3 | 2
3 | 1
Table 3 - Selection_Details
planID | Cost
--------------------
1 | 5
2 | 10
3 | 12
When i run my query, I want to return the sum of the all member selections grouped by member. The issue I face however (e.g. table 2 data) is that some members may have duplicate information within the system by mistake. While we do our best to filter this data up front, sometimes it slips through the cracks so when I make the necessary calls to the system to pull information, I also want to filter this data.
the results SHOULD show:
Results Table
ID | Name | Total_Cost
-----------------------------
1 | John | 15
2 | Mike | 22
3 | Sam | 15
but instead have John as $20 because he has plan ID #1 inserted twice by mistake.
My query is currently:
SELECT
sq.ID, sq.name, SUM(sq.premium) AS total_cost
FROM
(
SELECT
m.id, m.name, g.premium
FROM members m
INNER JOIN member_selections s USING(ID)
INNER JOIN selection_details g USING(planid)
) sq group by sq.agent
Adding DISTINCT s.planID filters the results incorrectly as it will only show a single PlanID 1 sold (even though members 1 and 3 bought it).
Any help is appreciated.
EDIT
There is also another table I forgot to mention which is the agent table (the agent who sold the plans to members).
the final group by statement groups ALL items sold by the agent ID (which turns the final results into a single row).
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
也许最简单的解决方案是在 member_selections 表上放置一个唯一的组合键:
这将防止在表中其他位置已存在 ID/planID 的唯一组合的情况下添加任何记录。这将只允许单个 (1,1)
评论后续:
刚刚看到您关于“更改忽略...”的评论。这工作得很好,但表中仍然会留下错误的重复项。我建议执行唯一键,然后手动清理表。我在评论中输入的查询应该为您找到所有重复项,然后您可以手动删除它们。一旦表干净,就不需要查询的重复处理版本。
Perhaps the simplest solution is to put a unique composite key on the member_selections table:
which would prevent any records from being added where the unique combo of ID/planID already exist elsewhere in the table. That'd allow only a single (1,1)
comment followup:
just saw your comment about the 'alter ignore...'. That's work fine, but you'd still be left with the bad duplicates in the table. I'd suggest doing the unique key, then manually cleaning up the table. The query I put in the comments should find all the duplicates for you, which you can then weed out by hand. once the table's clean, there'll be no need for the duplicate-handling version of the query.
使用 UNIQUE 键可防止意外重复输入。这将从源头上消除问题,而不是等到问题开始出现症状时才解决。它还使以后的查询变得更容易,因为您可以依靠拥有一致的数据库。
Use UNIQUE keys to prevent accidental duplicate entries. This will eliminate the problem at the source, instead of when it starts to show symptoms. It also makes later queries easier, because you can count on having a consistent database.
怎么样:
顺便说一句,是否有一个原因导致您在 member_selections 上没有主键来防止这些重复发生?
What about:
By the way, is there a reason you don't have a primary key on member_selections that will prevent these duplicates from happening in the first place?
您可以在内部查询中添加一个 group by 子句,该子句按所有三列进行分组,基本上只返回唯一的行。 (我还将“溢价”更改为“成本”以匹配您的示例表,并删除了代理部分)
You can add a group by clause into the inner query, which groups by all three columns, basically returning only unique rows. (I also changed 'premium' to 'cost' to match your example tables, and dropped the agent part)