连接两个表*包括*一个表中的重复项的最佳方法

发布于 2024-12-15 14:22:51 字数 2173 浏览 2 评论 0原文

账户(表)

+----+----------+----------+-------+
| id | account# | supplier | RepID |
+----+----------+----------+-------+
|  1 | 123xyz   | Boston   |     2 |
|  2 | 245xyz   | Chicago  |     2 |
|  3 | 425xyz   | Chicago  |     3 |
+----+----------+----------+-------+

支出(表)

+----+----------+----------+-------------+--------+
| id | account# | supplier | datecreated | Amount |
+----+----------+----------+-------------+--------+
|  5 | 245xyz   | Chicago  | 01-15-2009  | 25     |
|  6 | 123xyz   | Boston   | 10-15-2011  | 50     |
|  7 | 123xyz   | Boston   | 10-15-2011  | -50    |
|  8 | 123xyz   | Boston   | 10-15-2011  | 50     |
|  9 | 425xyz   | Chicago  | 10-15-2011  | 100    |
+----+----------+----------+-------------+--------+

我有账户表和支出表。赔率表来自国外,因此我们无法对其进行任何控制。这给我们带来了一个问题,我们无法根据记录ID字段连接两个表,这是一个我们无法解决的问题。因此,我们根据帐户#、供应商 ID(第 2 列和第 3 列)加入。这产生了一个问题,即它创建了(可能)多对多关系。但是,如果记录处于活动状态,我们会对其进行过滤,并且在创建支付时,我们会在支付表上使用第二个过滤器。付款是按月创建的。在我看来,这有两个问题

  1. 查询需要相当多的时间才能完成(可能效率低下)
  2. 某些重复项已被删除,但不应删除。例如支付表中的记录 6 和 8。这里发生的事情是,我们有一个客户,然后客户取消了,然后他又回来了。在本例中为+50、-50 和+50。同样,所有值都是有效的,并且出于审计目的必须显示在报告中。目前仅显示一个+50,另一个已丢失。报告中偶尔还会出现一些其他问题。

这是查询。它使用分组依据来删除重复项。我想要一个预先查询,它的性能优于其他查询,并且它确实考虑到 PayOut 表中的记录不会重复,只要它们出现在报告月份中。

这是我们当前的查询

/* Supplied  to Store Procedure */
-----------------------------------
@RepID // the person for whome payout is calculated
@Month // of payment date
@year  // year of payment date
-----------------------------------
select distinct 
A.col1, 
A.col2,
...
A.col10, 
B.col2, 
B.Col2, 
B.Amount /* this is the important column, portion of which goes to Rep */
from records A
JOIN payout B 
on A.Supplier = B.Supplier AND A.Account# = B.Account#
where datepart(mm, B.datecreated) = @Month /* parameter to stored procedure */
  and datepart(yyyy, B.datecreated) = @Year
  and A.[rep ID] = @RepID /* parameter to SP */
group by
col1,col2,col3,....col10
order by customerName

这个查询是否最优?我可以使用 CROSS APPLY 或 WHERE EXIST 来改进它,从而使其更快并消除重复问题吗?

请注意,此查询用于获取代表的付款。因此,每条记录都有一个repid字段,它被分配给谁。理想情况下,我想使用 Select WHERE Exist 查询。

Accounts (table)

+----+----------+----------+-------+
| id | account# | supplier | RepID |
+----+----------+----------+-------+
|  1 | 123xyz   | Boston   |     2 |
|  2 | 245xyz   | Chicago  |     2 |
|  3 | 425xyz   | Chicago  |     3 |
+----+----------+----------+-------+

PayOut (table)

+----+----------+----------+-------------+--------+
| id | account# | supplier | datecreated | Amount |
+----+----------+----------+-------------+--------+
|  5 | 245xyz   | Chicago  | 01-15-2009  | 25     |
|  6 | 123xyz   | Boston   | 10-15-2011  | 50     |
|  7 | 123xyz   | Boston   | 10-15-2011  | -50    |
|  8 | 123xyz   | Boston   | 10-15-2011  | 50     |
|  9 | 425xyz   | Chicago  | 10-15-2011  | 100    |
+----+----------+----------+-------------+--------+

I have accounts table and I have payout table. Payout table comes from abroad so we do not have any control over it. This leaves us with a problem that we can't join the two tables based on record ID field, that is one problem which we can't solved. We therefore join based on Account#, SupplierID (2nd and 3rd column). This creates a problem that it creates (possibly) many to many relationship. But we filter our records if they are active and we use a second filter on payout table when the payout was created. Payout are created months to month. There are two problems with this in my view

  1. The query takes quite a bit of time to complete (could be inefficient)
  2. There are certain duplicates that are removed which should not be removed. Example is record 6 and 8 in payout table. What happened here is, we got a customer, then the customer cancelled then he got him back. In this case +50, -50 and +50. Again all values are valid and must show in the report for audit purposes. Currently only one +50 is shown, the other is lost. There are a couple of other problems within the report that comes once in a while.

Here is the query. It uses groups by to remove duplicates. I would like to have an advance query which outperforms and which does takes into account that no record in PayOut table is duplicated as long as they come up in the month of the report.

Here is our current query

/* Supplied  to Store Procedure */
-----------------------------------
@RepID // the person for whome payout is calculated
@Month // of payment date
@year  // year of payment date
-----------------------------------
select distinct 
A.col1, 
A.col2,
...
A.col10, 
B.col2, 
B.Col2, 
B.Amount /* this is the important column, portion of which goes to Rep */
from records A
JOIN payout B 
on A.Supplier = B.Supplier AND A.Account# = B.Account#
where datepart(mm, B.datecreated) = @Month /* parameter to stored procedure */
  and datepart(yyyy, B.datecreated) = @Year
  and A.[rep ID] = @RepID /* parameter to SP */
group by
col1,col2,col3,....col10
order by customerName

Is this query optimum? Can I improve it using CROSS APPLY or WHERE EXISTs that will make it faster as well as remove the duplicate problem?

Note that this query is used to get payout of a rep. Hence every record has repid field who it is assigned to. Ideally I would like to use Select WHERE Exist query.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

牵你手 2024-12-22 14:22:51

很难准确理解您想要什么,因为在一个地方您说您“想要”重复项,但随后又说您正在使用分组依据来删除重复项。所以第一个想法是“为什么不直接摆脱分组呢?”。但我必须相信你足够聪明,能够自己想到这一点,所以我认为它的存在一定是有原因的。

我认为如果您可以发布实际的查询,这里的人可以很容易地帮助您,但既然您说不能,我只会尝试为您提供一些解决问题的方向...

而不是尝试在一个声明中完成所有操作,使用临时表或视图来分割它。对您来说,考虑如何删除不需要的重复项并保留您首先执行的重复项并将其放入临时表中,然后将这些表连接在一起并使用它可能会更容易。

It's difficult to understand exactly what you want because in one place you say you 'want' the duplicates but then you say that you are using the group by to remove duplicates. So the first thought would be "Why not just get rid of the group by?". But I have to believe you are smart enough to have thought of that yourself, so I assume it's got to be there for a reason.

I think someone here could help you pretty easily if you could post the actual query, but since you say you can't I will just try to give you some direction in solving the problem...

Instead of trying to do everything in one statement, use temporary tables or views to split it up. It may be easier for you to think about how to get rid of the duplicates you don't want and keep the ones you do first and put those into a temporary table, and then join the tables together and work with that.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文