连接两个表包括一个表中的重复项的最佳方法

发布于 2024-12-15 14:22:51 字数 2173 浏览 2 评论 0原文

账户（表）

+----+----------+----------+-------+
| id | account# | supplier | RepID |
+----+----------+----------+-------+
|  1 | 123xyz   | Boston   |     2 |
|  2 | 245xyz   | Chicago  |     2 |
|  3 | 425xyz   | Chicago  |     3 |
+----+----------+----------+-------+

支出（表）

+----+----------+----------+-------------+--------+
| id | account# | supplier | datecreated | Amount |
+----+----------+----------+-------------+--------+
|  5 | 245xyz   | Chicago  | 01-15-2009  | 25     |
|  6 | 123xyz   | Boston   | 10-15-2011  | 50     |
|  7 | 123xyz   | Boston   | 10-15-2011  | -50    |
|  8 | 123xyz   | Boston   | 10-15-2011  | 50     |
|  9 | 425xyz   | Chicago  | 10-15-2011  | 100    |
+----+----------+----------+-------------+--------+

我有账户表和支出表。赔率表来自国外，因此我们无法对其进行任何控制。这给我们带来了一个问题，我们无法根据记录ID字段连接两个表，这是一个我们无法解决的问题。因此，我们根据帐户#、供应商 ID（第 2 列和第 3 列）加入。这产生了一个问题，即它创建了（可能）多对多关系。但是，如果记录处于活动状态，我们会对其进行过滤，并且在创建支付时，我们会在支付表上使用第二个过滤器。付款是按月创建的。在我看来，这有两个问题

查询需要相当多的时间才能完成（可能效率低下）
某些重复项已被删除，但不应删除。例如支付表中的记录 6 和 8。这里发生的事情是，我们有一个客户，然后客户取消了，然后他又回来了。在本例中为+50、-50 和+50。同样，所有值都是有效的，并且出于审计目的必须显示在报告中。目前仅显示一个+50，另一个已丢失。报告中偶尔还会出现一些其他问题。

这是查询。它使用分组依据来删除重复项。我想要一个预先查询，它的性能优于其他查询，并且它确实考虑到 PayOut 表中的记录不会重复，只要它们出现在报告月份中。

这是我们当前的查询

/* Supplied  to Store Procedure */
-----------------------------------
@RepID // the person for whome payout is calculated
@Month // of payment date
@year  // year of payment date
-----------------------------------
select distinct 
A.col1, 
A.col2,
...
A.col10, 
B.col2, 
B.Col2, 
B.Amount /* this is the important column, portion of which goes to Rep */
from records A
JOIN payout B 
on A.Supplier = B.Supplier AND A.Account# = B.Account#
where datepart(mm, B.datecreated) = @Month /* parameter to stored procedure */
  and datepart(yyyy, B.datecreated) = @Year
  and A.[rep ID] = @RepID /* parameter to SP */
group by
col1,col2,col3,....col10
order by customerName

这个查询是否最优？我可以使用 CROSS APPLY 或 WHERE EXIST 来改进它，从而使其更快并消除重复问题吗？

请注意，此查询用于获取代表的付款。因此，每条记录都有一个repid字段，它被分配给谁。理想情况下，我想使用 Select WHERE Exist 查询。

原文

Accounts (table)

+----+----------+----------+-------+
| id | account# | supplier | RepID |
+----+----------+----------+-------+
|  1 | 123xyz   | Boston   |     2 |
|  2 | 245xyz   | Chicago  |     2 |
|  3 | 425xyz   | Chicago  |     3 |
+----+----------+----------+-------+

PayOut (table)

+----+----------+----------+-------------+--------+
| id | account# | supplier | datecreated | Amount |
+----+----------+----------+-------------+--------+
|  5 | 245xyz   | Chicago  | 01-15-2009  | 25     |
|  6 | 123xyz   | Boston   | 10-15-2011  | 50     |
|  7 | 123xyz   | Boston   | 10-15-2011  | -50    |
|  8 | 123xyz   | Boston   | 10-15-2011  | 50     |
|  9 | 425xyz   | Chicago  | 10-15-2011  | 100    |
+----+----------+----------+-------------+--------+

I have accounts table and I have payout table. Payout table comes from abroad so we do not have any control over it. This leaves us with a problem that we can't join the two tables based on record ID field, that is one problem which we can't solved. We therefore join based on Account#, SupplierID (2nd and 3rd column). This creates a problem that it creates (possibly) many to many relationship. But we filter our records if they are active and we use a second filter on payout table when the payout was created. Payout are created months to month. There are two problems with this in my view

The query takes quite a bit of time to complete (could be inefficient)
There are certain duplicates that are removed which should not be removed. Example is record 6 and 8 in payout table. What happened here is, we got a customer, then the customer cancelled then he got him back. In this case +50, -50 and +50. Again all values are valid and must show in the report for audit purposes. Currently only one +50 is shown, the other is lost. There are a couple of other problems within the report that comes once in a while.

Here is the query. It uses groups by to remove duplicates. I would like to have an advance query which outperforms and which does takes into account that no record in PayOut table is duplicated as long as they come up in the month of the report.

Here is our current query

/* Supplied  to Store Procedure */
-----------------------------------
@RepID // the person for whome payout is calculated
@Month // of payment date
@year  // year of payment date
-----------------------------------
select distinct 
A.col1, 
A.col2,
...
A.col10, 
B.col2, 
B.Col2, 
B.Amount /* this is the important column, portion of which goes to Rep */
from records A
JOIN payout B 
on A.Supplier = B.Supplier AND A.Account# = B.Account#
where datepart(mm, B.datecreated) = @Month /* parameter to stored procedure */
  and datepart(yyyy, B.datecreated) = @Year
  and A.[rep ID] = @RepID /* parameter to SP */
group by
col1,col2,col3,....col10
order by customerName

Is this query optimum? Can I improve it using CROSS APPLY or WHERE EXISTs that will make it faster as well as remove the duplicate problem?

Note that this query is used to get payout of a rep. Hence every record has repid field who it is assigned to. Ideally I would like to use Select WHERE Exist query.

分享到QQ

分享到微博