Match only one specific row in a JOIN where many rows exist
(Advantage Database Server) I have a table of service providers that, for auditing purposes, are never deleted. They have a start date and end date; in the case of changes like name or address, the existing row is end dated, a new row is created, and a new start date is assigned for the changed data.
During processing of payments to those providers, I need a summary page that lists the provider name, address, identifier (ProvID), and total amount being paid. This is done in a fairly straightforward query with a SUM() and GROUP BY.
The problem appears when there are two or more rows for a specified provider identifier. I end up with duplicate rows (which could result in multiple payments to that provider if not caught).
My first thought was to use something (ugly, but performs reasonably quickly) like a subselect:
SELECT ... FROM service s
INNER JOIN provider p ON p.ProvID = s.ProvID
    AND (p.EndDate IS NULL OR p.EndDate = (
        SELECT MAX(EndDate) FROM provider lu WHERE lu.ProvID = s.ProvID))
Unfortunately, this still ended up finding two rows; one row for the NULL EndDate and one for the MAX(EndDate).
I handle this in other cases (e.g., locating the proper ProvID for a service provided on a specific date) using
p.EndDate is null or (s.ServiceDate BETWEEN p.StartDate AND p.EndDate)
Unfortunately, since the problem query is a GROUP BY with an aggregate, the service date isn't available.
Any suggestions?
EDIT: What I'm looking for is either the row with the NULL EndDate if it exists, OR the row with the Max(EndDate) if the NULL row doesn't exist. This covers the case, for instance, where a supplier was terminated yesterday, but did work last week, and we'll be paying them next week.
Comments (5)
So I guess if there is a row with NULL end date, you want that one, otherwise you want the one with the largest end date?
I'm not sure about ADS, but the following would work on SQL Server:
The COALESCE operator returns the first non-null parameter, so this is basically just setting the nulls to a time far in the future, so that SELECT MAX will give you the one with the NULL end date if there is one.
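A minimal sketch of the COALESCE idea, using the `service` and `provider` tables from the question. The `Name`, `Address`, and `Amount` column names are assumptions, as is the far-future date `9999-12-31`; any date later than every real EndDate would do. It also assumes each provider has at most one row with a NULL EndDate and no two rows sharing the same EndDate.

```sql
-- Sketch only: Name, Address and Amount are assumed column names.
SELECT p.ProvID, p.Name, p.Address, SUM(s.Amount) AS TotalPaid
FROM service s
INNER JOIN provider p
    ON p.ProvID = s.ProvID
   -- Treat a NULL EndDate as "far in the future" on both sides,
   -- so the open-ended row wins the MAX whenever it exists.
   AND COALESCE(p.EndDate, CAST('9999-12-31' AS DATE)) = (
           SELECT MAX(COALESCE(lu.EndDate, CAST('9999-12-31' AS DATE)))
           FROM provider lu
           WHERE lu.ProvID = s.ProvID)
GROUP BY p.ProvID, p.Name, p.Address;
```

If no NULL row exists for a provider, the MAX falls back to the latest real EndDate, which is the terminated-provider case from the question's edit.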
In the second condition, you have to take the max only if there is no row with a NULL EndDate.
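One way this could be written, as a sketch against the question's join: the MAX(EndDate) branch is guarded by a NOT EXISTS so it only applies when the provider has no open-ended row. The `Name`, `Address`, and `Amount` columns and the `cur` alias are assumptions for illustration.

```sql
-- Sketch only: Name, Address and Amount are assumed column names.
SELECT p.ProvID, p.Name, p.Address, SUM(s.Amount) AS TotalPaid
FROM service s
INNER JOIN provider p
    ON p.ProvID = s.ProvID
   AND (
        -- Prefer the open-ended row when there is one.
        p.EndDate IS NULL
        OR (
            -- Otherwise fall back to the row with the latest EndDate.
            p.EndDate = (SELECT MAX(lu.EndDate)
                         FROM provider lu
                         WHERE lu.ProvID = s.ProvID)
            AND NOT EXISTS (SELECT 1
                            FROM provider cur
                            WHERE cur.ProvID = s.ProvID
                              AND cur.EndDate IS NULL)
        )
   )
GROUP BY p.ProvID, p.Name, p.Address;
```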
Perhaps use a subquery in place of the second table:
This is assuming you'd get NULL back if there is no max enddate.
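A rough sketch of the derived-table variant. Note that in standard SQL MAX() skips NULLs, so this version does not rely on getting NULL back from MAX(EndDate); instead it flags whether an open-ended row exists. The `pick` alias, the `HasOpenRow` and `LastEndDate` column aliases, and the `Name`, `Address`, and `Amount` columns are all assumptions.

```sql
-- Sketch only: Name, Address and Amount are assumed column names.
SELECT p.ProvID, p.Name, p.Address, SUM(s.Amount) AS TotalPaid
FROM service s
INNER JOIN (
    -- One row per provider: is there an open-ended row, and what is
    -- the latest real EndDate (MAX ignores NULLs)?
    SELECT ProvID,
           MAX(CASE WHEN EndDate IS NULL THEN 1 ELSE 0 END) AS HasOpenRow,
           MAX(EndDate) AS LastEndDate
    FROM provider
    GROUP BY ProvID
) pick
    ON pick.ProvID = s.ProvID
INNER JOIN provider p
    ON p.ProvID = pick.ProvID
   AND (   (pick.HasOpenRow = 1 AND p.EndDate IS NULL)
        OR (pick.HasOpenRow = 0 AND p.EndDate = pick.LastEndDate))
GROUP BY p.ProvID, p.Name, p.Address;
```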
What you are referring to is a Type-2 Dimension of a Data Warehouse.
You have to join by the ID and by the StartDate and EndDate to get the proper data.
Off-the-top-of-my-head code:
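A sketch of what a Type-2 style join might look like here, assuming non-overlapping StartDate/EndDate ranges per provider and hypothetical `Name`, `Address`, and `Amount` columns. The service date is used in the join condition, so it does not need to appear in the GROUP BY.

```sql
-- Sketch only: Name, Address and Amount are assumed column names.
SELECT p.ProvID, p.Name, p.Address, SUM(s.Amount) AS TotalPaid
FROM service s
INNER JOIN provider p
    ON p.ProvID = s.ProvID
   -- Pick the provider version that was in effect on the service date.
   AND s.ServiceDate >= p.StartDate
   AND (p.EndDate IS NULL OR s.ServiceDate <= p.EndDate)
GROUP BY p.ProvID, p.Name, p.Address;
```

One caveat with this approach: if a provider's name or address changed during the payment period, their services are grouped under each version, so the provider can still show up on more than one summary line.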
What in your provider table denotes the current row? EndDate=NULL, EndDate=Max(EndDate) or EndDate='9999-01-01'? All three are valid choices, but it really should be unambiguous; if it's not, you're going to end up with duplicate rows in queries all the time, no matter how cleverly you craft this particular query. So I suggest fixing that in the provider table, and then something like this should work:
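As a sketch, assume the table has been cleaned up so that open-ended rows carry the sentinel EndDate '9999-01-01' instead of NULL. The row to pay against is then always the one with the largest EndDate for that ProvID, whether the provider is still active or was terminated. The `Name`, `Address`, and `Amount` columns are assumed names.

```sql
-- Sketch only: assumes open-ended rows store EndDate = '9999-01-01'
-- rather than NULL; Name, Address and Amount are assumed column names.
SELECT p.ProvID, p.Name, p.Address, SUM(s.Amount) AS TotalPaid
FROM service s
INNER JOIN provider p
    ON p.ProvID = s.ProvID
   -- Exactly one row per provider has the latest EndDate.
   AND p.EndDate = (SELECT MAX(lu.EndDate)
                    FROM provider lu
                    WHERE lu.ProvID = p.ProvID)
GROUP BY p.ProvID, p.Name, p.Address;
```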