从结果集中获取开始日期和结束日期的 SQL 查询
我正在处理一项要求,原始数据采用以下格式
要求 - 开始日期应该是状态更改为 1 时的日期,结束日期应该是记录状态从 1 更改为任何其他状态后的第一个日期数字。
客户 | 状态 | 日期 |
---|---|---|
A123 | 0 | 7/2/2021 |
A123 | 0 | 7/15/2021 |
A123 | 0 | 7/22/2021 |
A123 | 1 | 8/18/2021 |
A123 | 1 | 9/8/2021 |
A123 | 0 | 12/1/2021 |
A123 | 0 | 1/21 /2022 |
A123 | 1 | 3/6/2022 |
A123 | 1 | 3/7/2022 |
A123 | 0 | 3/15/2022 |
B123 | 1 | 1/1/2022 |
B123 | 0 | 1/6/2022 |
C123 | 1 | 1/2/2022 |
C123 | 2 | 1/8/2022 |
C123 | 0 | 2022 年 1 月 9 日 |
预期输出
客户 | StartDate | EndDate |
---|---|---|
A123 | 8/18/2021 | 12/1/2021 |
A123 | 9/8/2021 | 12/1/2021 |
A123 | 3/6/2022 | 3/15/2022 |
A123 | 3/7/ 2022 | 2022 年 3 月 15 日 |
B123 | 1/1/2022 | 1/6/2022 |
C123 | 1/2/2022 | 1/8/2022 |
我试图获取的输出如下,我得到了客户 B123 和 C123 的输出,但没有得到预期的 A123 的输出。
查询说明 - 在查询的第一部分中,我将获取状态 = 1 的所有记录,在下一部分中仅获取状态不等于 1 的记录,并根据客户和客户连接这两个数据集生成的行号。
SELECT A.[Customer],A.StartDate,B.EndDate
from
(
SELECT [Customer],MIN(Date) AS STARTDATE,[Status],RANK() OVER (PARTITION BY [STATUS] ORDER BY Date ASC) AS ROWNUM
FROM table1
WHERE [STATUS] = 1
GROUP BY Customer,Date,[Status]
) A
LEFT JOIN
(
SELECT [Customer],MIN(Date) AS ENDDATE,[Status],RANK() OVER (PARTITION BY [STATUS] ORDER BY Date ASC) AS ROWNUM
FROM table1
WHERE [STATUS] != 1
AND Date>(
SELECT MIN(Date) AS STARTDATE
FROM table1
WHERE [STATUS] = 1
)
GROUP BY Customer,Date,[Status]
) B
ON
(
A.[Customer] = B.[Customer]
AND A.RowNum = B.RowNum
)
ORDER BY A.Startdate
I am working on one of requirement the raw data is in following format
Requirement - Startdate should be the date when status changed to 1 and enddate should be the 1st date after the record status changed from 1 to any other number.
Customer | Status | Date |
---|---|---|
A123 | 0 | 7/2/2021 |
A123 | 0 | 7/15/2021 |
A123 | 0 | 7/22/2021 |
A123 | 1 | 8/18/2021 |
A123 | 1 | 9/8/2021 |
A123 | 0 | 12/1/2021 |
A123 | 0 | 1/21/2022 |
A123 | 1 | 3/6/2022 |
A123 | 1 | 3/7/2022 |
A123 | 0 | 3/15/2022 |
B123 | 1 | 1/1/2022 |
B123 | 0 | 1/6/2022 |
C123 | 1 | 1/2/2022 |
C123 | 2 | 1/8/2022 |
C123 | 0 | 1/9/2022 |
expected output
Customer | StartDate | EndDate |
---|---|---|
A123 | 8/18/2021 | 12/1/2021 |
A123 | 9/8/2021 | 12/1/2021 |
A123 | 3/6/2022 | 3/15/2022 |
A123 | 3/7/2022 | 3/15/2022 |
B123 | 1/1/2022 | 1/6/2022 |
C123 | 1/2/2022 | 1/8/2022 |
Query I tried to get the output is below, I am getting the output for Customer B123 and C123, but not for A123 as expected.
Query Explanation - In 1st part of query I am taking all the records with status = 1 and in next part taking only those records where status is not equal to 1, and joining these 2 datasets based on Customer and row number generated.
SELECT A.[Customer],A.StartDate,B.EndDate
from
(
SELECT [Customer],MIN(Date) AS STARTDATE,[Status],RANK() OVER (PARTITION BY [STATUS] ORDER BY Date ASC) AS ROWNUM
FROM table1
WHERE [STATUS] = 1
GROUP BY Customer,Date,[Status]
) A
LEFT JOIN
(
SELECT [Customer],MIN(Date) AS ENDDATE,[Status],RANK() OVER (PARTITION BY [STATUS] ORDER BY Date ASC) AS ROWNUM
FROM table1
WHERE [STATUS] != 1
AND Date>(
SELECT MIN(Date) AS STARTDATE
FROM table1
WHERE [STATUS] = 1
)
GROUP BY Customer,Date,[Status]
) B
ON
(
A.[Customer] = B.[Customer]
AND A.RowNum = B.RowNum
)
ORDER BY A.Startdate
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
首先,您列出
status = 1
的行,然后使用交叉应用
获取状态不等于1的相应最小日期First you list the rows where
Status = 1
and then useCROSS APPLY
to get the corresponding minimum Date where the Status is not equal to 1这是一种无需自连接即可更有效地执行此操作的方法。
这里的关键技巧是第二个 CTE,它使用两个
ROW_NUMBER()
函数的差异,通过连续运行将客户记录(使用StatusGroup
列)标记到单独的分区中状态为 1 或非 1 的记录。之后可以根据该标签对它们进行分组以获取开始日期,然后使用LEAD()
函数获取后面组的StartDate
作为当前分组EndDate
。(可能有更紧凑的方式来表达这一点,但我喜欢将每个阶段布局为单独的 CTE。)
Here is a more efficient way to do this without a self-join.
The key trick here is the second CTE which uses the difference of two
ROW_NUMBER()
functions to tag the customer records (with theStatusGroup
column) into separate partitions by contiguous runs of records whose status is 1 or not 1. After that they can be grouped according to that tag to get the start dates, and then use theLEAD()
function to get the following group'sStartDate
as the current groupingsEndDate
.(There may be a more compact way to express this, but I like to layout each stage as a separate CTE.)