SQL 中的日期范围交集分割

发布于 2024-08-04 16:11:55 字数 1007 浏览 2 评论 0原文

我有一个 SQL Server 2005 数据库,其中包含一个名为“成员资格”的表。

表架构是:

PersonID int, Surname nvarchar(30), FirstName nvarchar(30), Description nvarchar(100), StartDate datetime, EndDate datetime

我目前正在开发一个网格功能,该功能显示按人员划分的成员资格细目。要求之一是拆分存在日期范围交集的成员资格行。交集必须受姓氏和名字约束,即分割仅发生在具有相同姓氏和名字的成员记录中。

示例表数据:

18  Smith  John  Poker Club  01/01/2009  NULL
18  Smith  John  Library     05/01/2009  18/01/2009
18  Smith  John  Gym         10/01/2009  28/01/2009
26  Adams  Jane  Pilates     03/01/2009  16/02/2009

预期结果集:

18  Smith  John  Poker Club                  01/01/2009  04/01/2009
18  Smith  John  Poker Club / Library        05/01/2009  09/01/2009
18  Smith  John  Poker Club / Library / Gym  10/01/2009  18/01/2009
18  Smith  John  Poker Club / Gym            19/01/2009  28/01/2009
18  Smith  John  Poker Club                  29/01/2009  NULL
26  Adams  Jane  Pilates                     03/01/2009  16/02/2009

有谁知道我如何编写一个存储过程来返回具有上述细分的结果集。

I have a SQL Server 2005 database which contains a table called Memberships.

The table schema is:

PersonID int, Surname nvarchar(30), FirstName nvarchar(30), Description nvarchar(100), StartDate datetime, EndDate datetime

I'm currently working on a grid feature which shows a break-down of memberships by person. One of the requirements is to split membership rows where there is an intersection of date ranges. The intersection must be bound by the Surname and FirstName, ie splits only occur with membership records of the same Surname and FirstName.

Example table data:

18  Smith  John  Poker Club  01/01/2009  NULL
18  Smith  John  Library     05/01/2009  18/01/2009
18  Smith  John  Gym         10/01/2009  28/01/2009
26  Adams  Jane  Pilates     03/01/2009  16/02/2009

Expected result set:

18  Smith  John  Poker Club                  01/01/2009  04/01/2009
18  Smith  John  Poker Club / Library        05/01/2009  09/01/2009
18  Smith  John  Poker Club / Library / Gym  10/01/2009  18/01/2009
18  Smith  John  Poker Club / Gym            19/01/2009  28/01/2009
18  Smith  John  Poker Club                  29/01/2009  NULL
26  Adams  Jane  Pilates                     03/01/2009  16/02/2009

Does anyone have any idea how I could write a stored procedure that will return a result set which has the break-down described above.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

我不是你的备胎 2024-08-11 16:11:55

对于这个问题,您将遇到的问题是,随着数据集的增长,使用 TSQL 解决它的解决方案将无法很好地扩展。下面使用一系列动态构建的临时表来解决该问题。它使用数字表将每个日期范围条目拆分为各自的日期。这是它无法扩展的地方,主要是由于开放范围的 NULL 值看起来无穷大,因此您必须将固定日期交换到遥远的未来,从而将转换范围限制为可行的时间长度。通过构建具有适当索引的日期表或日历表来优化每天的渲染,您可能会看到更好的性能。

拆分范围后,将使用 XML PATH 合并描述,以便范围系列中的每一天都包含为其列出的所有描述。按 PersonID 和日期进行行编号允许使用两个 NOT EXISTS 检查来查找每个范围的第一行和最后一行,以查找匹配的 PersonID 和描述集的前一行不存在的实例,或者下一行不存在的实例不存在匹配的 PersonID 和 Description 集。

然后使用 ROW_NUMBER 对该结果集重新编号,以便将它们配对以构建最终结果。

/*
SET DATEFORMAT dmy
USE tempdb;
GO
CREATE TABLE Schedule
( PersonID int, 
 Surname nvarchar(30), 
 FirstName nvarchar(30), 
 Description nvarchar(100), 
 StartDate datetime, 
 EndDate datetime)
GO
INSERT INTO Schedule VALUES (18, 'Smith', 'John', 'Poker Club', '01/01/2009', NULL)
INSERT INTO Schedule VALUES (18, 'Smith', 'John', 'Library', '05/01/2009', '18/01/2009')
INSERT INTO Schedule VALUES (18, 'Smith', 'John', 'Gym', '10/01/2009', '28/01/2009')
INSERT INTO Schedule VALUES (26, 'Adams', 'Jane', 'Pilates', '03/01/2009', '16/02/2009')
GO

*/

SELECT 
 PersonID, 
 Description, 
 theDate
INTO #SplitRanges
FROM Schedule, (SELECT DATEADD(dd, number, '01/01/2008') AS theDate
    FROM master..spt_values
    WHERE type = N'P') AS DayTab
WHERE theDate >= StartDate 
  AND theDate <= isnull(EndDate, '31/12/2012')

SELECT 
 ROW_NUMBER() OVER (ORDER BY PersonID, theDate) AS rowid,
 PersonID, 
 theDate, 
 STUFF((
  SELECT '/' + Description
  FROM #SplitRanges AS s
  WHERE s.PersonID = sr.PersonID 
    AND s.theDate = sr.theDate
  FOR XML PATH('')
  ), 1, 1,'') AS Descriptions
INTO #MergedDescriptions
FROM #SplitRanges AS sr
GROUP BY PersonID, theDate


SELECT 
 ROW_NUMBER() OVER (ORDER BY PersonID, theDate) AS ID, 
 *
INTO #InterimResults
FROM
(
 SELECT * 
 FROM #MergedDescriptions AS t1
 WHERE NOT EXISTS 
  (SELECT 1 
   FROM #MergedDescriptions AS t2 
   WHERE t1.PersonID = t2.PersonID 
     AND t1.RowID - 1 = t2.RowID 
     AND t1.Descriptions = t2.Descriptions)
UNION ALL
 SELECT * 
 FROM #MergedDescriptions AS t1
 WHERE NOT EXISTS 
  (SELECT 1 
   FROM #MergedDescriptions AS t2 
   WHERE t1.PersonID = t2.PersonID 
     AND t1.RowID = t2.RowID - 1
     AND t1.Descriptions = t2.Descriptions)
) AS t

SELECT DISTINCT 
 PersonID, 
 Surname, 
 FirstName
INTO #DistinctPerson
FROM Schedule

SELECT 
 t1.PersonID, 
 dp.Surname, 
 dp.FirstName, 
 t1.Descriptions, 
 t1.theDate AS StartDate, 
 CASE 
  WHEN t2.theDate = '31/12/2012' THEN NULL 
  ELSE t2.theDate 
 END AS EndDate
FROM #DistinctPerson AS dp
JOIN #InterimResults AS t1 
 ON t1.PersonID = dp.PersonID
JOIN #InterimResults AS t2 
 ON t2.PersonID = t1.PersonID 
  AND t1.ID + 1 = t2.ID 
  AND t1.Descriptions = t2.Descriptions

DROP TABLE #SplitRanges
DROP TABLE #MergedDescriptions
DROP TABLE #DistinctPerson
DROP TABLE #InterimResults

/*

DROP TABLE Schedule

*/

上述解决方案还将处理其他描述之间的间隙,因此,如果您要为 PersonID 18 添加另一个描述并留下间隙:

INSERT INTO Schedule VALUES (18, 'Smith', 'John', 'Gym', '10/02/2009', '28/02/2009')

它将适当地填充间隙。正如评论中所指出的,您不应该在此表中包含姓名信息,它应该标准化为可以在最终结果中联接的人员表。我通过使用 SELECT DISTINCT 构建临时表来创建该 JOIN 来模拟另一个表。

The problem you are going to have with this problem is that as the data set grows, the solutions to solve it with TSQL won't scale well. The below uses a series of temporary tables built on the fly to solve the problem. It splits each date range entry into its respective days using a numbers table. This is where it won't scale, primarily due to your open ranged NULL values which appear to be inifinity, so you have to swap in a fixed date far into the future that limits the range of conversion to a feasible length of time. You could likely see better performance by building a table of days or a calendar table with appropriate indexing for optimized rendering of each day.

Once the ranges are split, the descriptions are merged using XML PATH so that each day in the range series has all of the descriptions listed for it. Row Numbering by PersonID and Date allows for the first and last row of each range to be found using two NOT EXISTS checks to find instances where a previous row doesn't exist for a matching PersonID and Description set, or where the next row doesn't exist for a matching PersonID and Description set.

This result set is then renumbered using ROW_NUMBER so that they can be paired up to build the final results.

/*
SET DATEFORMAT dmy
USE tempdb;
GO
CREATE TABLE Schedule
( PersonID int, 
 Surname nvarchar(30), 
 FirstName nvarchar(30), 
 Description nvarchar(100), 
 StartDate datetime, 
 EndDate datetime)
GO
INSERT INTO Schedule VALUES (18, 'Smith', 'John', 'Poker Club', '01/01/2009', NULL)
INSERT INTO Schedule VALUES (18, 'Smith', 'John', 'Library', '05/01/2009', '18/01/2009')
INSERT INTO Schedule VALUES (18, 'Smith', 'John', 'Gym', '10/01/2009', '28/01/2009')
INSERT INTO Schedule VALUES (26, 'Adams', 'Jane', 'Pilates', '03/01/2009', '16/02/2009')
GO

*/

SELECT 
 PersonID, 
 Description, 
 theDate
INTO #SplitRanges
FROM Schedule, (SELECT DATEADD(dd, number, '01/01/2008') AS theDate
    FROM master..spt_values
    WHERE type = N'P') AS DayTab
WHERE theDate >= StartDate 
  AND theDate <= isnull(EndDate, '31/12/2012')

SELECT 
 ROW_NUMBER() OVER (ORDER BY PersonID, theDate) AS rowid,
 PersonID, 
 theDate, 
 STUFF((
  SELECT '/' + Description
  FROM #SplitRanges AS s
  WHERE s.PersonID = sr.PersonID 
    AND s.theDate = sr.theDate
  FOR XML PATH('')
  ), 1, 1,'') AS Descriptions
INTO #MergedDescriptions
FROM #SplitRanges AS sr
GROUP BY PersonID, theDate


SELECT 
 ROW_NUMBER() OVER (ORDER BY PersonID, theDate) AS ID, 
 *
INTO #InterimResults
FROM
(
 SELECT * 
 FROM #MergedDescriptions AS t1
 WHERE NOT EXISTS 
  (SELECT 1 
   FROM #MergedDescriptions AS t2 
   WHERE t1.PersonID = t2.PersonID 
     AND t1.RowID - 1 = t2.RowID 
     AND t1.Descriptions = t2.Descriptions)
UNION ALL
 SELECT * 
 FROM #MergedDescriptions AS t1
 WHERE NOT EXISTS 
  (SELECT 1 
   FROM #MergedDescriptions AS t2 
   WHERE t1.PersonID = t2.PersonID 
     AND t1.RowID = t2.RowID - 1
     AND t1.Descriptions = t2.Descriptions)
) AS t

SELECT DISTINCT 
 PersonID, 
 Surname, 
 FirstName
INTO #DistinctPerson
FROM Schedule

SELECT 
 t1.PersonID, 
 dp.Surname, 
 dp.FirstName, 
 t1.Descriptions, 
 t1.theDate AS StartDate, 
 CASE 
  WHEN t2.theDate = '31/12/2012' THEN NULL 
  ELSE t2.theDate 
 END AS EndDate
FROM #DistinctPerson AS dp
JOIN #InterimResults AS t1 
 ON t1.PersonID = dp.PersonID
JOIN #InterimResults AS t2 
 ON t2.PersonID = t1.PersonID 
  AND t1.ID + 1 = t2.ID 
  AND t1.Descriptions = t2.Descriptions

DROP TABLE #SplitRanges
DROP TABLE #MergedDescriptions
DROP TABLE #DistinctPerson
DROP TABLE #InterimResults

/*

DROP TABLE Schedule

*/

The above solution will also handle gaps between additional Descriptions as well, so if you were to add another Description for PersonID 18 leaving a gap:

INSERT INTO Schedule VALUES (18, 'Smith', 'John', 'Gym', '10/02/2009', '28/02/2009')

It will fill the gap appropriately. As pointed out in the comments, you shouldn't have name information in this table, it should be normalized out to a Persons Table that can be JOIN'd to in the final result. I simulated this other table by using a SELECT DISTINCT to build a temp table to create that JOIN.

枯叶蝶 2024-08-11 16:11:55

试试这个

SET DATEFORMAT dmy
DECLARE @Membership TABLE( 
    PersonID    int, 
    Surname     nvarchar(16), 
    FirstName   nvarchar(16), 
    Description nvarchar(16), 
    StartDate   datetime, 
    EndDate     datetime)   
INSERT INTO @Membership VALUES (18, 'Smith', 'John', 'Poker Club', '01/01/2009', NULL)
INSERT INTO @Membership VALUES (18, 'Smith', 'John','Library', '05/01/2009', '18/01/2009')
INSERT INTO @Membership VALUES (18, 'Smith', 'John','Gym', '10/01/2009', '28/01/2009')
INSERT INTO @Membership VALUES (26, 'Adams', 'Jane','Pilates', '03/01/2009', '16/02/2009')

--Program Starts
declare @enddate datetime
--Measuring extreme condition when all the enddates are null(i.e. all the memberships for all members are in progress)
-- in such a case taking any arbitary date e.g. '31/12/2009' here else add 1 more day to the highest enddate
select @enddate =  case when max(enddate) is null then '31/12/2009' else max(enddate) + 1 end from @Membership

--Fill the null enddates
; with fillNullEndDates_cte as
(
    select
            row_number() over(partition by PersonId order by PersonId) RowNum
            ,PersonId
            ,Surname
            ,FirstName
            ,Description
            ,StartDate
            ,isnull(EndDate,@enddate) EndDate
    from @Membership
)
--Generate a date calender
, generateCalender_cte as
(
    select 
        1 as CalenderRows
        ,min(startdate) DateValue
    from @Membership
       union all
        select 
            CalenderRows+1
            ,DateValue + 1
        from    generateCalender_cte   
        where   DateValue + 1 <= @enddate
)
--Generate Missing Dates based on Membership
,datesBasedOnMemberships_cte as
 (
    select 
            t.RowNum
            ,t.PersonId
            ,t.Surname
            ,t.FirstName
            ,t.Description          
            , d.DateValue
            ,d.CalenderRows
    from generateCalender_cte d 
    join fillNullEndDates_cte t ON d.DateValue between t.startdate and t.enddate
)
--Generate Dscription Based On Membership Dates
, descriptionBasedOnMembershipDates_cte as
(
    select    
        PersonID
        ,Surname
        ,FirstName
        ,stuff((
            select '/' + Description
            from datesBasedOnMemberships_cte d1
            where d1.PersonID = d2.PersonID 
            and d1.DateValue = d2.DateValue
            for xml path('')
        ), 1, 1,'') as Description
        , DateValue
        ,CalenderRows
    from datesBasedOnMemberships_cte d2
    group by PersonID, Surname,FirstName,DateValue,CalenderRows
)
--Grouping based on membership dates
,groupByMembershipDates_cte as
(
    select d.*,
    CalenderRows - row_number() over(partition by Description order by PersonID, DateValue) AS  [Group]
    from descriptionBasedOnMembershipDates_cte d
)
select PersonId
,Surname
,FirstName
,Description
,convert(varchar(10), convert(datetime, min(DateValue)), 103) as StartDate
,case when max(DateValue)= @enddate then null else convert(varchar(10), convert(datetime, max(DateValue)), 103) end as EndDate
from groupByMembershipDates_cte 
group by [Group],PersonId,Surname,FirstName,Description
order by PersonId,StartDate
option(maxrecursion 0)

Try this

SET DATEFORMAT dmy
DECLARE @Membership TABLE( 
    PersonID    int, 
    Surname     nvarchar(16), 
    FirstName   nvarchar(16), 
    Description nvarchar(16), 
    StartDate   datetime, 
    EndDate     datetime)   
INSERT INTO @Membership VALUES (18, 'Smith', 'John', 'Poker Club', '01/01/2009', NULL)
INSERT INTO @Membership VALUES (18, 'Smith', 'John','Library', '05/01/2009', '18/01/2009')
INSERT INTO @Membership VALUES (18, 'Smith', 'John','Gym', '10/01/2009', '28/01/2009')
INSERT INTO @Membership VALUES (26, 'Adams', 'Jane','Pilates', '03/01/2009', '16/02/2009')

--Program Starts
declare @enddate datetime
--Measuring extreme condition when all the enddates are null(i.e. all the memberships for all members are in progress)
-- in such a case taking any arbitary date e.g. '31/12/2009' here else add 1 more day to the highest enddate
select @enddate =  case when max(enddate) is null then '31/12/2009' else max(enddate) + 1 end from @Membership

--Fill the null enddates
; with fillNullEndDates_cte as
(
    select
            row_number() over(partition by PersonId order by PersonId) RowNum
            ,PersonId
            ,Surname
            ,FirstName
            ,Description
            ,StartDate
            ,isnull(EndDate,@enddate) EndDate
    from @Membership
)
--Generate a date calender
, generateCalender_cte as
(
    select 
        1 as CalenderRows
        ,min(startdate) DateValue
    from @Membership
       union all
        select 
            CalenderRows+1
            ,DateValue + 1
        from    generateCalender_cte   
        where   DateValue + 1 <= @enddate
)
--Generate Missing Dates based on Membership
,datesBasedOnMemberships_cte as
 (
    select 
            t.RowNum
            ,t.PersonId
            ,t.Surname
            ,t.FirstName
            ,t.Description          
            , d.DateValue
            ,d.CalenderRows
    from generateCalender_cte d 
    join fillNullEndDates_cte t ON d.DateValue between t.startdate and t.enddate
)
--Generate Dscription Based On Membership Dates
, descriptionBasedOnMembershipDates_cte as
(
    select    
        PersonID
        ,Surname
        ,FirstName
        ,stuff((
            select '/' + Description
            from datesBasedOnMemberships_cte d1
            where d1.PersonID = d2.PersonID 
            and d1.DateValue = d2.DateValue
            for xml path('')
        ), 1, 1,'') as Description
        , DateValue
        ,CalenderRows
    from datesBasedOnMemberships_cte d2
    group by PersonID, Surname,FirstName,DateValue,CalenderRows
)
--Grouping based on membership dates
,groupByMembershipDates_cte as
(
    select d.*,
    CalenderRows - row_number() over(partition by Description order by PersonID, DateValue) AS  [Group]
    from descriptionBasedOnMembershipDates_cte d
)
select PersonId
,Surname
,FirstName
,Description
,convert(varchar(10), convert(datetime, min(DateValue)), 103) as StartDate
,case when max(DateValue)= @enddate then null else convert(varchar(10), convert(datetime, max(DateValue)), 103) end as EndDate
from groupByMembershipDates_cte 
group by [Group],PersonId,Surname,FirstName,Description
order by PersonId,StartDate
option(maxrecursion 0)
尝蛊 2024-08-11 16:11:55

[只有很多很多年之后。]

我创建了一个存储过程,它将按单个表中的分区对齐和打断段,然后您可以使用这些对齐的打断点,使用子查询和 XML PATH 将描述旋转到不规则的列中。

查看以下内容是否有帮助:

  1. 文档:https://github.com/Quebe/SQL-Algorithms/blob/master/Temporal/Date%20Segment%20Manipulation/DateSegments_AlignWithinTable.md

  2. 存储过程:https://github.com/Quebe/SQL-Algorithms/blob/master/Temporal/Date%20Segment%20Manipulation/DateSegments_AlignWithinTable.sql

例如,您的调用可能如下所示:

EXEC dbo.DateSegments_AlignWithinTable
@tableName = 'tableName',
@keyFieldList = 'PersonID',
@nonKeyFieldList = 'Description',
@effectivveDateFieldName = 'StartDate',
@terminationDateFieldName = 'EndDate'

您将希望捕获结果(这是一个表)到另一个表或临时表(假设在下面的示例中称为“AlignedDataTable”)。然后,您可以使用子查询进行透视。

SELECT 
    PersonID, StartDate, EndDate,

    SUBSTRING ((SELECT ',' + [Description] FROM AlignedDataTable AS innerTable 
        WHERE 
            innerTable.PersonID = AlignedDataTable.PersonID
            AND (innerTable.StartDate = AlignedDataTable.StartDate) 
            AND (innerTable.EndDate = AlignedDataTable.EndDate)
        ORDER BY id
        FOR XML PATH ('')), 2, 999999999999999) AS IdList

 FROM AlignedDataTable
 GROUP BY PersonID, StartDate, EndDate 
 ORDER BY PersonID, StartDate

[Only many, many years later.]

I created a stored procedure that will align and break segments by a partition within a single table, and then you can use those aligned breaks to pivot the description into a ragged column using a subquery and XML PATH.

See if the below help:

  1. Documentation: https://github.com/Quebe/SQL-Algorithms/blob/master/Temporal/Date%20Segment%20Manipulation/DateSegments_AlignWithinTable.md

  2. Stored Procedure: https://github.com/Quebe/SQL-Algorithms/blob/master/Temporal/Date%20Segment%20Manipulation/DateSegments_AlignWithinTable.sql

For example, your call might look like:

EXEC dbo.DateSegments_AlignWithinTable
@tableName = 'tableName',
@keyFieldList = 'PersonID',
@nonKeyFieldList = 'Description',
@effectivveDateFieldName = 'StartDate',
@terminationDateFieldName = 'EndDate'

You will want to capture the result (which is a table) into another table or temporary table (assuming it is called "AlignedDataTable" in below example). Then, you can pivot using a subquery.

SELECT 
    PersonID, StartDate, EndDate,

    SUBSTRING ((SELECT ',' + [Description] FROM AlignedDataTable AS innerTable 
        WHERE 
            innerTable.PersonID = AlignedDataTable.PersonID
            AND (innerTable.StartDate = AlignedDataTable.StartDate) 
            AND (innerTable.EndDate = AlignedDataTable.EndDate)
        ORDER BY id
        FOR XML PATH ('')), 2, 999999999999999) AS IdList

 FROM AlignedDataTable
 GROUP BY PersonID, StartDate, EndDate 
 ORDER BY PersonID, StartDate
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文