SQL 确定多个日期范围 (SQL Server 2000)

发布于 2024-11-05 00:52:35 字数 825 浏览 3 评论 0原文

我有一个表,其中包含事件的 IDDate。每一行代表一个日期。我正在尝试确定连续的日期范围并合并输出以显示 ID,StartDate,EndDate

ID      Date
200236  2011-01-02 00:00:00.000
200236  2011-01-03 00:00:00.000
200236  2011-01-05 00:00:00.000
200236  2011-01-06 00:00:00.000
200236  2011-01-07 00:00:00.000
200236  2011-01-08 00:00:00.000
200236  2011-01-09 00:00:00.000
200236  2011-01-10 00:00:00.000
200236  2011-01-11 00:00:00.000
200236  2011-01-12 00:00:00.000
200236  2011-01-13 00:00:00.000
200236  2011-01-15 00:00:00.000
200236  2011-01-16 00:00:00.000
200236  2011-01-17 00:00:00.000

输出将如下所示:

ID       StartDate    EndDate
200236   2011-01-02   2011-01-03
200236   2011-01-05   2011-01-13
200236   2011-01-15   2011-01-17

关于如何在 SQL Server 2000 中处理此问题的任何想法?

I have a table which contains an ID and a Date for an event. Each row is for one date. I am trying to determine consecutive date ranges and consolidate output to show the ID,StartDate,EndDate

ID      Date
200236  2011-01-02 00:00:00.000
200236  2011-01-03 00:00:00.000
200236  2011-01-05 00:00:00.000
200236  2011-01-06 00:00:00.000
200236  2011-01-07 00:00:00.000
200236  2011-01-08 00:00:00.000
200236  2011-01-09 00:00:00.000
200236  2011-01-10 00:00:00.000
200236  2011-01-11 00:00:00.000
200236  2011-01-12 00:00:00.000
200236  2011-01-13 00:00:00.000
200236  2011-01-15 00:00:00.000
200236  2011-01-16 00:00:00.000
200236  2011-01-17 00:00:00.000

Output would look like:

ID       StartDate    EndDate
200236   2011-01-02   2011-01-03
200236   2011-01-05   2011-01-13
200236   2011-01-15   2011-01-17

Any thoughts on how to handle this in SQL Server 2000?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(5

殊姿 2024-11-12 00:52:35
SELECT ...
FROM   ...
WHERE  date_column BETWEEN '2011-01-02' AND '2011-01-15'

也许? 参考

或者您可以进行子查询并链接下一条记录使用 MAX where date is <= current date:

SELECT id, date, (SELECT MAX(date) FROM mytable WHERE date <= mytable.date) AS nextDate
FROM   mytable

或使用:

SELECT TOP 1 date
FROM         mytable
WHERE        date <= mytable.date AND id <> mytable.id
ORDER BY     date

作为子查询,以便获取当前记录之后的下一个日期。

SELECT ...
FROM   ...
WHERE  date_column BETWEEN '2011-01-02' AND '2011-01-15'

perhaps? Reference

Or you can do a sub-query and link the next record using a MAX where date is <= current date:

SELECT id, date, (SELECT MAX(date) FROM mytable WHERE date <= mytable.date) AS nextDate
FROM   mytable

Or use:

SELECT TOP 1 date
FROM         mytable
WHERE        date <= mytable.date AND id <> mytable.id
ORDER BY     date

As the sub-query so it grabs the next date in line after the current record.

×眷恋的温暖 2024-11-12 00:52:35

我刚刚在 SQL Server 2008 中完成了类似的操作。我认为以下翻译适用于 SQL Server 2000:

-- Create table variable
DECLARE @StartTable TABLE
(
  rowid INT IDENTITY(1,1) NOT NULL,
  userid int,
  startDate date
)

Insert Into @StartTable(userid, startDate)
--This finds the start dates by finding unmatched values
SELECT t1.ID, t1.[Date]
FROM Example As t1
LEFT OUTER JOIN Example As t2 ON t1.ID=t2.ID 
   And DateAdd(day, 1, t2.[Date]) = t1.[Date]
WHERE t2.[Date] Is NULL
ORDER BY t1.ID, t1.[Date]

-- Create table variable
DECLARE @EndTable TABLE
(
  rowid INT IDENTITY(1,1) NOT NULL,
  userid int,
  endDate date
)

Insert Into @EndTable(userid, endDate)
--This finds the end dates by getting unmatched values 
SELECT t1.ID, t1.[Date]
FROM Example As t1
LEFT OUTER JOIN Example As t2 ON t1.ID=t2.ID
   And DateAdd(day, -1, t2.[Date]) = t1.[Date]
WHERE t2.[Date] IS NULL
ORDER BY t1.ID, t1.[Date]

Select eT.userid, startDate, endDate 
From @EndTable eT
INNER JOIN @StartTable sT On eT.userid = sT.userid 
AND eT.rowid = sT.rowid;

因此,如您所见,我通过自连接创建了两个表变量,一个用于开始,一个用于结束表中的日期可以是 [日期] 列中的日期之前或之后的日期。这意味着我只为开始表选择没有先前日期的记录(因此这些记录将位于一个周期的开头)以及那些没有后续日期的记录(因此这些记录将位于一个周期的末尾)期)为结束表。

当它们插入表变量时,由于 Identity 列,它们会按顺序编号。然后我将两个表变量连接在一起。由于它们是有序的,因此开始日期和结束日期应始终正确匹配。

这个解决方案对我来说很有效,因为我每天每个 ID 最多有一条记录,而且我只对天感兴趣,而不是小时等。尽管它有几个步骤,但我喜欢它,因为它在概念上很简单,并且消除了匹配的记录,而无需游标或循环。我希望它也对你有用。

I've just done this similar thing in SQL Server 2008. I think the following translation will work for SQL Server 2000:

-- Create table variable
DECLARE @StartTable TABLE
(
  rowid INT IDENTITY(1,1) NOT NULL,
  userid int,
  startDate date
)

Insert Into @StartTable(userid, startDate)
--This finds the start dates by finding unmatched values
SELECT t1.ID, t1.[Date]
FROM Example As t1
LEFT OUTER JOIN Example As t2 ON t1.ID=t2.ID 
   And DateAdd(day, 1, t2.[Date]) = t1.[Date]
WHERE t2.[Date] Is NULL
ORDER BY t1.ID, t1.[Date]

-- Create table variable
DECLARE @EndTable TABLE
(
  rowid INT IDENTITY(1,1) NOT NULL,
  userid int,
  endDate date
)

Insert Into @EndTable(userid, endDate)
--This finds the end dates by getting unmatched values 
SELECT t1.ID, t1.[Date]
FROM Example As t1
LEFT OUTER JOIN Example As t2 ON t1.ID=t2.ID
   And DateAdd(day, -1, t2.[Date]) = t1.[Date]
WHERE t2.[Date] IS NULL
ORDER BY t1.ID, t1.[Date]

Select eT.userid, startDate, endDate 
From @EndTable eT
INNER JOIN @StartTable sT On eT.userid = sT.userid 
AND eT.rowid = sT.rowid;

So as you can see, I created two table variables, one for starts and one for ends, by self-joining the table on the date either just prior to or just after the date in the [Date] column. This means that I'm selecting only records that don't have a date prior (so these would be at the beginning of a period) for the Start Table and those that have no date following (so these would be at the end of a period) for the End Table.

When these are inserted into the table variable, they are numbered in sequence because of the Identity column. Then I join the two table variables together. Because they are ordered, the start and end dates should always match up properly.

This solution works for me because I have at most one record per ID per day and I am only interested in days, not hours, etc. Even though it is several steps, I like it because it is conceptually simple and eliminates matched records without having cursors or loops. I hope it will work for you too.

无人问我粥可暖 2024-11-12 00:52:35

这个问题可能会对您有所帮助。我直接链接到罗布·法利的答案,因为我觉得这是一个类似的问题。

This SO Question might help you. I linked directly to Rob Farley's answer as I feel this is a similar problem.

作业与我同在 2024-11-12 00:52:35

您可以采取的一种方法是添加一个字段来指示序列中的下一个日期。 (将其添加到当前表或使用临时表,将基础数据存储到临时表,然后更新序列中的下一个日期)。

您的起始数据结构将如下所示:

ID, PerfDate, NextDate
200236, 2011-01-02, 2011-01-03
200236, 2011-01-03, 2011-01-04
etc.

然后您可以使用一系列相关子查询将数据汇总到所需的输出中:

SELECT ID, StartDate, EndDate
FROM (
SELECT DISTINCT ID, PerfDate AS StartDate, 
    (SELECT MIN([PerfDate]) FROM [SourceTable] S3
    WHERE S3.ID = S1.ID
    AND S3.NextDate > S1.PerfDate
    AND ISNULL(
        (SELECT MIN(PerfDate) 
        FROM [SourceTable] AS S4
        WHERE S4.ID = S1.ID 
        AND S4.NextDate > S3.NextDate), S3.NextDate + 1) > S3.NextDate) AS EndDate
FROM [SourceTable] S1
WHERE 
    ISNULL(
        (SELECT MAX(NextDate) 
        FROM [SourceTable] S2 
        WHERE S2.ID = S1.ID 
        AND S2.PerfDate < S1.PerfDate), PerfDate -1) < S1.PerfDate)q
ORDER BY q.ID, q.StartDate

One approach you can take is to add a field that indicates the next date in the sequence. (Either add it to your current table or use a temporary table, store the underlying data to the temp table and then update the next date in the sequence).

Your starting data structure would look something like this:

ID, PerfDate, NextDate
200236, 2011-01-02, 2011-01-03
200236, 2011-01-03, 2011-01-04
etc.

You can then use a series of correlated subqueries to roll the data up into the desired output:

SELECT ID, StartDate, EndDate
FROM (
SELECT DISTINCT ID, PerfDate AS StartDate, 
    (SELECT MIN([PerfDate]) FROM [SourceTable] S3
    WHERE S3.ID = S1.ID
    AND S3.NextDate > S1.PerfDate
    AND ISNULL(
        (SELECT MIN(PerfDate) 
        FROM [SourceTable] AS S4
        WHERE S4.ID = S1.ID 
        AND S4.NextDate > S3.NextDate), S3.NextDate + 1) > S3.NextDate) AS EndDate
FROM [SourceTable] S1
WHERE 
    ISNULL(
        (SELECT MAX(NextDate) 
        FROM [SourceTable] S2 
        WHERE S2.ID = S1.ID 
        AND S2.PerfDate < S1.PerfDate), PerfDate -1) < S1.PerfDate)q
ORDER BY q.ID, q.StartDate
瞎闹 2024-11-12 00:52:35

我过去就是这样做的。这是一个两步过程:

  1. 构建候选连续周期集
  2. 如果存在任何重叠周期,则删除除最长的周期之外的所有周期。

这是一个脚本,展示了它是如何完成的。你也许可以通过一个[bug,丑陋]查询来完成它,但尝试这样做会让我头疼。我使用临时表,因为它使调试变得更加容易。

drop table #source
create table #source
(
  id    int      not null ,
  dtCol datetime not null ,

  -----------------------------------------------------------------------
  -- ASSUMPTION 1: Each date must be unique for a given ID value.
  -----------------------------------------------------------------------
  unique clustered ( id , dtCol ) ,

  -----------------------------------------------------------------------
  -- ASSUMPTION 2: The datetime column only represents a day.
  -- The value of the time component is always 00:00:00.000
  -----------------------------------------------------------------------
  check ( dtCol = convert(datetime,convert(varchar,dtCol,112),112) ) ,

)
go

insert #source values(1,'jan 1, 2011')
insert #source values(1,'jan 4, 2011')
insert #source values(1,'jan 5, 2011')
insert #source values(2,'jan 1, 2011')
insert #source values(2,'jan 2, 2011')
insert #source values(2,'jan 3, 2011')
insert #source values(2,'jan 5, 2011')
insert #source values(3,'jan 1, 2011')
insert #source values(4,'jan 1, 2011')
insert #source values(4,'jan 2, 2011')
insert #source values(4,'jan 3, 2011')
insert #source values(4,'jan 4, 2011')
go

insert #source values( 200236 , '2011-01-02')
insert #source values( 200236 , '2011-01-03')
insert #source values( 200236 , '2011-01-05')
insert #source values( 200236 , '2011-01-06')
insert #source values( 200236 , '2011-01-07')
insert #source values( 200236 , '2011-01-08')
insert #source values( 200236 , '2011-01-09')
insert #source values( 200236 , '2011-01-10')
insert #source values( 200236 , '2011-01-11')
insert #source values( 200236 , '2011-01-12')
insert #source values( 200236 , '2011-01-13')
insert #source values( 200236 , '2011-01-15')
insert #source values( 200236 , '2011-01-16')
insert #source values( 200236 , '2011-01-17')
go

drop table #candidate_range
go
create table #candidate_range
(
  rowId   int      not null identity(1,1) ,
  id      int      not null ,
  dtFrom  datetime not null ,
  dtThru  datetime not null ,
  length  as 1+datediff(day,dtFrom,dtThru) ,

  primary key nonclustered ( rowID ) ,
  unique clustered (id,dtFrom,dtThru) ,

)
go

--
-- seed the candidate range table with the set of all possible contiguous ranges for each id
--
insert #candidate_range ( id , dtFrom , dtThru )
select id      = tFrom.id    ,
       valFrom = tFrom.dtCol ,
       valThru = tThru.dtCol
from #source tFrom
join #source tThru on tThru.id     = tFrom.id
                  and tThru.dtCol >= tFrom.dtCol
where 1+datediff(day,tFrom.dtCol,tThru.dtCol) = ( select count(*)
                                                  from #source t
                                                  where t.id = tFrom.id
                                                    and t.dtCol between tFrom.dtCol and tThru.dtCol
                                                )
order by 1,2,3
go

--
-- compare the table to itself. If we find overlapping periods,
-- we'll keep the longest such period and delete the shorter overlapping periods.
--
delete t2
from #candidate_range t1
join #candidate_range t2 on t2.id      = t1.id
                        and t2.rowId  != t1.rowID
                        and t2.length <  t1.length
                        and t2.dtFrom <= t1.dtThru
                        and t2.dtThru >= t1.dtFrom
go

这就是全部内容了。

This is the way I've done it in the past. It's a two step process:

  1. Build the set of candidate contiguous periods
  2. If there are any overlapping periods, delete all but the longest such period.

Here's a script that shows how it's done. You might be able to pull it off in a single [bug, ugly] query, but trying to do that makes my head hurt. I'm using temp tables as it makes the debugging a whole lot easier.

drop table #source
create table #source
(
  id    int      not null ,
  dtCol datetime not null ,

  -----------------------------------------------------------------------
  -- ASSUMPTION 1: Each date must be unique for a given ID value.
  -----------------------------------------------------------------------
  unique clustered ( id , dtCol ) ,

  -----------------------------------------------------------------------
  -- ASSUMPTION 2: The datetime column only represents a day.
  -- The value of the time component is always 00:00:00.000
  -----------------------------------------------------------------------
  check ( dtCol = convert(datetime,convert(varchar,dtCol,112),112) ) ,

)
go

insert #source values(1,'jan 1, 2011')
insert #source values(1,'jan 4, 2011')
insert #source values(1,'jan 5, 2011')
insert #source values(2,'jan 1, 2011')
insert #source values(2,'jan 2, 2011')
insert #source values(2,'jan 3, 2011')
insert #source values(2,'jan 5, 2011')
insert #source values(3,'jan 1, 2011')
insert #source values(4,'jan 1, 2011')
insert #source values(4,'jan 2, 2011')
insert #source values(4,'jan 3, 2011')
insert #source values(4,'jan 4, 2011')
go

insert #source values( 200236 , '2011-01-02')
insert #source values( 200236 , '2011-01-03')
insert #source values( 200236 , '2011-01-05')
insert #source values( 200236 , '2011-01-06')
insert #source values( 200236 , '2011-01-07')
insert #source values( 200236 , '2011-01-08')
insert #source values( 200236 , '2011-01-09')
insert #source values( 200236 , '2011-01-10')
insert #source values( 200236 , '2011-01-11')
insert #source values( 200236 , '2011-01-12')
insert #source values( 200236 , '2011-01-13')
insert #source values( 200236 , '2011-01-15')
insert #source values( 200236 , '2011-01-16')
insert #source values( 200236 , '2011-01-17')
go

drop table #candidate_range
go
create table #candidate_range
(
  rowId   int      not null identity(1,1) ,
  id      int      not null ,
  dtFrom  datetime not null ,
  dtThru  datetime not null ,
  length  as 1+datediff(day,dtFrom,dtThru) ,

  primary key nonclustered ( rowID ) ,
  unique clustered (id,dtFrom,dtThru) ,

)
go

--
-- seed the candidate range table with the set of all possible contiguous ranges for each id
--
insert #candidate_range ( id , dtFrom , dtThru )
select id      = tFrom.id    ,
       valFrom = tFrom.dtCol ,
       valThru = tThru.dtCol
from #source tFrom
join #source tThru on tThru.id     = tFrom.id
                  and tThru.dtCol >= tFrom.dtCol
where 1+datediff(day,tFrom.dtCol,tThru.dtCol) = ( select count(*)
                                                  from #source t
                                                  where t.id = tFrom.id
                                                    and t.dtCol between tFrom.dtCol and tThru.dtCol
                                                )
order by 1,2,3
go

--
-- compare the table to itself. If we find overlapping periods,
-- we'll keep the longest such period and delete the shorter overlapping periods.
--
delete t2
from #candidate_range t1
join #candidate_range t2 on t2.id      = t1.id
                        and t2.rowId  != t1.rowID
                        and t2.length <  t1.length
                        and t2.dtFrom <= t1.dtThru
                        and t2.dtThru >= t1.dtFrom
go

That's about all there is to it.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文