具有重复日期的时间范围内的最大值

发布于 2024-09-29 16:54:41 字数 1403 浏览 3 评论 0原文

我有一个像这样的表，

DateTime start_time not null,
DateTime end_time not null,
Status_Id int not null,
Entry_Id int not null

我想获取一段时间内每个状态的计数，其中只有最后一个启动的状态对于给定的entry_id 有效。

我现在使用的是这个（带有动态日期）：

with c (Status_Id, Entry_Id, Start_Date) AS (
  select Status_Id, Entry_Id, Start_Date from tbl where
  (End_Date BETWEEN '19000101' AND '21000101')
  AND ((Start_Date BETWEEN '19000101' AND '21000101')
  OR End_Date <= '21000101'))
select Status_Id, count(*) as cnt from 
 (select Entry_Id, max(start_date) as start_date from c
  group by Entry_Id) d inner join
c on c.Entry_Id = d.Entry_Id
and c.start_date = d.start_date
GROUP BY Status_Id WITH ROLLUP

问题是，当某些entry_id具有多个相同start_date的条目时，它会计数错误。（我并不特别关心在这种情况下选择哪种状态，只是只选择1）

一些测试数据：

status_id   Entry_id    Start_date
496 45173   2010-09-29 18:04:33.000
490 45173   2010-09-29 18:48:20.100
495 45173   2010-09-29 19:25:29.300
489 45174   2010-09-29 18:43:01.500
493 45175   2010-09-29 18:48:00.500
493 45175   2010-09-29 21:16:02.700
489 45175   2010-09-30 17:52:12.100
493 45176   2010-09-29 17:55:21.300
492 45176   2010-09-29 18:20:52.200 <------ This is the one that gives the problems
493 45176   2010-09-29 18:20:52.200 <------ This is the one that gives the problems

结果应该是

495 1
489 2
492 1 (or 493 1)

原文

I have a table like this

DateTime start_time not null,
DateTime end_time not null,
Status_Id int not null,
Entry_Id int not null

I want to get the count of each status within a time period, where only the last started is valid for a given entry_id.

What I am using now is this (with dynamic dates):

with c (Status_Id, Entry_Id, Start_Date) AS (
  select Status_Id, Entry_Id, Start_Date from tbl where
  (End_Date BETWEEN '19000101' AND '21000101')
  AND ((Start_Date BETWEEN '19000101' AND '21000101')
  OR End_Date <= '21000101'))
select Status_Id, count(*) as cnt from 
 (select Entry_Id, max(start_date) as start_date from c
  group by Entry_Id) d inner join
c on c.Entry_Id = d.Entry_Id
and c.start_date = d.start_date
GROUP BY Status_Id WITH ROLLUP

The problem is that it counts wrong when there are some entry_id that have multiple entries the same start_date. (I don't particularly care which status is chosen in this case, just that only 1 is chosen)

Some test data:

status_id   Entry_id    Start_date
496 45173   2010-09-29 18:04:33.000
490 45173   2010-09-29 18:48:20.100
495 45173   2010-09-29 19:25:29.300
489 45174   2010-09-29 18:43:01.500
493 45175   2010-09-29 18:48:00.500
493 45175   2010-09-29 21:16:02.700
489 45175   2010-09-30 17:52:12.100
493 45176   2010-09-29 17:55:21.300
492 45176   2010-09-29 18:20:52.200 <------ This is the one that gives the problems
493 45176   2010-09-29 18:20:52.200 <------ This is the one that gives the problems

The result should be

495 1
489 2
492 1 (or 493 1)

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

日暮斜阳 2024-10-06 16:54:41

如果我正确理解，您想要计算您的时间段内特定状态的不同条目...如果是这样，您应该在 count()DISTINCT 子句code> 从 count(*) 更改为 count(distinct Entry_id)

with c (Status_Id, Entry_Id, Start_Date) AS (
  select Status_Id, Entry_Id, Start_Date from tbl where
  (End_Date BETWEEN '19000101' AND '21000101')
  AND ((Start_Date BETWEEN '19000101' AND '21000101')
  OR End_Date <= '21000101'))
select Status_Id, count(distinct Entry_Id) as cnt from 
 (select Entry_Id, max(start_date) as start_date from c
  group by Entry_Id) d inner join
c on c.Entry_Id = d.Entry_Id
and c.start_date = d.start_date
GROUP BY Status_Id WITH ROLLUP

编辑

只要您不关心给定条目返回哪个状态，我认为您可以修改内部查询以返回第一个状态并加入状态

with c (Status_Id, Entry_Id, Start_Date) AS (
  select Status_Id, Entry_Id, Start_Date from tbl where
  (End_Date BETWEEN '19000101' AND '21000101')
  AND ((Start_Date BETWEEN '19000101' AND '21000101')
  OR End_Date <= '21000101'))
select c.Status_Id, count(c.Entry_Id) as cnt from 
 (select Entry_Id, Start_Date, (select top 1 Status_id from c where Entry_Id = CC.Entry_Id and Start_Date = CC.Start_Date) as Status_Id
  from (select Entry_Id, max(start_date) as start_date from c
  group by Entry_Id) as CC) d inner join
c on c.Entry_Id = d.Entry_Id
and c.start_date = d.start_date
and c.status_id = d.status_id
GROUP BY c.Status_Id

结果

Status_id Count
 489       2
 492       1
 495       1

If i correctly understood, you want to count distinct entry for a specific status in your time period... if it is so, you should use the DISTINCT clause in your count() changing from count(*) to count(distinct Entry_id)

with c (Status_Id, Entry_Id, Start_Date) AS (
  select Status_Id, Entry_Id, Start_Date from tbl where
  (End_Date BETWEEN '19000101' AND '21000101')
  AND ((Start_Date BETWEEN '19000101' AND '21000101')
  OR End_Date <= '21000101'))
select Status_Id, count(distinct Entry_Id) as cnt from 
 (select Entry_Id, max(start_date) as start_date from c
  group by Entry_Id) d inner join
c on c.Entry_Id = d.Entry_Id
and c.start_date = d.start_date
GROUP BY Status_Id WITH ROLLUP

EDIT

AS long as you do not care which status is return for a given entry, i think you could modify the inner query to return the first Status and join the status too

with c (Status_Id, Entry_Id, Start_Date) AS (
  select Status_Id, Entry_Id, Start_Date from tbl where
  (End_Date BETWEEN '19000101' AND '21000101')
  AND ((Start_Date BETWEEN '19000101' AND '21000101')
  OR End_Date <= '21000101'))
select c.Status_Id, count(c.Entry_Id) as cnt from 
 (select Entry_Id, Start_Date, (select top 1 Status_id from c where Entry_Id = CC.Entry_Id and Start_Date = CC.Start_Date) as Status_Id
  from (select Entry_Id, max(start_date) as start_date from c
  group by Entry_Id) as CC) d inner join
c on c.Entry_Id = d.Entry_Id
and c.start_date = d.start_date
and c.status_id = d.status_id
GROUP BY c.Status_Id

Result

Status_id Count
 489       2
 492       1
 495       1

回复收藏 0 原文

夏见 2024-10-06 16:54:41

基于 OP 可爱评论的替代答案。

WITH
   [sequenced_data]
AS
(
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY entry_id ORDER BY start_time DESC, status_id DESC) AS [sequence_id]
  FROM
    tbl
  WHERE
    start_time < '21:00' AND end_time > '19:00'
)
SELECT status_id, COUNT(*)
FROM [sequenced_data]
WHERE sequence_id = 1
GROUP BY status_id

仅当没有可以唯一标识各个记录的单个字段时才需要 ROW_NUMBER() 函数。可以在数据中存在唯一标识列的地方编写替代查询。然而，SQL Server 在优化 ROW_NUMBER() 查询（如上面的查询）方面非常有效，并且它应该（假设有相关索引）有效。

编辑

有人刚刚向我建议，人们不喜欢长代码，他们更喜欢紧凑的代码。因此，CTE 版本已替换为内联版本（CTE 实际上只是出于解释性原因帮助分解查询，并且如果需要，则位于编辑历史记录中）...

EDIT

ROW_NUMBER() 不能正如 OP 所发现的那样，构成 WHERE 子句的一部分。通过放回一个 CTE 来更新查询。

Alternative answer based on OPs lovely comments.

WITH
   [sequenced_data]
AS
(
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY entry_id ORDER BY start_time DESC, status_id DESC) AS [sequence_id]
  FROM
    tbl
  WHERE
    start_time < '21:00' AND end_time > '19:00'
)
SELECT status_id, COUNT(*)
FROM [sequenced_data]
WHERE sequence_id = 1
GROUP BY status_id

The ROW_NUMBER() function is only needed where there isn't a single field that can uniquely identify individul records. Alternative queries can be written where there is a unique identity column in the data. SQL Server, however, is extremely effective at optimising ROW_NUMBER() queries such as above and it should (assuming relevant indexes) be effective.

EDIT

Someone just suggested to me that people don't like long code, they prefer compact code. So the CTE version has been replaced with an inline version (The CTEs really just helped breakdown the query for explanatory reasons, and is in the edit history if needed)...

EDIT

ROW_NUMBER() can't form part of the WHERE clause, as spotted by OP. Query updated by putting one CTE back in.

回复收藏 0 原文

記柔刀 2024-10-06 16:54:41

我自己找到了解决方案：

with c (Status_Id, Entry_Id, Start_Date) AS (
  select Status_Id, Entry_Id, Start_Date from tbl where
  (End_Date BETWEEN '19000101' AND '21000101')
  AND ((Start_Date BETWEEN '19000101' AND '21000101')
  OR End_Date <= '21000101'))
select Status_Id, count(*) as cnt from 
(select max(Status_Id) as Status_Id, c.Entry_Id from --<--- ADDED
 (select Entry_Id, max(start_date) as start_date from c
  group by Entry_Id) d inner join
c on c.Entry_Id = d.Entry_Id
and c.start_date = d.start_date
group by c.Entry_Id) y  --<--- ADDED
GROUP BY Status_Id WITH ROLLUP

I found a solution myself:

with c (Status_Id, Entry_Id, Start_Date) AS (
  select Status_Id, Entry_Id, Start_Date from tbl where
  (End_Date BETWEEN '19000101' AND '21000101')
  AND ((Start_Date BETWEEN '19000101' AND '21000101')
  OR End_Date <= '21000101'))
select Status_Id, count(*) as cnt from 
(select max(Status_Id) as Status_Id, c.Entry_Id from --<--- ADDED
 (select Entry_Id, max(start_date) as start_date from c
  group by Entry_Id) d inner join
c on c.Entry_Id = d.Entry_Id
and c.start_date = d.start_date
group by c.Entry_Id) y  --<--- ADDED
GROUP BY Status_Id WITH ROLLUP

回复收藏 0 原文

~没有更多了~