查询“运行”; Postgres 中连续列的数量

发布于 2024-12-17 22:56:05 字数 749 浏览 1 评论 0原文

我有一个表:

create table table1 (event_id integer, event_time timestamp without time zone);
insert into table1 (event_id, event_time) values
(1, '2011-01-01 00:00:00'),
(2, '2011-01-01 00:00:15'),
(3, '2011-01-01 00:00:29'),
(4, '2011-01-01 00:00:58'),
(5, '2011-01-02 06:03:00'),
(6, '2011-01-02 06:03:09'),
(7, '2011-01-05 11:01:31'),
(8, '2011-01-05 11:02:15'),
(9, '2011-01-06 09:34:19'),
(10, '2011-01-06 09:34:41'),
(11, '2011-01-06 09:35:06');

我想构造一个语句,给定一个事件可以返回从该事件开始的事件“运行”的长度。一次跑步的定义是:

  1. 如果两个事件的时间间隔在 30 秒内,则它们一起进行一次跑步。
  2. 如果 A 和 B 一起参加比赛,并且 B 和 C 一起参加比赛,则 A 参加比赛 但是

我的查询不需要在时间上向后移动,所以如果我选择事件 2,那么只有事件 2、3 和 4 应该算作以 2 开头的事件运行的一部分,而 3 应该是返回作为运行的长度。

有什么想法吗?我很困惑。

I have a table:

create table table1 (event_id integer, event_time timestamp without time zone);
insert into table1 (event_id, event_time) values
(1, '2011-01-01 00:00:00'),
(2, '2011-01-01 00:00:15'),
(3, '2011-01-01 00:00:29'),
(4, '2011-01-01 00:00:58'),
(5, '2011-01-02 06:03:00'),
(6, '2011-01-02 06:03:09'),
(7, '2011-01-05 11:01:31'),
(8, '2011-01-05 11:02:15'),
(9, '2011-01-06 09:34:19'),
(10, '2011-01-06 09:34:41'),
(11, '2011-01-06 09:35:06');

I would like to construct a statement that given an event could return the length of the 'run' of events starting with that event. A run is defined by:

  1. Two events are in a run together if they are within 30 seconds of one another.
  2. If A and B are in a run together, and B and C are in a run together then A is in a run
    with C.

However my query does not need to go backwards in time, so if I select on event 2, then only events 2, 3, and 4 should be counted as part of the run of events starting with 2, and 3 should be returned as the length of the run.

Any ideas? I'm stumped.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

献世佛 2024-12-24 22:56:05

这是递归 CTE 解决方案。 (岛屿和间隙问题自然适合递归 CTE)

WITH RECURSIVE runrun AS (
    SELECT event_id, event_time
    , event_time - ('30 sec'::interval) AS low_time
    , event_time + ('30 sec'::interval) AS high_time
    FROM table1
    UNION
    SELECT t1.event_id, t1.event_time
    , LEAST ( rr.low_time, t1.event_time - ('30 sec'::interval) ) AS low_time
    , GREATEST ( rr.high_time, t1.event_time + ('30 sec'::interval) ) AS high_time
    FROM table1 t1
    JOIN runrun rr ON t1.event_time >= rr.low_time
                  AND t1.event_time < rr.high_time
    )
SELECT DISTINCT ON (event_id) *
FROM runrun rr
WHERE rr.event_time >= '2011-01-01 00:00:15'
AND rr.low_time <= '2011-01-01 00:00:15'
AND rr.high_time > '2011-01-01 00:00:15'
    ;

结果:

 event_id |     event_time      |      low_time       |      high_time      
----------+---------------------+---------------------+---------------------
        2 | 2011-01-01 00:00:15 | 2010-12-31 23:59:45 | 2011-01-01 00:00:45
        3 | 2011-01-01 00:00:29 | 2010-12-31 23:59:45 | 2011-01-01 00:01:28
        4 | 2011-01-01 00:00:58 | 2010-12-31 23:59:30 | 2011-01-01 00:01:28
(3 rows)

Here is the RECURSIVE CTE-solution. (islands-and-gaps problems naturally lend themselves to recursive CTE)

WITH RECURSIVE runrun AS (
    SELECT event_id, event_time
    , event_time - ('30 sec'::interval) AS low_time
    , event_time + ('30 sec'::interval) AS high_time
    FROM table1
    UNION
    SELECT t1.event_id, t1.event_time
    , LEAST ( rr.low_time, t1.event_time - ('30 sec'::interval) ) AS low_time
    , GREATEST ( rr.high_time, t1.event_time + ('30 sec'::interval) ) AS high_time
    FROM table1 t1
    JOIN runrun rr ON t1.event_time >= rr.low_time
                  AND t1.event_time < rr.high_time
    )
SELECT DISTINCT ON (event_id) *
FROM runrun rr
WHERE rr.event_time >= '2011-01-01 00:00:15'
AND rr.low_time <= '2011-01-01 00:00:15'
AND rr.high_time > '2011-01-01 00:00:15'
    ;

Result:

 event_id |     event_time      |      low_time       |      high_time      
----------+---------------------+---------------------+---------------------
        2 | 2011-01-01 00:00:15 | 2010-12-31 23:59:45 | 2011-01-01 00:00:45
        3 | 2011-01-01 00:00:29 | 2010-12-31 23:59:45 | 2011-01-01 00:01:28
        4 | 2011-01-01 00:00:58 | 2010-12-31 23:59:30 | 2011-01-01 00:01:28
(3 rows)
暮年慕年 2024-12-24 22:56:05

可能如下所示:

WITH x AS (
    SELECT event_time
          ,row_number() OVER w AS rn
          ,lead(event_time) OVER w AS next_time
    FROM   table1
    WHERE  event_id >= <start_id>
    WINDOW w AS (ORDER BY event_time, event_id)
    )
SELECT COALESCE(
      (SELECT x.rn
       FROM   x
       WHERE  (x.event_time + interval '30s') < x.next_time
       ORDER  BY x.rn
       LIMIT  1)
     ,(SELECT count(*) FROM x)
      ) AS run_length

此版本不依赖于无间隙 ID 序列,而是仅依赖于 event_time
相同的 event_time 还会按 event_id 进行排序,以确保不会产生歧义。

了解窗口函数 row_number()< /code> 和 lead()CTE (和条款)在手册中。

编辑

如果我们不能假设较大的 event_id 具有较晚(或相等)的 event_time,请将其替换为第一个 WHERE 子句

WHERE event_time >= (SELECT event_time FROM table1 WHERE event_id = <start_id>)

:与起始行相同的 event_time 但较小的 event_id 仍将被忽略。

一次运行直到结束的特殊情况下,找不到结束并且没有返回行。 COALESCE 返回所有行的计数。

Could look like this:

WITH x AS (
    SELECT event_time
          ,row_number() OVER w AS rn
          ,lead(event_time) OVER w AS next_time
    FROM   table1
    WHERE  event_id >= <start_id>
    WINDOW w AS (ORDER BY event_time, event_id)
    )
SELECT COALESCE(
      (SELECT x.rn
       FROM   x
       WHERE  (x.event_time + interval '30s') < x.next_time
       ORDER  BY x.rn
       LIMIT  1)
     ,(SELECT count(*) FROM x)
      ) AS run_length

This version does not rely on a gap-less sequence of IDs, but on event_time only.
Identical event_time's are additionally sorted by event_id to be unambiguous.

Read about the window functions row_number() and lead() and CTE (With clause) in the manual.

Edit

If we cannot assume that a bigger event_id has a later (or equal) event_time, substitute this for the first WHERE clause:

WHERE event_time >= (SELECT event_time FROM table1 WHERE event_id = <start_id>)

Rows with the same event_time as the starting row but a a smaller event_id will still be ignored.

In the special case of one run till the end no end is found and no row returned. COALESCE returns the count of all rows instead.

迷路的信 2024-12-24 22:56:05

您可以通过日期差异语句将表自身连接起来。实际上,这是postgres,一个简单的减法就可以了。

该子查询将查找所有属于“开始事件”的记录。也就是说,所有在其之前 30 秒内没有发生另一个事件记录的事件记录:

(Select a.event_id, a.event_time from
(Select event_id, event_time from table1) a
 left join 
 (select event_id, event_time from table1) b
 on a.event_time - b.event_time < '00:00:30' and a.event_time - b.event_time > '00:00:00'
 where b.event_time is null) startevent

经过一些更改...相同的逻辑,除了选择“结束”事件:

(Select a.event_id, a.event_time from
(Select event_id, event_time from table1) a
 left join 
 (select event_id, event_time from table1) b
 on b.event_time - a.event_time < '00:00:30' and b.event_time - a.event_time > '00:00:00'
 where b.event_time is null) end_event

现在我们可以将它们连接在一起以关联哪个事件开始事件转到哪个结束事件:(

仍在写...有几种方法可以进行此操作。我假设只有示例具有线性 ID 号,因此您需要将开始事件时间加入到结束事件事件时间上具有最小正差的时间)。

这是我的最终结果...有点嵌套了很多子选择

 select a.start_id, case when a.event_id is null then t1.event_id::varchar else 'single  event' end as end_id
 from
 (select start_event.event_id as start_id, start_event.event_time as start_time,      last_event.event_id, min(end_event.event_time - start_event.event_time) as min_interval   
 from
    (Select a.event_id, a.event_time from
    (Select event_id, event_time from table1) a
     left join 
    (select event_id, event_time from table1) b
   on a.event_time - b.event_time < '00:00:30' and a.event_time - b.event_time > '00:00:00'
 where b.event_time is null) start_event

inner join

   (Select a.event_id, a.event_time from
(Select event_id, event_time from table1) a
 left join 
 (select event_id, event_time from table1) b
 on b.event_time - a.event_time < '00:00:30' and b.event_time - a.event_time > '00:00:00'
 where b.event_time is null) end_event     
on end_event.event_time > start_event.event_time

--check for only event
 left join
 (Select a.event_id, a.event_time from
 (Select event_id, event_time from table1) a
  left join 
  (select event_id, event_time from table1) b
  on b.event_time - a.event_time < '00:00:30' and b.event_time - a.event_time > '00:00:00'
  where b.event_time is null) last_event
    on start_event.event_id = last_event.event_id
group by 1,2,3) a
    left join table1 t1 on t1.event_time = a.start_time + a.min_interval

结果作为 start_id, end_Id:

1;"4"
5;“6”
7;“单一事件”
8;“单一事件”
9;“11”

我必须使用第三个左连接来挑选单个事件作为检测既是开始事件又是结束事件的事件的方法。最终结果位于 ID 中,如果您需要 ID 以外的其他信息,则可以链接回原始表。不确定此解决方案将如何扩展,如果您有数百万个事件......可能是一个问题。

You can join a table onto itself on a date difference statement. Actually, this is postgres, a simple minus works.

This subquery will find all records that is a 'start event'. That is to say, all event records that does not have another event record occurring within 30 seconds before it:

(Select a.event_id, a.event_time from
(Select event_id, event_time from table1) a
 left join 
 (select event_id, event_time from table1) b
 on a.event_time - b.event_time < '00:00:30' and a.event_time - b.event_time > '00:00:00'
 where b.event_time is null) startevent

With a few changes...same logic, except picking up an 'end' event:

(Select a.event_id, a.event_time from
(Select event_id, event_time from table1) a
 left join 
 (select event_id, event_time from table1) b
 on b.event_time - a.event_time < '00:00:30' and b.event_time - a.event_time > '00:00:00'
 where b.event_time is null) end_event

Now we can join these together to associate which start event goes to which end event:

(still writing...there's a couple ways at going on this. I'm assuming only the example has linear ID numbers, so you'll want to join the start event time to the end event time having the smallest positive difference on the event times).

Here's my end result...kinda nested a lot of subselects

 select a.start_id, case when a.event_id is null then t1.event_id::varchar else 'single  event' end as end_id
 from
 (select start_event.event_id as start_id, start_event.event_time as start_time,      last_event.event_id, min(end_event.event_time - start_event.event_time) as min_interval   
 from
    (Select a.event_id, a.event_time from
    (Select event_id, event_time from table1) a
     left join 
    (select event_id, event_time from table1) b
   on a.event_time - b.event_time < '00:00:30' and a.event_time - b.event_time > '00:00:00'
 where b.event_time is null) start_event

inner join

   (Select a.event_id, a.event_time from
(Select event_id, event_time from table1) a
 left join 
 (select event_id, event_time from table1) b
 on b.event_time - a.event_time < '00:00:30' and b.event_time - a.event_time > '00:00:00'
 where b.event_time is null) end_event     
on end_event.event_time > start_event.event_time

--check for only event
 left join
 (Select a.event_id, a.event_time from
 (Select event_id, event_time from table1) a
  left join 
  (select event_id, event_time from table1) b
  on b.event_time - a.event_time < '00:00:30' and b.event_time - a.event_time > '00:00:00'
  where b.event_time is null) last_event
    on start_event.event_id = last_event.event_id
group by 1,2,3) a
    left join table1 t1 on t1.event_time = a.start_time + a.min_interval

Results as start_id, end_Id:

1;"4"
5;"6"
7;"single event"
8;"single event"
9;"11"

I had to use a third left join to pick out single events as a method of detecting events that were both start events and end events. End result is in ID's and can be linked back to your original table if you want different information than just the ID. Unsure how this solution will scale, if you've got millions of events...could be an issue.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文