查询“运行”; Postgres 中连续列的数量
我有一个表:
create table table1 (event_id integer, event_time timestamp without time zone);
insert into table1 (event_id, event_time) values
(1, '2011-01-01 00:00:00'),
(2, '2011-01-01 00:00:15'),
(3, '2011-01-01 00:00:29'),
(4, '2011-01-01 00:00:58'),
(5, '2011-01-02 06:03:00'),
(6, '2011-01-02 06:03:09'),
(7, '2011-01-05 11:01:31'),
(8, '2011-01-05 11:02:15'),
(9, '2011-01-06 09:34:19'),
(10, '2011-01-06 09:34:41'),
(11, '2011-01-06 09:35:06');
我想构造一个语句,给定一个事件可以返回从该事件开始的事件“运行”的长度。一次跑步的定义是:
- 如果两个事件的时间间隔在 30 秒内,则它们一起进行一次跑步。
- 如果 A 和 B 一起参加比赛,并且 B 和 C 一起参加比赛,则 A 参加比赛 但是
我的查询不需要在时间上向后移动,所以如果我选择事件 2,那么只有事件 2、3 和 4 应该算作以 2 开头的事件运行的一部分,而 3 应该是返回作为运行的长度。
有什么想法吗?我很困惑。
I have a table:
create table table1 (event_id integer, event_time timestamp without time zone);
insert into table1 (event_id, event_time) values
(1, '2011-01-01 00:00:00'),
(2, '2011-01-01 00:00:15'),
(3, '2011-01-01 00:00:29'),
(4, '2011-01-01 00:00:58'),
(5, '2011-01-02 06:03:00'),
(6, '2011-01-02 06:03:09'),
(7, '2011-01-05 11:01:31'),
(8, '2011-01-05 11:02:15'),
(9, '2011-01-06 09:34:19'),
(10, '2011-01-06 09:34:41'),
(11, '2011-01-06 09:35:06');
I would like to construct a statement that given an event could return the length of the 'run' of events starting with that event. A run is defined by:
- Two events are in a run together if they are within 30 seconds of one another.
- If A and B are in a run together, and B and C are in a run together then A is in a run
with C.
However my query does not need to go backwards in time, so if I select on event 2, then only events 2, 3, and 4 should be counted as part of the run of events starting with 2, and 3 should be returned as the length of the run.
Any ideas? I'm stumped.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
这是递归 CTE 解决方案。 (岛屿和间隙问题自然适合递归 CTE)
结果:
Here is the RECURSIVE CTE-solution. (islands-and-gaps problems naturally lend themselves to recursive CTE)
Result:
可能如下所示:
此版本不依赖于无间隙 ID 序列,而是仅依赖于
event_time
。相同的
event_time
还会按event_id
进行排序,以确保不会产生歧义。了解窗口函数
row_number()< /code> 和
lead()
和 CTE (和条款)在手册中。编辑
如果我们不能假设较大的
event_id
具有较晚(或相等)的event_time
,请将其替换为第一个WHERE
子句:与起始行相同的
event_time
但较小的event_id
仍将被忽略。在一次运行直到结束的特殊情况下,找不到结束并且没有返回行。
COALESCE
返回所有行的计数。Could look like this:
This version does not rely on a gap-less sequence of IDs, but on
event_time
only.Identical
event_time
's are additionally sorted byevent_id
to be unambiguous.Read about the window functions
row_number()
andlead()
and CTE (With clause) in the manual.Edit
If we cannot assume that a bigger
event_id
has a later (or equal)event_time
, substitute this for the firstWHERE
clause:Rows with the same
event_time
as the starting row but a a smallerevent_id
will still be ignored.In the special case of one run till the end no end is found and no row returned.
COALESCE
returns the count of all rows instead.您可以通过日期差异语句将表自身连接起来。实际上,这是postgres,一个简单的减法就可以了。
该子查询将查找所有属于“开始事件”的记录。也就是说,所有在其之前 30 秒内没有发生另一个事件记录的事件记录:
经过一些更改...相同的逻辑,除了选择“结束”事件:
现在我们可以将它们连接在一起以关联哪个事件开始事件转到哪个结束事件:(
仍在写...有几种方法可以进行此操作。我假设只有示例具有线性 ID 号,因此您需要将开始事件时间加入到结束事件事件时间上具有最小正差的时间)。
这是我的最终结果...有点嵌套了很多子选择
结果作为 start_id, end_Id:
1;"4"
5;“6”
7;“单一事件”
8;“单一事件”
9;“11”
我必须使用第三个左连接来挑选单个事件作为检测既是开始事件又是结束事件的事件的方法。最终结果位于 ID 中,如果您需要 ID 以外的其他信息,则可以链接回原始表。不确定此解决方案将如何扩展,如果您有数百万个事件......可能是一个问题。
You can join a table onto itself on a date difference statement. Actually, this is postgres, a simple minus works.
This subquery will find all records that is a 'start event'. That is to say, all event records that does not have another event record occurring within 30 seconds before it:
With a few changes...same logic, except picking up an 'end' event:
Now we can join these together to associate which start event goes to which end event:
(still writing...there's a couple ways at going on this. I'm assuming only the example has linear ID numbers, so you'll want to join the start event time to the end event time having the smallest positive difference on the event times).
Here's my end result...kinda nested a lot of subselects
Results as start_id, end_Id:
1;"4"
5;"6"
7;"single event"
8;"single event"
9;"11"
I had to use a third left join to pick out single events as a method of detecting events that were both start events and end events. End result is in ID's and can be linked back to your original table if you want different information than just the ID. Unsure how this solution will scale, if you've got millions of events...could be an issue.