Using analytic functions to group a set of records when the timestamp in a row is less than or equal to some value

Posted 08-26 15:07

I have a table in an Oracle database that has, among others, a DATE column loaded with the insertion timestamp of each row. I need to use the existing data in this table to analyze the correlation between some events, so that with data like this:

COL_1         COL_2         TS
    A             1         Mon 15, February 2010 10:03:22
    B             2         Mon 15, February 2010 10:05:37
    C             3         Mon 15, February 2010 10:20:21
    D             4         Mon 15, February 2010 10:20:21
    E             5         Mon 15, February 2010 10:20:24
    F             6         Mon 15, February 2010 10:23:35
    G             7         Mon 15, February 2010 10:45:22

I would like to get something like this, assuming that related records have at most 5 minutes between the current "TS" and the next one:

FIRST_TS                            COUNT
Mon 15, February 2010 10:03:22          2
Mon 15, February 2010 10:20:21          4
Mon 15, February 2010 10:45:22          1

Is it possible to use analytic functions to achieve this? How?

灰色世界里的红玫瑰 2024-09-02 15:07:55

This groups together rows that are no more than 5 minutes after the previous row:

-- These session settings let to_date() parse the literals below without a format mask.
ALTER SESSION SET nls_date_format = 'dy dd, month yyyy hh24:mi:ss';
ALTER SESSION SET nls_date_language = 'ENGLISH';

WITH DATA AS (
SELECT to_date('Mon 15, February 2010 10:03:22') ts FROM dual
UNION ALL SELECT to_date('Mon 15, February 2010 10:05:37') FROM dual
UNION ALL SELECT to_date('Mon 15, February 2010 10:20:21') FROM dual
UNION ALL SELECT to_date('Mon 15, February 2010 10:20:21') FROM dual
UNION ALL SELECT to_date('Mon 15, February 2010 10:20:24') FROM dual
UNION ALL SELECT to_date('Mon 15, February 2010 10:23:35') FROM dual
UNION ALL SELECT to_date('Mon 15, February 2010 10:45:22') FROM dual
)
SELECT MIN(ts) first_ts, COUNT(*) COUNT
  FROM (-- the running total of the gap flags numbers each group
        SELECT ts, SUM(gap) over(ORDER BY ts) ts_group
          FROM (SELECT ts,
                       -- 0 when the row is within 5 minutes (5 / (60 * 24) of a day)
                       -- of the previous row, 1 otherwise; the first row has no
                       -- previous row, so lag() is NULL and it gets 1
                       CASE
                          WHEN ts - lag(ts) over(ORDER BY ts)
                                <= 5 / (60 * 24) THEN
                           0
                          ELSE
                           1
                       END gap
                  FROM DATA))
 GROUP BY ts_group;

FIRST_TS                              COUNT
-------------------------------- ----------
mon 15, february  2010 10:03:22           2
mon 15, february  2010 10:20:21           4
mon 15, february  2010 10:45:22           1
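
To make the two steps of the query easier to follow (flag each gap, then number the groups), here is a minimal Python sketch of the same logic traced over the sample timestamps. It is only an illustration, not Oracle code: the function name `group_by_gap` and its parameters are made up for this example, and the list stands in for the table.

```python
from datetime import datetime, timedelta

def group_by_gap(timestamps, max_gap=timedelta(minutes=5)):
    """Return (first_ts, count) for each run of rows whose gap to the previous row is <= max_gap."""
    groups = []  # each entry is [first_ts, count]
    previous = None
    for ts in sorted(timestamps):
        # Mirrors the CASE expression: a new group starts on the first row
        # (no previous row) or when the gap to the previous row exceeds max_gap.
        if previous is None or ts - previous > max_gap:
            groups.append([ts, 0])
        groups[-1][1] += 1
        previous = ts
    return [(first, count) for first, count in groups]

# The seven sample rows from the question (hypothetical stand-in for the table).
sample = [datetime(2010, 2, 15, h, m, s) for h, m, s in [
    (10, 3, 22), (10, 5, 37), (10, 20, 21), (10, 20, 21),
    (10, 20, 24), (10, 23, 35), (10, 45, 22)]]

for first_ts, count in group_by_gap(sample):
    print(first_ts.strftime("%H:%M:%S"), count)
# prints:
# 10:03:22 2
# 10:20:21 4
# 10:45:22 1
```

Each row is compared only with the row just before it, so a long chain of closely spaced rows stays in one group even when its first and last rows are more than five minutes apart.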

瑕疵 2024-09-02 15:07:55

Here is a version with analytic functions. Just substitute your table for the innermost subquery, where I build a table from your data with UNION ALL:

select distinct
  first_value(ts) over (partition by continuous_group order by ts) first_ts
  , count(ts) over (partition by continuous_group) count
from (
  -- the running total of the discontinuity flags numbers each group
  select col_1, col_2, ts, sum(discontinuity) over (order by ts) continuous_group
  from (
    -- flag a row with 1 when it is more than 5 minutes after the previous row;
    -- the first row has no previous row, so lag() is NULL and it gets 0
    select col_1, col_2, ts,
      case when lag(ts) over (order by ts) + numtodsinterval(5,'MINUTE') < ts
           then 1 else 0 end discontinuity
    from (
    select 'A' col_1, 1 col_2, to_date('2010-2-15 10:03:22', 'YYYY-MM-DD HH24:MI:SS') ts from dual
    union all
    select 'B' col_1, 2 col_2, to_date('2010-2-15 10:05:37', 'YYYY-MM-DD HH24:MI:SS') ts from dual
    union all
    select 'C' col_1, 3 col_2, to_date('2010-2-15 10:20:21', 'YYYY-MM-DD HH24:MI:SS') ts from dual
    union all
    select 'D' col_1, 4 col_2, to_date('2010-2-15 10:20:21', 'YYYY-MM-DD HH24:MI:SS') ts from dual
    union all
    select 'E' col_1, 5 col_2, to_date('2010-2-15 10:20:24', 'YYYY-MM-DD HH24:MI:SS') ts from dual
    union all
    select 'F' col_1, 6 col_2, to_date('2010-2-15 10:23:35', 'YYYY-MM-DD HH24:MI:SS') ts from dual
    union all
    select 'G' col_1, 7 col_2, to_date('2010-2-15 10:45:22', 'YYYY-MM-DD HH24:MI:SS') ts from dual
    )
  )
) order by first_ts;

折戟 2024-09-02 15:07:55

I don't think you need analytics for this; you just need to generate some five-minute intervals. The following code uses a common table expression (AKA subquery factoring) to generate five-minute intervals from a given start date. The main query uses SUM() and CASE to count the records that fall within each interval.

Here is the test data:

select * from t23;

C       COL2 COL3
- ---------- -----------------
A          1 15-feb-2010 10:03
B          2 15-feb-2010 10:05
C          3 15-feb-2010 10:20
D          4 15-feb-2010 10:20
E          5 15-feb-2010 10:20
F          6 15-feb-2010 10:23
G          7 15-feb-2010 10:45

7 rows selected.

And here is the outcome:

with t_range as (
    -- twelve consecutive five-minute buckets; 1/288 of a day is five minutes
    select to_date('15 February 2010 10:00','DD Month YYYY hh24:mi')
                                              + ((level-1)/288) as this_5mins
           , to_date('15 February 2010 10:00','DD Month YYYY hh24:mi')
                                              + (level/288) as next_5mins
    from dual
    connect by level <= 12
    )
-- count the rows whose timestamp falls in each half-open bucket
select t_range.this_5mins
       , sum(case when t23.col3 >= t_range.this_5mins
                  and t23.col3 < t_range.next_5mins
                  then 1
                  else 0 end ) as cnt
from t23 cross join t_range
group by t_range.this_5mins;

THIS_5MINS               CNT
----------------- ----------
15-feb-2010 10:10          0
15-feb-2010 10:20          4
15-feb-2010 10:30          0
15-feb-2010 10:05          1
15-feb-2010 10:55          0
15-feb-2010 10:15          0
15-feb-2010 10:40          0
15-feb-2010 10:45          1
15-feb-2010 10:00          1
15-feb-2010 10:35          0
15-feb-2010 10:25          0
15-feb-2010 10:50          0

12 rows selected.
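
For comparison, the fixed-interval counting above can be sketched in Python as follows. This is a hedged illustration rather than a translation of the query: `count_per_bucket` is a hypothetical helper, and the timestamps are the minute-precision values shown in the t23 listing.

```python
from datetime import datetime, timedelta

def count_per_bucket(timestamps, start, buckets=12, width=timedelta(minutes=5)):
    """Count timestamps in each half-open bucket [start + i*width, start + (i+1)*width)."""
    counts = {start + i * width: 0 for i in range(buckets)}
    for ts in timestamps:
        index = (ts - start) // width  # timedelta // timedelta yields an int
        if 0 <= index < buckets:       # ignore rows outside the generated range
            counts[start + index * width] += 1
    return counts

start = datetime(2010, 2, 15, 10, 0)
rows = [datetime(2010, 2, 15, 10, m) for m in (3, 5, 20, 20, 20, 23, 45)]
counts = count_per_bucket(rows, start)

# Show only the non-empty buckets, in time order.
for bucket, cnt in counts.items():
    if cnt:
        print(bucket.strftime("%H:%M"), cnt)
# prints:
# 10:00 1
# 10:05 1
# 10:20 4
# 10:45 1
```

Because the buckets are fixed, 10:03 and 10:05 are counted in different intervals, which is what the SQL output above shows as well.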
