Oracle - 在记录对之间进行分组

发布于 2024-08-23 20:54:54 字数 1274 浏览 8 评论 0原文

我有一个日志表，其中包含如下所示的数据：

ID     MSG                DATE
---------------------------------------------
1      TEst               2010-01-01 09:00:00
2      Job Start          2010-01-01 09:03:00
3      Do something       2010-01-01 09:03:10
4      Do something else  2010-01-01 09:03:12
5      Do something       2010-01-01 09:04:19
6      Job End            2010-01-01 09:06:30
7      Job Start          2010-01-01 09:18:03
8      Do something       2010-01-01 09:18:17
9      Do other thing     2010-01-01 09:19:48
10     Job End            2010-01-01 09:20:27

它包含（除其他外）由应用程序编写的消息。我有兴趣为所有此类“作业开始”和“作业结束”对生成“作业开始”记录和“作业结束”记录之间写入的所有内容的报告。理想情况下，报告应如下所示：（

BATCH_NUM   ID     MSG                DATE
---------------------------------------------------------         
1           3      Do something       2010-01-01 09:03:10
1           4      Do something else  2010-01-01 09:03:12
1           5      Do something       2010-01-01 09:04:19

2           8      Do something       2010-01-01 09:18:17
2           9      Do other thing     2010-01-01 09:19:48

添加批次之间的换行符以使其更易于阅读）

输出报告省略“作业开始”和“作业结束”消息，以及“TEst”消息（存在于一对之外）。

我什至不知道从哪里开始编写这种查询，或者使用 PL/SQL 是否更好，哎呀，我什至不确定是否有一个我正在尝试做的事情的正确技术术语；）

（oracle 版本是 10g）。

原文

I have a logging table that contains data that looks like this:

ID     MSG                DATE
---------------------------------------------
1      TEst               2010-01-01 09:00:00
2      Job Start          2010-01-01 09:03:00
3      Do something       2010-01-01 09:03:10
4      Do something else  2010-01-01 09:03:12
5      Do something       2010-01-01 09:04:19
6      Job End            2010-01-01 09:06:30
7      Job Start          2010-01-01 09:18:03
8      Do something       2010-01-01 09:18:17
9      Do other thing     2010-01-01 09:19:48
10     Job End            2010-01-01 09:20:27

It contains (among other things) messags written by the application. I am interested in producing a report of everything that gets written between a "Job Start" record and a "Job End" record, for all such pairs of "Job Start" and "Job End". Ideally the report would look like this:

BATCH_NUM   ID     MSG                DATE
---------------------------------------------------------         
1           3      Do something       2010-01-01 09:03:10
1           4      Do something else  2010-01-01 09:03:12
1           5      Do something       2010-01-01 09:04:19

2           8      Do something       2010-01-01 09:18:17
2           9      Do other thing     2010-01-01 09:19:48

(line break between batches added to make it easier to read)

The output report omits the "Job STart and "Job End" messages, as well as the "TEst" message (which exists outside a pair of "Job Start" and "Job End".

I'm not even sure where to begin writing this kind of query, or if it's better to go with PL/SQL for this. Heck, I'm not even sure if there's a proper technical term for what I'm trying to do. ;)

(oracle version is 10g)

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

清晰传感 2024-08-30 20:54:54

我确信有一种更具分析性的方法可以做到这一点，但我用标量子查询来构建窗口。

SQL> select * from logging_table;

        ID MSG                            LOG_DT
---------- ------------------------------ -------------------
         1 TEst                           2010-01-01 09:00:00
         2 Job Start                      2010-01-01 09:03:00
         3 Do something                   2010-01-01 09:03:10
         4 Do something else              2010-01-01 09:03:12
         5 Do something                   2010-01-01 09:04:19
         6 Job End                        2010-01-01 09:06:30
         7 Job Start                      2010-01-01 09:18:03
         8 Do something                   2010-01-01 09:18:17
         9 Do other thing                 2010-01-01 09:19:48
        10 Job End                        2010-01-01 09:20:27

SQL> l
  1      select dense_rank() over (order by job_start_id) as batch, 
         -- ^-- this part gets the batch
  2             job_step_id, msg, log_dt
  3             -- nested select to filter out rows outside of the boundaries
  4        from (select *
  5                from (select id as job_step_id, msg, log_dt,
  6                             -- scalar subquery to get start of "window"
  7                             (select max(id)
  8                                from logging_table
  9                               where msg = 'Job Start'
 10                                 and id < log.id) as job_start_id,
 11                             -- scalar subquery to get end of "window"
 12                             (select min(id)
 13                                from logging_table
 14                               where msg = 'Job End'
 15                                 and id > log.id) as job_end_id
 16                       from logging_table log
 17                      -- filter out the "window" rows themselves
 18                      where msg not in ('Job Start', 'Job End')
 19                     )
 20               -- the filtering out of "unbounded" records
 21               where job_start_id is not null
 22                 and job_end_id is not null
 23             )
 24*      order by job_step_id
SQL> /

 BATCH JOB_STEP_ID MSG                            LOG_DT

     1           3 Do something                   2010-01-01 09:03:10
     1           4 Do something else              2010-01-01 09:03:12
     1           5 Do something                   2010-01-01 09:04:19
     2           8 Do something                   2010-01-01 09:18:17
     2           9 Do other thing                 2010-01-01 09:19:48

I'm sure there's a more analytic-ish way to do this, but I'm cheating with scalar subqueries to build the windows.

SQL> select * from logging_table;

        ID MSG                            LOG_DT
---------- ------------------------------ -------------------
         1 TEst                           2010-01-01 09:00:00
         2 Job Start                      2010-01-01 09:03:00
         3 Do something                   2010-01-01 09:03:10
         4 Do something else              2010-01-01 09:03:12
         5 Do something                   2010-01-01 09:04:19
         6 Job End                        2010-01-01 09:06:30
         7 Job Start                      2010-01-01 09:18:03
         8 Do something                   2010-01-01 09:18:17
         9 Do other thing                 2010-01-01 09:19:48
        10 Job End                        2010-01-01 09:20:27

SQL> l
  1      select dense_rank() over (order by job_start_id) as batch, 
         -- ^-- this part gets the batch
  2             job_step_id, msg, log_dt
  3             -- nested select to filter out rows outside of the boundaries
  4        from (select *
  5                from (select id as job_step_id, msg, log_dt,
  6                             -- scalar subquery to get start of "window"
  7                             (select max(id)
  8                                from logging_table
  9                               where msg = 'Job Start'
 10                                 and id < log.id) as job_start_id,
 11                             -- scalar subquery to get end of "window"
 12                             (select min(id)
 13                                from logging_table
 14                               where msg = 'Job End'
 15                                 and id > log.id) as job_end_id
 16                       from logging_table log
 17                      -- filter out the "window" rows themselves
 18                      where msg not in ('Job Start', 'Job End')
 19                     )
 20               -- the filtering out of "unbounded" records
 21               where job_start_id is not null
 22                 and job_end_id is not null
 23             )
 24*      order by job_step_id
SQL> /

 BATCH JOB_STEP_ID MSG                            LOG_DT

     1           3 Do something                   2010-01-01 09:03:10
     1           4 Do something else              2010-01-01 09:03:12
     1           5 Do something                   2010-01-01 09:04:19
     2           8 Do something                   2010-01-01 09:18:17
     2           9 Do other thing                 2010-01-01 09:19:48

回复收藏 0 原文

黑凤梨 2024-08-30 20:54:54

以下示例使用与 Adam Musch 的答案大致相同的想法，但在连接到日志记录语句的单个选择中获取日志记录集的开始和结束，而不是使用标量子查询。

我认为 DENSE_RANK() 不能再直接使用，因为没有可用于分区的列。

此外，这些解决方案中的任何一个都假设日志集永远不会重叠。如果第二组在第一组结束之前开始，那就是一个全新的问题......

WITH logging_sets AS
     (SELECT DENSE_RANK () OVER (ORDER BY start_date) AS set_rank, start_date, end_date
        FROM (SELECT CASE msg
                        WHEN 'Job End'
                           THEN NULL
                        ELSE LEAD (log_dt, 1, NULL) OVER (ORDER BY log_dt)
                     END AS end_date, log_dt AS start_date, msg
                FROM logging_table lt
               WHERE msg IN ('Job Start', 'Job End') )
       WHERE msg = 'Job Start')
SELECT ls.set_rank, lt.ID, lt.msg, lt.log_dt
  FROM logging_table lt, logging_sets ls
 WHERE lt.log_dt > ls.start_date AND lt.log_dt < ls.end_date 
   AND msg NOT IN ('Job Start', 'Job End')
ORDER BY ls.set_rank, lt.log_dt;

The following sample uses much the same idea as Adam Musch's answer, but gets the logging sets' start and end in a single select which is joined to the logging statement, rather than using scalar sub-queries.

I don't think DENSE_RANK() can be used any more directly because there is no column available to partition on.

Also, either of these solutions assume that the will never be overlapping sets of logs. If a second set starts before the first one ends, it's a whole new problem...

WITH logging_sets AS
     (SELECT DENSE_RANK () OVER (ORDER BY start_date) AS set_rank, start_date, end_date
        FROM (SELECT CASE msg
                        WHEN 'Job End'
                           THEN NULL
                        ELSE LEAD (log_dt, 1, NULL) OVER (ORDER BY log_dt)
                     END AS end_date, log_dt AS start_date, msg
                FROM logging_table lt
               WHERE msg IN ('Job Start', 'Job End') )
       WHERE msg = 'Job Start')
SELECT ls.set_rank, lt.ID, lt.msg, lt.log_dt
  FROM logging_table lt, logging_sets ls
 WHERE lt.log_dt > ls.start_date AND lt.log_dt < ls.end_date 
   AND msg NOT IN ('Job Start', 'Job End')
ORDER BY ls.set_rank, lt.log_dt;

回复收藏 0 原文

心碎的声音 2024-08-30 20:54:54

为了以防万一，这里有一种不使用窗口函数来完成相同技巧的方法。

with logging_table as (
    select 1 id, 'job start' msg from dual union
    select 2, 'do somenthing in batch 1' from dual union
    select 3, 'do somenthing else in batch 1' from dual union
    select 4, 'job end' from dual union
    select 5, 'job start' from dual union
    select 6, 'do somenthing in batch 2' from dual union
    select 7, 'do somenthing else in batch 2' from dual union    
    select 8, 'job end' from dual 
),
jobs as (
    select  lt_start.id id_start,
           (select min(id) 
            from logging_table lt_end 
            where lt_end.id > lt_start.id 
            and msg = 'job end') id_end,
            rownum as batch_no
    from    logging_table lt_start
    where   msg = 'job start'
)
select  *
from    logging_table join jobs 
        on id > id_start and id < id_end
order   by batch_no, id

Here's a way of doing the same trick without using window functions, just in case.

with logging_table as (
    select 1 id, 'job start' msg from dual union
    select 2, 'do somenthing in batch 1' from dual union
    select 3, 'do somenthing else in batch 1' from dual union
    select 4, 'job end' from dual union
    select 5, 'job start' from dual union
    select 6, 'do somenthing in batch 2' from dual union
    select 7, 'do somenthing else in batch 2' from dual union    
    select 8, 'job end' from dual 
),
jobs as (
    select  lt_start.id id_start,
           (select min(id) 
            from logging_table lt_end 
            where lt_end.id > lt_start.id 
            and msg = 'job end') id_end,
            rownum as batch_no
    from    logging_table lt_start
    where   msg = 'job start'
)
select  *
from    logging_table join jobs 
        on id > id_start and id < id_end
order   by batch_no, id

回复收藏 0 原文

~没有更多了~