使用历史表,我如何确定任何任务处于活动的时间段,并将数据设置减少到非重叠的日期期间

发布于 2025-01-29 21:47:27 字数 6358 浏览 3 评论 0原文

我在Teradata中有一个历史表,其中包含有关帐户上发生的任务的信息。在典型的历史表方式中,存在的数据具有负载(有效),并替换(有效为)日期。

在我的环境中,当前/未终止的记录的日期为1/1/3000,所有数据负载都发生在系统中更改后的第二天午夜之后不久 - 因此,负载日期始终将开始日期拖延一天。

一些示例数据可能看起来像这样(我已经在参考行号中添加了col“ r”,表中不存在):

racct_numtask_nmtask_statstart_dtend_dt end_dtld_dt ld_dtrplc_dt lglc_dtlglc_delcurr_row
10123456TASS_01O2018-05-05-01NULL2018 NULL 2018 -05-022018-05-16NN
20123456任务_01C2018-05-012018-05-152018-05-162018-05-16NN
30123456_01N2018-08-08-16任务 152018-08-163000-01-01y yy
40123456任务_022018-05-05NULL2018-05-05-062018-05-19NNN N
50123456_02 TASK_02C任务O2018 -05-192018-08-19NN
60123456TASK_03O2020-02-01NULL2020-02-022020-05-16NN
70123456TASK_03C2020-02-012020-02-152020-02 -162020-04-16NN
80123456任务_03C2020-02-012020-02-152020-04-163000-01-01-01YY
90123456_04C任务20222-03Y3000-01-01ny

我挣扎的地方是我需要确定每个唯一的时间段,其中任何任何任务都在给定的帐户上处于活动状态。如果多个任务在重叠的时间段中处于活动状态,则我需要确定打开第一个任务的开始日期以及最后一个重叠任务的结束日期。 (Task_02也有可能在Task_01完成之前开始和结束,因此在这种类型的方案中,我需要与Task_01相关的开始/结束日期)。

使用上述示例数据,我想获得以下输出:

帐户#主动启动DTActive End DT
01234562018-05-012018-05-19
01234562020-02-02-02-01 2018-02-152018-02-15
01234562022-22-2-03 -013000-01-01

任务1开始,然后打开任务2。任务1完成了,但任务2仍然打开。因此,我需要任务1的开始日期(从第1行)和任务2的结束日期(从第4行或第5行)。后来,任务3在其自己的时间范围内打开,该时间表创建新记录,最后任务4当前打开。

我尝试了很多事情未能成功,包括但不幸的是:

  • 仅在结束日期中更新替换日期,因此只需“任务状态= open”记录,但是我不确定如何最好地使比较以解决重叠的时间表。
  • 利用引线/滞后函数来识别下一个记录加载日期,但是再次,由于任务可以按任何顺序进行,因此创建的不正确时间范围,因为我无法动态地识别我需要
  • 尝试识别打开的唯一负载日期的 下一个替换日期子查询中的任务,然后自我加入到任务表中,但这只是创建了重复,在给定的一天中,

我无法提供由于隐私限制而编写的文字代码,但是,这里是任何人的模拟代码都将提供指导,以帮助我朝着正确的方向前进:

SELECT
   ACCT_NUM
   ,TASK_NM
   ,TASK_STAT
   ,START_DT
   ,END_DT
   --,[other 'unimportant' fields that can change and create change records]
   ,LD_DT - INTERVAL '1' DAY AS LD_DT --SIMULATE START_DT
   ,RPLC_DT - INTERVAL '1' DAY AS LD_DT --SIMULATE END_DT ON "TASK_STAT = 'O'" RECORDS
   ,LGLC_DEL
   ,CURR_ROW
FROM
   TBL_TASK_HIST
WHERE
   TASK_STAT = 'O'

对不起,长期摘要,试图提供可靠/清晰的细节。

I have a history table in Teradata that contains information about tasks that occurred on an account. In typical history table fashion, the data present has a Load (Valid From) and Replace (Valid To) date.

In my environment, current/unterminated records have a 1/1/3000 date and all data loads occur shortly after midnight the day after the changes were completed in the system - so Load Dates always trail the Start Date by one day.

Some sample data might look like this (I have added Col "R" to reference row numbers, it does not exist in the table):

RACCT_NUMTASK_NMTASK_STATSTART_DTEND_DTLD_DTRPLC_DTLGLC_DELCURR_ROW
10123456TASK_01O2018-05-01NULL2018-05-022018-05-16NN
20123456TASK_01C2018-05-012018-05-152018-05-162018-08-16NN
30123456TASK_01C2018-05-012018-05-152018-08-163000-01-01YY
40123456TASK_02O2018-05-05NULL2018-05-062018-05-19NN
50123456TASK_02C2018-05-052018-05-182018-05-192018-08-19NN
60123456TASK_03O2020-02-01NULL2020-02-022020-05-16NN
70123456TASK_03C2020-02-012020-02-152020-02-162020-04-16NN
80123456TASK_03C2020-02-012020-02-152020-04-163000-01-01YY
90123456TASK_04C2022-03-01NULL2022-03-023000-01-01NY

The place where I am struggling is that I need to identify each unique time period in which any task is active on a given account. If multiple tasks were active during overlapping time periods, then I need to identify the start date for the first task opened and the end date for the last overlapping task. (It is also possible that TASK_02 could both start and finish before TASK_01 finishes, so I would need the start/end dates related to task_01 in that type of scenario).

Using the above sample data, I would want to get an output as follows:

Account #Active Start DtActive End Dt
01234562018-05-012018-05-19
01234562020-02-012018-02-15
01234562022-03-013000-01-01

Task 1 started, and then task 2 opened. Task 1 completed but task 2 was still open. So I need the Start Date of Task 1 (from Row 1) and the End Date of Task 2 (from Row 4 or 5). Later, Task 3 is opened during its own timeframe which creates a new record, and finally Task 4 is currently open.

I have tried quite a few things unsuccessfully, including but unfortunately not limited to:

  • Taking only 'Task Status = Open' records since the Replace Date would be updated with the ending date, but from here I'm unsure of how to best make the comparison to address the overlapping timeframes.
  • Utilizing Lead/Lag functions to identify the next record Load Date, but again, since tasks can occur in any order, this created incorrect timeframes since I couldn't dynamically identify the next replace date that I needed
  • Attempting to identify unique Load Dates of open tasks in a sub query, and then self joining back to the to the task table but this just created duplicates where multiple things were valid on a given day

I cannot provide the literal code I've written due to privacy restrictions, however, here is mock code for anyone will to provide guidance that could help get me going in the right direction:

SELECT
   ACCT_NUM
   ,TASK_NM
   ,TASK_STAT
   ,START_DT
   ,END_DT
   --,[other 'unimportant' fields that can change and create change records]
   ,LD_DT - INTERVAL '1' DAY AS LD_DT --SIMULATE START_DT
   ,RPLC_DT - INTERVAL '1' DAY AS LD_DT --SIMULATE END_DT ON "TASK_STAT = 'O'" RECORDS
   ,LGLC_DEL
   ,CURR_ROW
FROM
   TBL_TASK_HIST
WHERE
   TASK_STAT = 'O'

Sorry about the long summary, was trying to provide solid/clear detail.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

茶花眉 2025-02-05 21:47:27

使用Teradata的归一化,以结合重叠或相遇的时期。根据您的示例数据,这似乎有效;您可能需要稍微调整实际数据。

SELECT ACCT_NUM, 
       BEGIN(PD) AS ACTIVE_START_DT,
       PRIOR(END(PD)) AS ACTIVE_END_DT
FROM 
 (SELECT NORMALIZE ACCT_NUM,
   PERIOD(START_DT,
       NEXT(COALESCE(END_DT,
          CASE WHEN TASK_STAT='O' and CURR_ROW='N'
          THEN START_DT
          ELSE DATE'3000-01-01' 
          END       )
            )
         ) AS PD
  FROM TBL_TASK_HIST
 ) N
 ORDER BY ACCT_NUM, ACTIVE_START_DT;

归一化需要一个周期的数据类型,因此我们在内部查询中使用ofient()构造方法,begin() and and end()函数函数将外部查询中的两个列转换回两个列。由于一个期间不包括结束日期/时间限制(这是“闭合/打开间隔”),我们使用next() and prient() functions调整结束值。

也许您可以将逻辑基于ld_dt,rplc_dt,以避免必须处理null end_dt,但是如果可能的话,最好使用“业务”列,而不是ETL元数据。

Use Teradata's NORMALIZE to combine periods that overlap or meet. Based on your sample data, it appears this would work; you may need to adjust slightly for the real data.

SELECT ACCT_NUM, 
       BEGIN(PD) AS ACTIVE_START_DT,
       PRIOR(END(PD)) AS ACTIVE_END_DT
FROM 
 (SELECT NORMALIZE ACCT_NUM,
   PERIOD(START_DT,
       NEXT(COALESCE(END_DT,
          CASE WHEN TASK_STAT='O' and CURR_ROW='N'
          THEN START_DT
          ELSE DATE'3000-01-01' 
          END       )
            )
         ) AS PD
  FROM TBL_TASK_HIST
 ) N
 ORDER BY ACCT_NUM, ACTIVE_START_DT;

NORMALIZE requires a PERIOD data type so we use the PERIOD() constructor in the inner query and the BEGIN() and END() functions to convert back to two columns in the outer query. Since a PERIOD does not include the ending date/time bound (it's a "closed/open interval") we adjust the ending value with NEXT() and PRIOR() functions.

Maybe you could base the logic on LD_DT, RPLC_DT to avoid having to handle NULL END_DT, but it seems better to use the "business" columns if possible versus the ETL metadata.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文