SQL - decay the value by time since an event, then start over at the next event

Posted 2025-02-12 06:31:14

There are many similar questions and answers already posted, but I could not find one with these two differences: 1) the count of consecutive NULLs starts over after each event, and 2) a math function is applied to the replaced value.

For each customer and date, an event either takes place (1) or does not (NULL). You can assume that each customer has exactly one row per date.

I want to replace the NULLs with a decay function of the number of consecutive NULLs (the time since the last event). A customer can have the event every day, skip a day, or skip multiple days, but once the event takes place, the decay starts over. Currently my decay divides by 2, but that is just an example.

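| DT         | CUSTOMER | EVENT | DESIRED |
|------------|----------|-------|---------|
| 2022-01-01 | a        | 1     | 1       |
| 2022-01-02 | a        | 1     | 1       |
| 2022-01-03 | a        | 1     | 1       |
| 2022-01-04 | a        | 1     | 1       |
| 2022-01-05 | a        | 1     | 1       |
| 2022-01-01 | b        | 1     | 1       |
| 2022-01-02 | b        | NULL  | 0.5     |
| 2022-01-03 | b        | NULL  | 0.25    |
| 2022-01-04 | b        | 1     | 1       |
| 2022-01-05 | b        | NULL  | 0.5     |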
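To pin down the intended semantics of the DESIRED column, here is a small reference sketch in plain Python. `decay_fill` is a hypothetical helper, not part of the question. It walks one customer's events in date order, counts the consecutive NULLs since the last event, and resets the count at each event. The decay defaults to `0.5 / n`, which is what the question's query computes; halving on every NULL day (`0.5 ** n`) agrees with it only for the first two NULLs, which is all the sample data covers. It also assumes that NULLs before a customer's first event are decayed the same way.

```python
from typing import Callable, List, Optional


def decay_fill(
    events: List[Optional[int]],
    decay: Callable[[int], float] = lambda n: 0.5 / n,
) -> List[float]:
    """Replace each None with decay(n), where n is the number of consecutive
    Nones since the most recent event; every event resets n to 0."""
    out: List[float] = []
    n = 0  # consecutive NULLs seen since the last event
    for e in events:
        if e is None:
            n += 1
            out.append(decay(n))
        else:
            n = 0  # the event restarts the decay
            out.append(float(e))
    return out


print(decay_fill([1, None, None, 1, None]))  # [1.0, 0.5, 0.25, 1.0, 0.5]
print(decay_fill([1, None, None, None]))  # [1.0, 0.5, 0.25, 0.16666666666666666]
print(decay_fill([1, None, None, None], decay=lambda n: 0.5 ** n))  # [1.0, 0.5, 0.25, 0.125]
```

Passing the decay as a function keeps the grouping logic separate from the math, which matches the question's note that dividing by 2 is only an example.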

I can produce the desired result, but the query is very unwieldy, and I am looking for a better way. It will also need to be extended to multiple event columns.

create or replace temporary table the_data (
  dt date,
  customer char(10),
  event int,
  desired float)
;
insert into the_data values ('2022-01-01', 'a', 1, 1);
insert into the_data values ('2022-01-02', 'a', 1, 1);
insert into the_data values ('2022-01-03', 'a', 1, 1);
insert into the_data values ('2022-01-04', 'a', 1, 1);
insert into the_data values ('2022-01-05', 'a', 1, 1);

insert into the_data values ('2022-01-01', 'b', 1, 1);
insert into the_data values ('2022-01-02', 'b', NULL, 0.5);
insert into the_data values ('2022-01-03', 'b', NULL, 0.25);
insert into the_data values ('2022-01-04', 'b', 1, 1);
insert into the_data values ('2022-01-05', 'b', NULL, 0.5);

with
    base as (
      select * from the_data
    ),
    find_nan as (
      select *, case when event is null then 1 else 0 end as event_is_nan from base
    ),
    find_nan_diff as (
      select *, event_is_nan - coalesce(lag(event_is_nan) over (partition by customer order by dt), 0) as event_is_nan_diff from find_nan
    ),
    find_nan_group as (
      select *, sum(case when event_is_nan_diff = -1 then 1 else 0 end) over (partition by customer order by dt) as nan_group from find_nan_diff
    ),
    consec_nans as (
      select *, sum(event_is_nan) over (partition by customer, nan_group order by dt) as n_consec_nans from find_nan_group
    ),
    decay as (
      select *, case when n_consec_nans > 0 then 0.5 / n_consec_nans else 1 end as decay_factor from consec_nans
    ),
    ffill as (
      select *, first_value(event) over (partition by customer order by dt) as ffill_value from decay
    ),
    final as (
      select *, ffill_value * decay_factor as the_answer from ffill
    )
select * from final
order by customer, dt
;  
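For comparison, here is a sketch of a shorter grouping trick; it is neither the question's method nor the answer's. Because COUNT(event) ignores NULLs, a running COUNT(event) goes up by one at every event, so each event and the NULLs after it share a group, and ROW_NUMBER() inside that group yields the NULL count directly. It runs in SQLite through Python's built-in sqlite3 module (an assumption: the question's SQL looks like Snowflake, but these window functions are standard and SQLite supports them from 3.25). The column names evt_grp and n_consec_nans are illustrative, and the `0.5 / n` decay is kept from the question. With several event columns, each column would need its own COUNT/ROW_NUMBER pair.

```python
import sqlite3

# Sample rows from the question: (dt, customer, event, desired).
ROWS = [
    ("2022-01-01", "a", 1, 1.0), ("2022-01-02", "a", 1, 1.0),
    ("2022-01-03", "a", 1, 1.0), ("2022-01-04", "a", 1, 1.0),
    ("2022-01-05", "a", 1, 1.0),
    ("2022-01-01", "b", 1, 1.0), ("2022-01-02", "b", None, 0.5),
    ("2022-01-03", "b", None, 0.25), ("2022-01-04", "b", 1, 1.0),
    ("2022-01-05", "b", None, 0.5),
]

QUERY = """
WITH grp AS (
  SELECT *,
    -- COUNT(event) skips NULLs, so this running count rises by one at every
    -- event: each event opens a new group for the NULLs that follow it.
    COUNT(event) OVER (PARTITION BY customer ORDER BY dt) AS evt_grp
  FROM the_data
), pos AS (
  SELECT *,
    -- In a group that starts with an event, row 1 is the event itself, so
    -- ROW_NUMBER() - 1 counts the consecutive NULLs. Group 0 (NULLs before
    -- any event) has no event row, so nothing is subtracted there.
    ROW_NUMBER() OVER (PARTITION BY customer, evt_grp ORDER BY dt)
      - CASE WHEN evt_grp > 0 THEN 1 ELSE 0 END AS n_consec_nans
  FROM grp
)
SELECT dt, customer, event, desired,
  CASE WHEN event IS NOT NULL THEN event
       ELSE 0.5 / n_consec_nans END AS the_answer
FROM pos
ORDER BY customer, dt
"""


def run(rows):
    """Load rows into an in-memory table and return the query result."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE the_data (dt TEXT, customer TEXT, event INTEGER, desired REAL)")
    con.executemany("INSERT INTO the_data VALUES (?, ?, ?, ?)", rows)
    out = con.execute(QUERY).fetchall()
    con.close()
    return out


for row in run(ROWS):
    print(row)
```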

Thanks

Answer by 娇俏, 2025-02-19 06:31:18

The query can be simplified by using the CONDITIONAL_CHANGE_EVENT window function (available in Snowflake) to generate a subgrp helper column:

WITH cte AS (
  SELECT *, CONDITIONAL_CHANGE_EVENT(event IS NULL) OVER(PARTITION BY CUSTOMER 
                                                         ORDER BY DT) AS subgrp
  FROM the_data
)
SELECT *, COALESCE(EVENT, 0.5 / ROW_NUMBER() OVER(PARTITION BY CUSTOMER, SUBGRP 
                                                  ORDER BY DT)) AS computed_decay
FROM cte
ORDER BY CUSTOMER, DT;
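To make the subgrp helper column concrete, the sketch below emulates in plain Python how CONDITIONAL_CHANGE_EVENT numbers the rows of one ordered partition: it starts at 0 and adds 1 whenever the argument differs from the previous row's value. This is an illustration under assumptions, not Snowflake's implementation; in particular it does not model NULL arguments, which cannot occur here because `event IS NULL` is always TRUE or FALSE. `conditional_change_event` is a hypothetical name. The last step mirrors the outer query: within each subgrp, NULL rows get `0.5 / ROW_NUMBER()`.

```python
from itertools import groupby
from typing import Hashable, List, Optional, Sequence


def conditional_change_event(values: Sequence[Hashable]) -> List[int]:
    """Number the rows of one ordered partition: start at 0 and add 1
    whenever a value differs from the previous row's value."""
    out: List[int] = []
    counter = 0
    for i, v in enumerate(values):
        if i > 0 and v != values[i - 1]:
            counter += 1
        out.append(counter)
    return out


# Customer b ordered by dt; the function's argument is "event IS NULL".
events_b: List[Optional[int]] = [1, None, None, 1, None]
subgrp = conditional_change_event([e is None for e in events_b])
print(subgrp)  # [0, 1, 1, 2, 3]

# Outer query: keep events, replace NULLs with 0.5 / ROW_NUMBER() per subgrp.
# subgrp never decreases, so groupby's consecutive groups equal the partitions.
decay: List[float] = []
for _, grp in groupby(zip(subgrp, events_b), key=lambda pair: pair[0]):
    for rn, (_, e) in enumerate(grp, start=1):
        decay.append(e if e is not None else 0.5 / rn)
print(decay)  # [1, 0.5, 0.25, 1, 0.5]
```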

EDIT:

Without using CONDITIONAL_CHANGE_EVENT:

WITH cte AS (
  SELECT *, 
    CASE WHEN 
    event = LAG(event,1, event) OVER(PARTITION BY customer ORDER BY dt)
    OR (event IS NULL AND LAG(event) OVER(PARTITION BY customer ORDER BY dt) IS NULL)
    THEN 0 ELSE 1 END AS l
  FROM the_data

), cte2 AS (
  SELECT *, SUM(l) OVER(PARTITION BY customer ORDER BY dt) AS SUBGRP
  FROM cte
)
SELECT *, COALESCE(EVENT, 0.5 / ROW_NUMBER() OVER(PARTITION BY CUSTOMER, SUBGRP 
                                                  ORDER BY DT)) AS computed_decay
FROM cte2
ORDER BY CUSTOMER, DT;
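The CONDITIONAL_CHANGE_EVENT-free version uses only LAG, SUM and ROW_NUMBER, so it can be checked outside Snowflake. The sketch below runs it against the question's sample rows in SQLite through Python's built-in sqlite3 module (an assumption: SQLite 3.25 or later, which added window functions). Apart from added comments and lower-case identifiers, the query is the one above; the check confirms that computed_decay matches the desired column.

```python
import sqlite3

# Sample rows from the question: (dt, customer, event, desired).
ROWS = [
    ("2022-01-01", "a", 1, 1.0), ("2022-01-02", "a", 1, 1.0),
    ("2022-01-03", "a", 1, 1.0), ("2022-01-04", "a", 1, 1.0),
    ("2022-01-05", "a", 1, 1.0),
    ("2022-01-01", "b", 1, 1.0), ("2022-01-02", "b", None, 0.5),
    ("2022-01-03", "b", None, 0.25), ("2022-01-04", "b", 1, 1.0),
    ("2022-01-05", "b", None, 0.5),
]

QUERY = """
WITH cte AS (
  SELECT *,
    -- l = 0 when the row continues the current run (same non-NULL value as
    -- the previous row, or NULL after NULL); l = 1 when the run changes.
    CASE WHEN
      event = LAG(event, 1, event) OVER (PARTITION BY customer ORDER BY dt)
      OR (event IS NULL AND LAG(event) OVER (PARTITION BY customer ORDER BY dt) IS NULL)
    THEN 0 ELSE 1 END AS l
  FROM the_data
), cte2 AS (
  -- The running total of the change flags numbers the runs.
  SELECT *, SUM(l) OVER (PARTITION BY customer ORDER BY dt) AS subgrp
  FROM cte
)
SELECT *, COALESCE(event, 0.5 / ROW_NUMBER() OVER (PARTITION BY customer, subgrp
                                                   ORDER BY dt)) AS computed_decay
FROM cte2
ORDER BY customer, dt
"""

con = sqlite3.connect(":memory:")
con.row_factory = sqlite3.Row  # allows access to columns by name
con.execute("CREATE TABLE the_data (dt TEXT, customer TEXT, event INTEGER, desired REAL)")
con.executemany("INSERT INTO the_data VALUES (?, ?, ?, ?)", ROWS)
result = con.execute(QUERY).fetchall()
for r in result:
    print(r["dt"], r["customer"], r["event"], r["subgrp"], r["computed_decay"])
```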

db<>fiddle demo
