SQL - decay by time since event, starting over at the next event
There are many similar questions and answers already posted, but I could not find one with these differences: 1) the count of NULLs starts over after each event, and 2) a math function is applied to the replaced value.
An event either takes place or not (1 or NULL), by date, by customer. You can assume that a customer has one and only one row for every date.
I want to replace the NULLs with a decay function based on the number of consecutive NULLs (time since the event). A customer can have the event every day, skip a day, or skip multiple days, but once the event takes place again, the decay starts over. Currently my decay is divide-by-2, but that is just an example.
DT | CUSTOMER | EVENT | DESIRED |
---|---|---|---|
2022-01-01 | a | 1 | 1 |
2022-01-02 | a | 1 | 1 |
2022-01-03 | a | 1 | 1 |
2022-01-04 | a | 1 | 1 |
2022-01-05 | a | 1 | 1 |
2022-01-01 | b | 1 | 1 |
2022-01-02 | b | NULL | 0.5 |
2022-01-03 | b | NULL | 0.25 |
2022-01-04 | b | 1 | 1 |
2022-01-05 | b | NULL | 0.5 |
I can produce the desired result, but it is very unwieldy, so I am looking for a better way. This will also need to be extended to multiple event columns.
create or replace temporary table the_data (
dt date,
customer char(10),
event int,
desired float)
;
insert into the_data values ('2022-01-01', 'a', 1, 1);
insert into the_data values ('2022-01-02', 'a', 1, 1);
insert into the_data values ('2022-01-03', 'a', 1, 1);
insert into the_data values ('2022-01-04', 'a', 1, 1);
insert into the_data values ('2022-01-05', 'a', 1, 1);
insert into the_data values ('2022-01-01', 'b', 1, 1);
insert into the_data values ('2022-01-02', 'b', NULL, 0.5);
insert into the_data values ('2022-01-03', 'b', NULL, 0.25);
insert into the_data values ('2022-01-04', 'b', 1, 1);
insert into the_data values ('2022-01-05', 'b', NULL, 0.5);
with
base as (
select * from the_data
),
-- flag missing events: 1 when event is NULL, else 0
find_nan as (
select *, case when event is null then 1 else 0 end as event_is_nan from base
),
-- +1 marks the first NULL of a run, -1 the first event row after a run
find_nan_diff as (
select *, event_is_nan - coalesce(lag(event_is_nan) over (partition by customer order by dt), 0) as event_is_nan_diff from find_nan
),
-- running count of completed NULL runs = group id within each customer
find_nan_group as (
select *, sum(case when event_is_nan_diff = -1 then 1 else 0 end) over (partition by customer order by dt) as nan_group from find_nan_diff
),
-- position within the current NULL run (0 on event rows)
consec_nans as (
select *, sum(event_is_nan) over (partition by customer, nan_group order by dt) as n_consec_nans from find_nan_group
),
-- 0.5 / n gives 0.5, 0.25 as in the sample; a strict halving per day
-- would be power(0.5, n_consec_nans) instead
decay as (
select *, case when n_consec_nans > 0 then 0.5 / n_consec_nans else 1 end as decay_factor from consec_nans
),
-- not a true forward fill: first_value() takes each customer's first event,
-- which only works here because every event value is 1
ffill as (
select *, first_value(event) over (partition by customer order by dt) as ffill_value from decay
),
final as (
select *, ffill_value * decay_factor as the_answer from ffill
)
select * from final
order by customer, dt
;
Thanks
1 Answer
The query could be simplified by using CONDITIONAL_CHANGE_EVENT to generate a subgrp helper column.
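A minimal sketch of that approach, assuming Snowflake (where CONDITIONAL_CHANGE_EVENT is available) and the 0.5 / n decay from the question; the subgrp name follows the answer's description, the rest is illustrative:

with grp as (
    select *,
           -- increments whenever "event is null" flips, so each run of NULLs
           -- (and each run of events) gets its own subgrp per customer
           conditional_change_event(event is null)
               over (partition by customer order by dt) as subgrp
    from the_data
)
select dt, customer, event,
       -- event rows keep their value; inside a NULL run, row_number()
       -- counts consecutive NULLs, giving the 0.5 / n decay
       coalesce(event,
                0.5 / row_number() over (partition by customer, subgrp order by dt)) as the_answer
from grp
order by customer, dt;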
Output: the same values as the DESIRED column above.
EDIT: Without using CONDITIONAL_CHANGE_EVENT:
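Presumably something along these lines (again a sketch, assuming Snowflake): count(event) ignores NULLs, so the running count by itself yields the subgrp helper column, with each event row and its trailing NULLs sharing a group. It assumes each customer's history starts with an event row, as in the sample data.

with grp as (
    select *,
           -- count() skips NULLs, so this increments only on event rows;
           -- an event and the NULLs that follow it share one subgrp
           count(event) over (partition by customer order by dt) as subgrp
    from the_data
),
pos as (
    select *,
           -- 0 on the event row itself, then 1, 2, ... for consecutive NULLs
           row_number() over (partition by customer, subgrp order by dt) - 1 as n_nulls
    from grp
)
select dt, customer, event,
       case when event is not null then event
            else 0.5 / n_nulls end as the_answer
from pos
order by customer, dt;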
db<>fiddle demo