SQL - decay by time since event, starting over at the next event
There are many similar questions and answers already posted, but I could not find one with these differences: 1) the count of NULLs starts over after each event, and 2) a math function is applied to the replaced value.
An event either takes place or not (1 or NULL), by date, by customer. You can assume that a customer has one and only one row for every date.
I want to replace the NULLs with a decay function based on the number of consecutive NULLs (time since the event). A customer can have the event every day, skip a day, or skip multiple days, but once the event takes place again, the decay starts over. Currently my decay is divide-by-2, but that is just an example.
DT | CUSTOMER | EVENT | DESIRED |
---|---|---|---|
2022-01-01 | a | 1 | 1 |
2022-01-02 | a | 1 | 1 |
2022-01-03 | a | 1 | 1 |
2022-01-04 | a | 1 | 1 |
2022-01-05 | a | 1 | 1 |
2022-01-01 | b | 1 | 1 |
2022-01-02 | b | NULL | 0.5 |
2022-01-03 | b | NULL | 0.25 |
2022-01-04 | b | 1 | 1 |
2022-01-05 | b | NULL | 0.5 |
I can produce the desired result, but it is very unwieldy, so I am looking for a better way. This will also need to be extended to multiple event columns.
create or replace temporary table the_data (
dt date,
customer char(10),
event int,
desired float)
;
insert into the_data values ('2022-01-01', 'a', 1, 1);
insert into the_data values ('2022-01-02', 'a', 1, 1);
insert into the_data values ('2022-01-03', 'a', 1, 1);
insert into the_data values ('2022-01-04', 'a', 1, 1);
insert into the_data values ('2022-01-05', 'a', 1, 1);
insert into the_data values ('2022-01-01', 'b', 1, 1);
insert into the_data values ('2022-01-02', 'b', NULL, 0.5);
insert into the_data values ('2022-01-03', 'b', NULL, 0.25);
insert into the_data values ('2022-01-04', 'b', 1, 1);
insert into the_data values ('2022-01-05', 'b', NULL, 0.5);
with
base as (
select * from the_data
),
-- flag missing events: 1 when event is NULL, else 0
find_nan as (
select *, case when event is null then 1 else 0 end as event_is_nan from base
),
-- +1 marks the first NULL of a run, -1 the first event row after a run
find_nan_diff as (
select *, event_is_nan - coalesce(lag(event_is_nan) over (partition by customer order by dt), 0) as event_is_nan_diff from find_nan
),
-- running count of completed NULL runs = group id within each customer
find_nan_group as (
select *, sum(case when event_is_nan_diff = -1 then 1 else 0 end) over (partition by customer order by dt) as nan_group from find_nan_diff
),
-- position within the current NULL run (0 on event rows)
consec_nans as (
select *, sum(event_is_nan) over (partition by customer, nan_group order by dt) as n_consec_nans from find_nan_group
),
-- 0.5 / n gives 0.5, 0.25 as in the sample; a strict halving per day
-- would be power(0.5, n_consec_nans) instead
decay as (
select *, case when n_consec_nans > 0 then 0.5 / n_consec_nans else 1 end as decay_factor from consec_nans
),
-- not a true forward fill: first_value() takes each customer's first event,
-- which only works here because every event value is 1
ffill as (
select *, first_value(event) over (partition by customer order by dt) as ffill_value from decay
),
final as (
select *, ffill_value * decay_factor as the_answer from ffill
)
select * from final
order by customer, dt
;
Thanks
1 Answer
The query could be simplified by using CONDITIONAL_CHANGE_EVENT to generate a subgrp helper column.
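A minimal sketch of that approach, assuming Snowflake (where CONDITIONAL_CHANGE_EVENT is available) and the 0.5 / n decay from the question; the subgrp name follows the answer's description, the rest is illustrative:

with grp as (
    select *,
           -- increments whenever "event is null" flips, so each run of NULLs
           -- (and each run of events) gets its own subgrp per customer
           conditional_change_event(event is null)
               over (partition by customer order by dt) as subgrp
    from the_data
)
select dt, customer, event,
       -- event rows keep their value; inside a NULL run, row_number()
       -- counts consecutive NULLs, giving the 0.5 / n decay
       coalesce(event,
                0.5 / row_number() over (partition by customer, subgrp order by dt)) as the_answer
from grp
order by customer, dt;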
Output: the same values as the DESIRED column above.
EDIT: Without using CONDITIONAL_CHANGE_EVENT:
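Presumably something along these lines (again a sketch, assuming Snowflake): count(event) ignores NULLs, so the running count by itself yields the subgrp helper column, with each event row and its trailing NULLs sharing a group. It assumes each customer's history starts with an event row, as in the sample data.

with grp as (
    select *,
           -- count() skips NULLs, so this increments only on event rows;
           -- an event and the NULLs that follow it share one subgrp
           count(event) over (partition by customer order by dt) as subgrp
    from the_data
),
pos as (
    select *,
           -- 0 on the event row itself, then 1, 2, ... for consecutive NULLs
           row_number() over (partition by customer, subgrp order by dt) - 1 as n_nulls
    from grp
)
select dt, customer, event,
       case when event is not null then event
            else 0.5 / n_nulls end as the_answer
from pos
order by customer, dt;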
db<>fiddle demo