How to check a value across consecutive dates

Posted 2025-02-11 18:45:09 · 804 words · 2 views · 0 comments

I have a database table in Snowflake where I need to check, for each customer, whether FLAG_1 = 1 on at least 3 days in a row. FLAG_1 indicates whether the order contained any specific goods. I then need to create a new table with CUSTOMER_ID and FLAG_2. I really don't know how to approach this problem.

Sample table:

CREATE TABLE TMP_TEST
(
CUSTOMER_ID INT,
ORDER_DATE DATE,
FLAG_1 INT
);


INSERT INTO TMP_TEST (CUSTOMER_ID, ORDER_DATE, FLAG_1)
VALUES
  (001, '2020-04-01', 0),
  (001, '2020-04-02', 1),
  (001, '2020-04-03', 1),
  (001, '2020-04-04', 1),
  (001, '2020-04-05', 1),
  (001, '2020-04-06', 0),
  (001, '2020-04-07', 0),
  (001, '2020-04-08', 0),
  (001, '2020-04-09', 1),
  (002, '2020-04-10', 1),
  (002, '2020-04-11', 0),
  (002, '2020-04-12', 0),
  (002, '2020-04-13', 1),
  (002, '2020-04-14', 1),
  (002, '2020-04-15', 0),
  (002, '2020-04-16', 1),
  (002, '2020-04-17', 1);

Expected output table:

CUSTOMER_ID   FLAG_2
001   1
002   0

Comments (3)

笙痞 2025-02-18 18:45:09

Maybe this helps:

with calcflag as (
    select customer_id,
           -- current row plus the two before it: a sum of 3 means three flagged rows in a row
           IFF( sum(flag_1) over (partition by customer_id order by order_date
                                  rows between 2 preceding and current row) = 3, 1, 0 ) as new_flag
    from tmp_Test
)
select customer_id, max(new_flag) flag_2
from calcflag
group by 1
order by 1;

+-------------+--------+
| CUSTOMER_ID | FLAG_2 |
+-------------+--------+
|           1 |      1 |
|           2 |      0 |
+-------------+--------+
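
A caveat on the row-based window above: it counts consecutive rows, not consecutive calendar days, so it only answers "3 days in a row" when every customer has exactly one row per day, as in the sample data. If dates can be missing, one possible approach is the gaps-and-islands technique sketched below. This is only a sketch under the assumption of at most one row per customer per day; the names flagged, streaks, streak_anchor and streak_days are made up for illustration.

```sql
-- Sketch: detect streaks of consecutive calendar days with FLAG_1 = 1.
-- Assumes at most one row per customer per day.
WITH flagged AS (
    SELECT
        customer_id,
        order_date,
        -- For consecutive dates, (date - row number) is constant,
        -- so every day of one streak gets the same anchor date.
        DATEADD(
            day,
            -ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date),
            order_date
        ) AS streak_anchor
    FROM tmp_test
    WHERE flag_1 = 1
),
streaks AS (
    -- One row per streak, with its length in days.
    SELECT customer_id, streak_anchor, COUNT(*) AS streak_days
    FROM flagged
    GROUP BY customer_id, streak_anchor
)
SELECT
    c.customer_id,
    -- Customers with no flagged rows at all fall back to a streak length of 0.
    IFF(COALESCE(MAX(s.streak_days), 0) >= 3, 1, 0) AS flag_2
FROM (SELECT DISTINCT customer_id FROM tmp_test) c
LEFT JOIN streaks s
    ON s.customer_id = c.customer_id
GROUP BY c.customer_id
ORDER BY c.customer_id;
```

On the sample data the longest streak is 4 days for customer 1 and 2 days for customer 2, so this should agree with the expected output from the question.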
死开点丶别碍眼 2025-02-18 18:45:09

Using COUNT_IF also works:

with calcflag as (
    select 
        customer_id, 
        IFF(
            count_if(flag_1 = 1) over (
                PARTITION by customer_id 
                order by order_date 
                rows between 2 preceding and current row
            ) = 3, 1, 0
        ) as new_flag 
    from tmp_Test
)
select 
    customer_id, 
    max(new_flag) flag_2
from calcflag
group by 1

+-------------+--------+
| CUSTOMER_ID | FLAG_2 |
|-------------+--------|
|           1 |      1 |
|           2 |      0 |
+-------------+--------+
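
The question also asks for the result to be stored in a new table. A minimal sketch of that step, wrapping the COUNT_IF query in CREATE TABLE ... AS SELECT, could look like this. The table name tmp_test_flags is an assumption chosen for illustration and does not appear in the question.

```sql
-- Sketch: persist the per-customer flag in a new table.
-- tmp_test_flags is a hypothetical table name.
CREATE OR REPLACE TABLE tmp_test_flags AS
WITH calcflag AS (
    SELECT
        customer_id,
        -- 1 when the current row and the two rows before it all have FLAG_1 = 1
        IFF(
            COUNT_IF(flag_1 = 1) OVER (
                PARTITION BY customer_id
                ORDER BY order_date
                ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
            ) = 3, 1, 0
        ) AS new_flag
    FROM tmp_test
)
SELECT
    customer_id,
    MAX(new_flag) AS flag_2
FROM calcflag
GROUP BY customer_id;

-- Check the stored result.
SELECT * FROM tmp_test_flags ORDER BY customer_id;
```

Note that CREATE OR REPLACE drops any existing table of the same name, so plain CREATE TABLE may be the safer choice if the name could already be in use.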
凝望流年 2025-02-18 18:45:09

Snowflake supports MATCH_RECOGNIZE, which is the easiest way to detect advanced patterns across multiple rows.

To find 3 or more occurrences, the pattern is PATTERN ( a{3,} ):

SELECT *
FROM TMP_TEST
MATCH_RECOGNIZE (
    PARTITION BY CUSTOMER_ID
    ORDER BY ORDER_DATE
    MEASURES MATCH_NUMBER() AS mn
    ALL ROWS PER MATCH WITH UNMATCHED ROWS
    PATTERN ( a{3,} )
    DEFINE a AS FLAG_1 = 1
) mr
ORDER BY CUSTOMER_ID, ORDER_DATE;


Collapsing to a single row per group:

SELECT CUSTOMER_ID, COALESCE(MIN(MN),0) AS FLAG_2
FROM TMP_TEST
MATCH_RECOGNIZE (
    PARTITION BY CUSTOMER_ID
    ORDER BY ORDER_DATE
    MEASURES MATCH_NUMBER() AS mn
    ALL ROWS PER MATCH WITH UNMATCHED ROWS
    PATTERN ( a{3,})
    DEFINE a AS FLAG_1 = 1
) mr
GROUP BY CUSTOMER_ID;


The power of this solution lies in the PATTERN clause, which can easily be extended with new conditions. For instance:

PATTERN ( a b{1,2} a )
DEFINE a AS FLAG_1 = 1,
       b AS FLAG_1 = 0

Here: find a row with FLAG_1 = 1, followed by one or two rows with FLAG_1 = 0, and ending with a row with FLAG_1 = 1.
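
For completeness, the extended pattern could be dropped into a full query along these lines. This is a sketch only: it assumes both symbols test FLAG_1 (the table has no other flag column), and the matched_as measure is an illustrative addition that uses CLASSIFIER() to show which symbol each row matched.

```sql
-- Sketch: a flagged row, then one or two unflagged rows, then a flagged row.
SELECT *
FROM TMP_TEST
MATCH_RECOGNIZE (
    PARTITION BY CUSTOMER_ID
    ORDER BY ORDER_DATE
    MEASURES
        MATCH_NUMBER() AS mn,         -- sequence number of the match within the partition
        CLASSIFIER()   AS matched_as  -- pattern symbol (a or b) the row was matched to
    ALL ROWS PER MATCH
    PATTERN ( a b{1,2} a )
    DEFINE
        a AS FLAG_1 = 1,
        b AS FLAG_1 = 0
) mr
ORDER BY CUSTOMER_ID, ORDER_DATE;
```

Without WITH UNMATCHED ROWS only the rows that take part in a match are returned, which makes it easier to see where the pattern occurs.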
