根据14天的间隔创建组

发布于 2025-01-18 04:06:07 字数 2180 浏览 1 评论 0原文

我知道这是一个常见问题,但我找不到符合我的情况的东西。我有以下数据:

+---------+---------+
| user_id | view_dt |
+---------+---------+
| A       |     1/1 |
+---------+---------+
| A       |    1/10 |
+---------+---------+
| A       |    1/14 |
+---------+---------+
| A       |    1/22 |
+---------+---------+
| A       |    1/23 |
+---------+---------+
| A       |    1/30 |
+---------+---------+

我希望根据 14 天的间隔对这些数据进行分组。也就是说,组将是:

GROUP 1: 1/1, 1/10, 1/14

GROUP 2: 1/22, 1/23, 1/30

请注意,1/30 日期应该属于 GROUP 2,因为1/30 应与第 2 组的第一个日期 (1/22) 进行比较,而不是 (1/1)。

我遇到的问题是我自己的查询显示 1/30 属于组 3。

CREATE TABLE T (
    user_id VARCHAR(20),
    view_dt DATETIME
);

INSERT INTO t VALUES ('A', '2022-01-01');
INSERT INTO t VALUES ('A', '2022-01-10');
INSERT INTO t VALUES ('A', '2022-01-14');
INSERT INTO t VALUES ('A', '2022-01-22');
INSERT INTO t VALUES ('A', '2022-01-23');
INSERT INTO t VALUES ('A', '2022-01-30');

SELECT user_id,
    view_dt,
    DENSE_RANK() OVER(ORDER BY gr) grp
FROM (
SELECT 
    user_id,
    view_dt,
    CAST (view_dt - MIN (view_dt) OVER (PARTITION BY user_id ORDER BY view_dt) AS INT )/14 + 1 AS gr
    FROM T
) x
ORDER BY user_id

理想输出

+---------+---------+-------+
| user_id | view_dt | group |
+---------+---------+-------+
| A       |     1/1 |     1 |
+---------+---------+-------+
| A       |    1/10 |     1 |
+---------+---------+-------+
| A       |    1/14 |     1 |
+---------+---------+-------+
| A       |    1/22 |     2 |
+---------+---------+-------+
| A       |    1/23 |     2 |
+---------+---------+-------+
| A       |    1/30 | 2     |
+---------+---------+-------+

我之前的查询的输出:

+---------+---------+-------+
| user_id | view_dt | group |
+---------+---------+-------+
| A       |     1/1 |     1 |
+---------+---------+-------+
| A       |    1/10 |     1 |
+---------+---------+-------+
| A       |    1/14 |     1 |
+---------+---------+-------+
| A       |    1/22 |     2 |
+---------+---------+-------+
| A       |    1/23 |     2 |
+---------+---------+-------+
| A       |    1/30 | 3**   |
+---------+---------+-------+

I know this is a common question, but I couldn't find something that matches my case. I have this data:

+---------+---------+
| user_id | view_dt |
+---------+---------+
| A       |     1/1 |
+---------+---------+
| A       |    1/10 |
+---------+---------+
| A       |    1/14 |
+---------+---------+
| A       |    1/22 |
+---------+---------+
| A       |    1/23 |
+---------+---------+
| A       |    1/30 |
+---------+---------+

I want this data to be grouped based on a 14 day interval. That is, the groups would be:

GROUP 1: 1/1, 1/10, 1/14

GROUP 2: 1/22, 1/23, 1/30

Note that I the 1/30 date should belong to GROUP 2 since 1/30 should be compared to the first date of GROUP 2 (1/22) instead of (1/1).

The problem I am having is my own query shows 1/30 belonging to group 3.

CREATE TABLE T (
    user_id VARCHAR(20),
    view_dt DATETIME
);

INSERT INTO t VALUES ('A', '2022-01-01');
INSERT INTO t VALUES ('A', '2022-01-10');
INSERT INTO t VALUES ('A', '2022-01-14');
INSERT INTO t VALUES ('A', '2022-01-22');
INSERT INTO t VALUES ('A', '2022-01-23');
INSERT INTO t VALUES ('A', '2022-01-30');

SELECT user_id,
    view_dt,
    DENSE_RANK() OVER(ORDER BY gr) grp
FROM (
SELECT 
    user_id,
    view_dt,
    CAST (view_dt - MIN (view_dt) OVER (PARTITION BY user_id ORDER BY view_dt) AS INT )/14 + 1 AS gr
    FROM T
) x
ORDER BY user_id

Ideal Output

+---------+---------+-------+
| user_id | view_dt | group |
+---------+---------+-------+
| A       |     1/1 |     1 |
+---------+---------+-------+
| A       |    1/10 |     1 |
+---------+---------+-------+
| A       |    1/14 |     1 |
+---------+---------+-------+
| A       |    1/22 |     2 |
+---------+---------+-------+
| A       |    1/23 |     2 |
+---------+---------+-------+
| A       |    1/30 | 2     |
+---------+---------+-------+

Output from my previous query:

+---------+---------+-------+
| user_id | view_dt | group |
+---------+---------+-------+
| A       |     1/1 |     1 |
+---------+---------+-------+
| A       |    1/10 |     1 |
+---------+---------+-------+
| A       |    1/14 |     1 |
+---------+---------+-------+
| A       |    1/22 |     2 |
+---------+---------+-------+
| A       |    1/23 |     2 |
+---------+---------+-------+
| A       |    1/30 | 3**   |
+---------+---------+-------+

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

三寸金莲 2025-01-25 04:06:07

一种选择是使用递归 CTE

-- Recursive CTE solution
with cte as
(
    -- CTE for adding a row_number
    select rn = row_number() over (partition by user_id order by view_dt),
           user_id, view_dt
    from   T
),
rcte as
(
    -- RCTE - anchor member
    -- first_dt is the first date of the group
    select rn, user_id, view_dt, grp = 1, first_dt = view_dt
    from   cte
    where  rn = 1
    
    union all
    
    -- RCTE - recursive member
    -- if date is more than 14 days from first_dt, grp + 1, update first_dt
    select c.rn, c.user_id, c.view_dt,
           grp = case when datediff(day, r.first_dt, c.view_dt) > 14
                      then r.grp + 1
                      else r.grp
                      end,
           first_dt = case when datediff(day, r.first_dt, c.view_dt) > 14
                      then c.view_dt
                      else r.first_dt
                      end
    from   cte c
           inner join rcte r on  c.user_id = r.user_id
                             and c.rn      = r.rn + 1
)
select *
from   rcte

注意:请避免在日期上使用算术运算符。 view_dt - MIN (view_dt)。应该使用datediff()
请参阅坏习惯Kick :使用简写进行日期/时间操作

One option is to use Recursive CTE

-- Recursive CTE solution
with cte as
(
    -- CTE for adding a row_number
    select rn = row_number() over (partition by user_id order by view_dt),
           user_id, view_dt
    from   T
),
rcte as
(
    -- RCTE - anchor member
    -- first_dt is the first date of the group
    select rn, user_id, view_dt, grp = 1, first_dt = view_dt
    from   cte
    where  rn = 1
    
    union all
    
    -- RCTE - recursive member
    -- if date is more than 14 days from first_dt, grp + 1, update first_dt
    select c.rn, c.user_id, c.view_dt,
           grp = case when datediff(day, r.first_dt, c.view_dt) > 14
                      then r.grp + 1
                      else r.grp
                      end,
           first_dt = case when datediff(day, r.first_dt, c.view_dt) > 14
                      then c.view_dt
                      else r.first_dt
                      end
    from   cte c
           inner join rcte r on  c.user_id = r.user_id
                             and c.rn      = r.rn + 1
)
select *
from   rcte

Note : Please do avoid using arithmetic operator on date. view_dt - MIN (view_dt). Should use datediff()
See Bad Habits to Kick : Using shorthand with date/time operations

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文