Rolling daily distinct counts, partitioned by month

Posted 2025-02-09 15:20:13

I have the following table:

CREATE TABLE tbl (
  id      int NOT NULL
, date    date NOT NULL
, cid     int NOT NULL
);
 
INSERT INTO tbl VALUES
  (1 , '2022-01-01', 1)
, (2 , '2022-01-01', 1)
, (3 , '2022-01-01', 2)
, (4 , '2022-01-01', 3)
, (5 , '2022-01-02', 1)
, (6 , '2022-01-02', 4)
, (7 , '2022-01-03', 5)
, (8 , '2022-01-03', 6)
, (9 , '2022-02-01', 1)
, (10, '2022-02-01', 5)
, (11, '2022-02-02', 5)
, (12, '2022-02-02', 3)
;

I'm trying to count distinct users (= cid) per day, with the count rolling over the course of each month. E.g., for 2022-01-01, only distinct users with date = 2022-01-01 are counted. For 2022-01-02, distinct users with a date between 2022-01-01 and 2022-01-02 are counted, and so on. The count should restart at the start of each month.

My desired output:

   date        distinct_cids
2022-01-01           3
2022-01-02           4
2022-01-03           6
2022-02-01           2
2022-02-02           3
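
To make the requirement concrete, here is a minimal Python sketch of the rule described above, run against the sample rows. The function name `rolling_distinct_by_month` and the `ROWS` list are made up for illustration; this only restates the expected result and is not a Snowflake solution.

```python
from collections import defaultdict

# Sample rows copied from the INSERT statement above: (id, date, cid).
ROWS = [
    (1, "2022-01-01", 1), (2, "2022-01-01", 1), (3, "2022-01-01", 2),
    (4, "2022-01-01", 3), (5, "2022-01-02", 1), (6, "2022-01-02", 4),
    (7, "2022-01-03", 5), (8, "2022-01-03", 6), (9, "2022-02-01", 1),
    (10, "2022-02-01", 5), (11, "2022-02-02", 5), (12, "2022-02-02", 3),
]


def rolling_distinct_by_month(rows):
    """Return [(date, number of distinct cids seen so far in that month)]."""
    # Group the cids by date first, so each date yields exactly one output row.
    cids_by_date = defaultdict(set)
    for _id, date, cid in rows:
        cids_by_date[date].add(cid)

    result = []
    seen = set()
    current_month = None
    for date in sorted(cids_by_date):  # ISO dates sort chronologically as text
        month = date[:7]               # 'YYYY-MM', so different years stay apart
        if month != current_month:     # new month: restart the running set
            current_month, seen = month, set()
        seen |= cids_by_date[date]
        result.append((date, len(seen)))
    return result


if __name__ == "__main__":
    for date, distinct_cids in rolling_distinct_by_month(ROWS):
        print(date, distinct_cids)
```

Keying the reset on the `YYYY-MM` prefix rather than on the month number alone keeps January 2022 and January 2023 in separate groups.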

筱果果 2025-02-16 15:20:13

I don't have access to Snowflake, so I can't guarantee that this will work, but from the sound of it, something like this should do:

select date, count(distinct cid) over (partition by month(date) order by date) 
from tbl
order by date;

If you have several years' worth of data, partition by both year and month:

select date, count(distinct cid) over (partition by year(date), month(date) order by date) 
from tbl
order by date;

DATE is a reserved word, so you may want to rename your column.

EDIT: Since DISTINCT is not allowed in that window function, you can try a plain SQL variant. It is likely to be slow on a large table:

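select dt, count(cid) as distinct_cids
from (
    -- one row per (date, user seen on or before that date in the same month)
    select distinct dt.dt, x.cid
    from tbl x
    join (
        select distinct date as dt from tbl
    ) dt (dt)
        on x.date <= dt.dt
        and month(x.date) = month(dt.dt)
        -- also match the year, so the same month of different years is not mixed
        and year(x.date) = year(dt.dt)
) t
group by dt
order by dt
;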

The idea is to build a new relation (t) that pairs each date with the distinct users seen on or before that date within the same month. Then we can simply count those users for each date.
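
Since I can't test this on Snowflake, here is a rough way to sanity-check the join logic locally, using Python's built-in `sqlite3` module. It is only a sketch under some assumptions: SQLite has no `month()` or `year()` functions, so `strftime('%Y-%m', ...)` stands in for the year-and-month comparison, and the helper name `run_example` is made up for illustration. It says nothing about how Snowflake will parse or optimize the query.

```python
import sqlite3

# Same shape as the join variant above, adapted to SQLite:
# strftime('%Y-%m', ...) replaces the month()/year() comparison.
QUERY = """
SELECT dt, COUNT(cid) AS distinct_cids
FROM (
    SELECT DISTINCT d.dt AS dt, x.cid AS cid
    FROM tbl AS x
    JOIN (SELECT DISTINCT date AS dt FROM tbl) AS d
      ON x.date <= d.dt
     AND strftime('%Y-%m', x.date) = strftime('%Y-%m', d.dt)
) AS t
GROUP BY dt
ORDER BY dt
"""

# Sample rows from the question: (id, date, cid).
ROWS = [
    (1, "2022-01-01", 1), (2, "2022-01-01", 1), (3, "2022-01-01", 2),
    (4, "2022-01-01", 3), (5, "2022-01-02", 1), (6, "2022-01-02", 4),
    (7, "2022-01-03", 5), (8, "2022-01-03", 6), (9, "2022-02-01", 1),
    (10, "2022-02-01", 5), (11, "2022-02-02", 5), (12, "2022-02-02", 3),
]


def run_example():
    """Load the sample data into an in-memory database and run QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE tbl (id int NOT NULL, date date NOT NULL, cid int NOT NULL)"
        )
        conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", ROWS)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for dt, distinct_cids in run_example():
        print(dt, distinct_cids)
```

On the sample data this returns one row per date with the counts 3, 4, 6, 2, 3, which matches the desired output in the question.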
