Rolling daily distinct counts partitioned by month

Posted 2025-02-09 15:20:13

I have the following table:

CREATE TABLE tbl (
  id      int NOT NULL
, date    date NOT NULL
, cid     int NOT NULL
);
 
INSERT INTO tbl VALUES
  (1 , '2022-01-01', 1)
, (2 , '2022-01-01', 1)
, (3 , '2022-01-01', 2)
, (4 , '2022-01-01', 3)
, (5 , '2022-01-02', 1)
, (6 , '2022-01-02', 4)
, (7 , '2022-01-03', 5)
, (8 , '2022-01-03', 6)
, (9 , '2022-02-01', 1)
, (10, '2022-02-01', 5)
, (11, '2022-02-02', 5)
, (12, '2022-02-02', 3)
;

I'm trying to count distinct users (= cid) for each day, as a running count within the month. E.g., for 2022-01-01, only distinct users with date = 2022-01-01 are counted. For 2022-01-02, distinct users with a date between 2022-01-01 and 2022-01-02 are counted, and so on. The count should restart each month.

My desired output:

   date        distinct_cids
2022-01-01           3
2022-01-02           4
2022-01-03           6
2022-02-01           2
2022-02-02           3

Comments (1)

筱果果 2025-02-16 15:20:13

I don't have access to Snowflake, so I can't guarantee that this will work, but from the sound of it:

select date, count(distinct cid) over (partition by month(date) order by date) 
from tbl
order by date;

If you have several years' worth of data, you can partition by year and month:

select date, count(distinct cid) over (partition by year(date), month(date) order by date) 
from tbl
order by date;

DATE is a keyword (and a reserved word in some SQL dialects), so you may want to consider renaming your column.

EDIT: Since distinct is disallowed in the window function, you can try a vanilla SQL variant. It is likely to be slow on a large table:

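select dt, count(cid) as distinct_cids
from (
    -- one row per (date, user seen on or before that date in the same month)
    select distinct dt.dt, x.cid
    from tbl x
    join (
        select distinct date as dt from tbl
    ) dt (dt)
        on x.date <= dt.dt
        and year(x.date) = year(dt.dt)   -- keep the same month of different years apart
        and month(x.date) = month(dt.dt)
) t
group by dt
order by dt
;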

The idea is to build a new relation (t) that pairs each date with the distinct users seen on or before that date within the same month. Then we can simply count those users for each date.
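
To check the self-join idea against the sample data from the question, here is a minimal sketch that runs an equivalent query in SQLite through Python's built-in sqlite3 module. This is not Snowflake: SQLite has no year()/month() functions, so the sketch assumes strftime('%Y-%m', ...) as a stand-in for the year-and-month comparison, and it stores dates as ISO text. The names ROWS, QUERY and rolling_distinct are made up for this illustration.

```python
import sqlite3

# Sample rows from the question: (id, date, cid).
ROWS = [
    (1, "2022-01-01", 1), (2, "2022-01-01", 1), (3, "2022-01-01", 2),
    (4, "2022-01-01", 3), (5, "2022-01-02", 1), (6, "2022-01-02", 4),
    (7, "2022-01-03", 5), (8, "2022-01-03", 6), (9, "2022-02-01", 1),
    (10, "2022-02-01", 5), (11, "2022-02-02", 5), (12, "2022-02-02", 3),
]

# Same shape as the self-join above. Assumption: strftime('%Y-%m', ...)
# replaces year()/month(), which SQLite does not provide.
QUERY = """
SELECT dt, COUNT(cid) AS distinct_cids
FROM (
    -- one row per (date, user seen on or before that date in the same month)
    SELECT DISTINCT d.dt AS dt, x.cid AS cid
    FROM tbl AS x
    JOIN (SELECT DISTINCT "date" AS dt FROM tbl) AS d
      ON x."date" <= d.dt
     AND strftime('%Y-%m', x."date") = strftime('%Y-%m', d.dt)
) AS t
GROUP BY dt
ORDER BY dt
"""


def rolling_distinct(rows):
    """Load rows into an in-memory SQLite table and run the rolling count."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            'CREATE TABLE tbl (id INTEGER NOT NULL, "date" TEXT NOT NULL, '
            "cid INTEGER NOT NULL)"
        )
        conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for dt, distinct_cids in rolling_distinct(ROWS):
        print(dt, distinct_cids)
```

On the sample data this yields 3, 4, 6 for the three January dates and 2, 3 for the two February dates, which matches the desired output in the question. The join grows roughly with (rows per month) x (distinct dates per month), which is why it may be slow on a large table.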
