Rolling daily distinct counts partitioned by month
I have the following table:
CREATE TABLE tbl (
id int NOT NULL
, date date NOT NULL
, cid int NOT NULL
);
INSERT INTO tbl VALUES
(1 , '2022-01-01', 1)
, (2 , '2022-01-01', 1)
, (3 , '2022-01-01', 2)
, (4 , '2022-01-01', 3)
, (5 , '2022-01-02', 1)
, (6 , '2022-01-02', 4)
, (7 , '2022-01-03', 5)
, (8 , '2022-01-03', 6)
, (9 , '2022-02-01', 1)
, (10, '2022-02-01', 5)
, (11, '2022-02-02', 5)
, (12, '2022-02-02', 3)
;
I'm trying to count distinct users (= cid) each day, but the result is rolling during the month. E.g., for 2022-01-01, only distinct users with date = 2022-01-01 are counted. For 2022-01-02, distinct users with date between 2022-01-01 and 2022-01-02 are counted, and so on. The count should restart each month.
My desired output:
date distinct_cids
2022-01-01 3
2022-01-02 4
2022-01-03 6
2022-02-01 2
2022-02-02 3
Comments (1)
I don't have access to Snowflake, so I can't guarantee that this will work, but from the sound of it:
If you have several years' worth of data, you can partition by year and month:
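As a rough sketch of partitioning by year and month without a windowed COUNT(DISTINCT ...), one option is to find the first date each user appears in a given month, count those first appearances per day, and keep a running total within the month. This is an illustration and not necessarily the query the answer had in mind; the CTE names (first_seen, days, daily_new) and the new_cids column are made up here, and it assumes DATE_TRUNC('month', ...) is available, which keeps year and month together in one value.

```sql
-- Sketch only: CTE and column names below are illustrative assumptions.
WITH first_seen AS (
    -- Earliest date on which each cid appears within each calendar month.
    SELECT cid
         , DATE_TRUNC('month', date) AS month_start
         , MIN(date)                 AS first_date
    FROM   tbl
    GROUP  BY cid, DATE_TRUNC('month', date)
)
, days AS (
    -- Every date that should appear in the output.
    SELECT DISTINCT date FROM tbl
)
, daily_new AS (
    -- Number of users whose first appearance in the month falls on each date
    -- (0 for dates on which nobody is new).
    SELECT d.date
         , COUNT(f.cid) AS new_cids
    FROM   days d
    LEFT   JOIN first_seen f ON f.first_date = d.date
    GROUP  BY d.date
)
SELECT date
     , SUM(new_cids) OVER (PARTITION BY DATE_TRUNC('month', date)
                           ORDER BY date) AS distinct_cids
FROM   daily_new
ORDER  BY date;
```

On the sample data this should give 3, 4, 6 for the three January dates and 2, 3 for the two February dates, matching the desired output.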
DATE is a reserved word, so you may want to rename your column.
EDIT: Since DISTINCT is disallowed, you can try a vanilla SQL variant. It is likely to be slow for a large table:
The idea is that we create a new relation (t) with distinct users with a date less than or equal to the current date in the current month. Then we can just count those users for each date.
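The idea described above could be sketched roughly as follows. This is one possible reading of it, not a confirmed copy of the answer's query: the aliases d and u are illustrative, and DATE_TRUNC('month', ...) is assumed to be available for finding the start of the current month.

```sql
-- Sketch only: aliases d (days) and u (users) are illustrative assumptions.
SELECT t.date
     , COUNT(*) AS distinct_cids
FROM  (
    -- Relation t: one row per (date, cid) where the cid was seen
    -- on or before that date within the same month.
    SELECT DISTINCT d.date, u.cid
    FROM   tbl d
    JOIN   tbl u
      ON   u.date <= d.date
     AND   u.date >= DATE_TRUNC('month', d.date)
) t
GROUP  BY t.date
ORDER  BY t.date;
```

Because the self-join pairs every row with all earlier rows in the same month, the intermediate result grows quickly, which is why this variant is likely to be slow on a large table.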