Group SQL rows by overlapping active time intervals (valid_from and valid_to)
I'm working in BigQuery with this mock data:
create schema if not exists dbo;
create table if not exists dbo.player_history(team_id INT, player_id INT, active_from TIMESTAMP, active_to TIMESTAMP);
truncate table dbo.player_history;
INSERT INTO dbo.player_history VALUES(1,1,'2020-01-01', '2020-01-08');
INSERT INTO dbo.player_history VALUES(1,2,'2020-06-01', '2020-09-08');
INSERT INTO dbo.player_history VALUES(1,3,'2020-06-10', '2020-10-01');
INSERT INTO dbo.player_history VALUES(1,4,'2020-02-01', '2020-02-15');
INSERT INTO dbo.player_history VALUES(1,5,'2021-01-01', '2021-01-08');
INSERT INTO dbo.player_history VALUES(1,6,'2021-01-02', '2021-06-08');
INSERT INTO dbo.player_history VALUES(1,7,'2021-01-03', '2021-06-08');
INSERT INTO dbo.player_history VALUES(1,8,'2021-01-04', '2021-06-08');
INSERT INTO dbo.player_history VALUES(1,9,'2020-01-02', '2021-02-05');
INSERT INTO dbo.player_history VALUES(1,10,'2020-10-01', '2021-04-08');
INSERT INTO dbo.player_history VALUES(1,11,'2020-11-01', '2021-05-08');
select *
from dbo.player_history
order by 3, 4
What I want to get out of this is the active lineups. The output would look like this:
With the logic behind it being:
I've almost cracked it by checking whether lead(valid_from) falls between valid_from and valid_to (active_from and active_to in the table above), using a CASE WHEN that yields 1 for a new lineup and 0 otherwise, and then taking a cumulative sum of that flag to get the lineup ID, but I haven't been able to solve it 100%. I'm getting desperate and don't know where to look anymore.
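The flag-and-cumulative-sum idea described above can be sketched roughly as follows. This is only an illustration under assumptions: it uses a running MAX(active_to) over earlier rows rather than LEAD, and the names is_new_island and island_id are hypothetical, not from the original post.

```sql
-- Sketch only: groups rows into "islands" of overlapping intervals per team.
-- is_new_island and island_id are hypothetical names used for illustration.
WITH flagged AS (
  SELECT
    team_id,
    player_id,
    active_from,
    active_to,
    -- 0 when this row starts before (or when) some earlier interval ends;
    -- 1 otherwise, including the first row, where the running MAX is NULL.
    CASE
      WHEN active_from <= MAX(active_to) OVER (
             PARTITION BY team_id
             ORDER BY active_from, active_to, player_id
             ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
      THEN 0
      ELSE 1
    END AS is_new_island
  FROM dbo.player_history
)
SELECT
  team_id,
  player_id,
  active_from,
  active_to,
  -- Running total of the flags: rows in the same island share one id.
  SUM(is_new_island) OVER (
    PARTITION BY team_id
    ORDER BY active_from, active_to, player_id
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS island_id
FROM flagged
ORDER BY active_from, active_to, player_id;
```

Note that this groups chains of overlapping intervals, which is not the same thing as a lineup. In the mock data, player 9 is active from 2020-01-02 to 2021-02-05 and overlaps or bridges every other row, so every row would fall into a single island. That may be why this pattern alone does not give one ID per lineup.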
**Correction: lineups 4 and 5 should actually be just one lineup.
Comments (1)
Given that a player can belong to multiple lineups, as we discussed in the comment section, you might try the approach below using JOIN:
Since the output is too long to show in a screenshot of the BigQuery console, I extracted the results to Google Sheets. See the screenshot of the output below:
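A minimal sketch of how a JOIN-based query for this might look. It is an assumption about the general shape of such an approach, not necessarily the query this answer used. The names boundaries, periods, period_start, period_end and lineup are hypothetical, and treating a player as part of a period only when active for the whole period is also an assumption.

```sql
-- Sketch only: one row per period between consecutive join/leave timestamps,
-- with the players who are active throughout that period.
WITH boundaries AS (
  -- Every timestamp at which some player joins or leaves a team.
  SELECT team_id, active_from AS ts FROM dbo.player_history
  UNION DISTINCT
  SELECT team_id, active_to AS ts FROM dbo.player_history
),
periods AS (
  -- Consecutive boundaries delimit periods in which the roster cannot change.
  SELECT
    team_id,
    ts AS period_start,
    LEAD(ts) OVER (PARTITION BY team_id ORDER BY ts) AS period_end
  FROM boundaries
)
SELECT
  p.team_id,
  p.period_start,
  p.period_end,
  ARRAY_AGG(h.player_id ORDER BY h.player_id) AS lineup
FROM periods AS p
JOIN dbo.player_history AS h
  ON h.team_id = p.team_id
 AND h.active_from <= p.period_start  -- already active when the period starts
 AND h.active_to >= p.period_end      -- still active when the period ends
GROUP BY p.team_id, p.period_start, p.period_end
ORDER BY p.team_id, p.period_start;
```

Because a player row joins to every period it covers, the same player can appear in several lineups. The last boundary has a NULL period_end and drops out of the inner join, as do periods in which nobody is active. Whether adjacent periods should be merged further, as in the correction about lineups 4 and 5, depends on the expected output.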