Create date ranges where indicator equals 1
How do I create a dataframe of date ranges for ids where indicator = 1?
import pandas as pd

# Proxy for the main high-frequency dataframe
main_data = [['site a', '2021-03-05 01:00:00', 1],
             ['site a', '2021-03-05 01:30:00', 1],
             ['site a', '2021-03-05 02:00:00', 0],
             ['site a', '2021-03-05 02:30:00', 1],
             ['site a', '2021-03-05 02:30:00', 1],
             ['site b', '2021-04-08 20:00:00', 0],
             ['site b', '2021-04-09 20:00:00', 1],
             ['site b', '2021-04-10 20:00:00', 1],
             ['site b', '2021-04-10 20:30:00', 1]]
# Create the pandas DataFrame and parse the timestamp strings
main_df = pd.DataFrame(main_data, columns=['id', 'timestamp', 'indicator'])
main_df['timestamp'] = pd.to_datetime(main_df['timestamp'])
print(main_df)
       id           timestamp  indicator
0  site a 2021-03-05 01:00:00          1
1  site a 2021-03-05 01:30:00          1
2  site a 2021-03-05 02:00:00          0
3  site a 2021-03-05 02:30:00          1
4  site a 2021-03-05 02:30:00          1
5  site b 2021-04-08 20:00:00          0
6  site b 2021-04-09 20:00:00          1
7  site b 2021-04-10 20:00:00          1
8  site b 2021-04-10 20:30:00          1
Desired output dataframe:
print(desired_df)
       id               start                 end
0  site a 2021-03-05 01:00:00 2021-03-05 01:30:00
1  site a 2021-03-05 02:30:00 2021-03-05 02:30:00
2  site b 2021-04-09 20:00:00 2021-04-10 20:30:00
IIUC: groupby consecutive sequences of the indicator column and record the min and max timestamp values as "start" and "end".
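The idea of grouping consecutive runs of the indicator and taking the min and max timestamp can be sketched as below. This is only one way to do it, not necessarily the code the answer originally carried: the names `changed` and `run` are made up for illustration, and the sketch assumes the rows are already sorted by id and timestamp.

```python
import pandas as pd

main_df = pd.DataFrame(
    [
        ["site a", "2021-03-05 01:00:00", 1],
        ["site a", "2021-03-05 01:30:00", 1],
        ["site a", "2021-03-05 02:00:00", 0],
        ["site a", "2021-03-05 02:30:00", 1],
        ["site a", "2021-03-05 02:30:00", 1],
        ["site b", "2021-04-08 20:00:00", 0],
        ["site b", "2021-04-09 20:00:00", 1],
        ["site b", "2021-04-10 20:00:00", 1],
        ["site b", "2021-04-10 20:30:00", 1],
    ],
    columns=["id", "timestamp", "indicator"],
)
main_df["timestamp"] = pd.to_datetime(main_df["timestamp"])

# A new run starts whenever the id or the indicator differs from the previous row.
# ("changed" and "run" are illustrative names, not from the original post.)
changed = (main_df["id"] != main_df["id"].shift()) | (
    main_df["indicator"] != main_df["indicator"].shift()
)
runs = main_df.assign(run=changed.cumsum())

# Keep only the runs of 1s, then take the first and last timestamp of each run.
desired_df = (
    runs[runs["indicator"] == 1]
    .groupby(["id", "run"])
    .agg(start=("timestamp", "min"), end=("timestamp", "max"))
    .reset_index()
    .drop(columns="run")
)
print(desired_df)
```

Including the id in the change test means a run of 1s never spills from one site into the next, even if no 0 row separates them.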
You can use groupby with named aggregations: first create the groups of rows where indicator is 1, ind_grp, using eq against zero and cumsum.
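A minimal sketch of the eq/cumsum approach with named aggregations might look like this. The `ind_grp` name comes from the answer; everything else (the use of `assign`, the boolean filter, and grouping on id as well) is an assumption about how the pieces fit together.

```python
import pandas as pd

main_df = pd.DataFrame(
    [
        ["site a", "2021-03-05 01:00:00", 1],
        ["site a", "2021-03-05 01:30:00", 1],
        ["site a", "2021-03-05 02:00:00", 0],
        ["site a", "2021-03-05 02:30:00", 1],
        ["site a", "2021-03-05 02:30:00", 1],
        ["site b", "2021-04-08 20:00:00", 0],
        ["site b", "2021-04-09 20:00:00", 1],
        ["site b", "2021-04-10 20:00:00", 1],
        ["site b", "2021-04-10 20:30:00", 1],
    ],
    columns=["id", "timestamp", "indicator"],
)
main_df["timestamp"] = pd.to_datetime(main_df["timestamp"])

# Every 0 bumps the counter, so all 1s between two 0s share one ind_grp value.
ind_grp = main_df["indicator"].eq(0).cumsum()

with_grp = main_df.assign(ind_grp=ind_grp)
desired_df = (
    with_grp[with_grp["indicator"] == 1]  # drop the 0 rows themselves
    .groupby(["id", "ind_grp"])           # id keeps sites apart (assumption)
    .agg(start=("timestamp", "min"), end=("timestamp", "max"))
    .reset_index()
    .drop(columns="ind_grp")
)
print(desired_df)
```

On the sample data this yields the three ranges from the desired output: two for site a and one for site b.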
Here's a solution:
Output: