Pandas: merge two time series and get the mean values over the period where they overlap
I have two pandas DataFrames, as follows:
ts1
Out[50]:
soil_moisture_ids41
date_time
2007-01-07 05:00:00 0.1830
2007-01-07 06:00:00 0.1825
2007-01-07 07:00:00 0.1825
2007-01-07 08:00:00 0.1825
2007-01-07 09:00:00 0.1825
... ...
2017-10-10 20:00:00 0.0650
2017-10-10 21:00:00 0.0650
2017-10-10 22:00:00 0.0650
2017-10-10 23:00:00 0.0650
2017-10-11 00:00:00 0.0650
[94316 rows x 3 columns]
and the other one is
ts2
Out[51]:
soil_moisture_ids42
date_time
2016-07-20 00:00:00 0.147
2016-07-20 01:00:00 0.148
2016-07-20 02:00:00 0.149
2016-07-20 03:00:00 0.150
2016-07-20 04:00:00 0.152
... ...
2019-12-31 19:00:00 0.216
2019-12-31 20:00:00 0.216
2019-12-31 21:00:00 0.215
2019-12-31 22:00:00 0.215
2019-12-31 23:00:00 0.215
[30240 rows x 3 columns]
As you can see, from 2007-01-07 to 2016-07-19 only ts1 has data points, and from 2016-07-20 to 2017-10-11 the two time series overlap. Now I want to combine these two data frames. During the overlapping period, I want the mean of ts1 and ts2. During the non-overlapping periods (2007-01-07 to 2016-07-19 and 2017-10-12 to 2019-12-31), the value at each timestamp should be taken from whichever of ts1 or ts2 has data. So how can I do it?
Thanks!
Comments (2)
Use concat, then aggregate with mean: if a timestamp has only one value, the output is that same value, and if it has several, you get their mean. The final DatetimeIndex is also sorted.
Just store the concatenated series first and then apply the mean, i.e.
merged_ts = pd.concat([ts1, ts2])
and then
mean_ts = merged_ts.groupby(level=0).mean()
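The concat-then-group approach described in the answers can be sketched as follows. This is only a minimal illustration under stated assumptions: the tiny ts1 and ts2 frames below are hypothetical stand-ins for the real data, and the output name soil_moisture is made up. Because the two frames in the question use different column names, the sketch selects each column as a Series before concatenating; concatenating the DataFrames directly would keep two separate columns rather than averaging across them.

```python
import pandas as pd

# Hypothetical miniature versions of ts1 and ts2 (not the real data):
# they overlap on two timestamps, 23:00 and 00:00.
ts1 = pd.DataFrame(
    {"soil_moisture_ids41": [0.10, 0.20, 0.30]},
    index=pd.to_datetime(
        ["2016-07-19 22:00", "2016-07-19 23:00", "2016-07-20 00:00"]
    ),
)
ts2 = pd.DataFrame(
    {"soil_moisture_ids42": [0.50, 0.60, 0.70]},
    index=pd.to_datetime(
        ["2016-07-19 23:00", "2016-07-20 00:00", "2016-07-20 01:00"]
    ),
)

# Pick the value columns as Series so they stack into one column.
s1 = ts1["soil_moisture_ids41"]
s2 = ts2["soil_moisture_ids42"]

# Stack both series, then average all values sharing a timestamp.
# A timestamp present in only one series keeps its single value.
# groupby sorts the group keys by default, so the index comes out sorted.
merged_ts = pd.concat([s1, s2])
mean_ts = merged_ts.groupby(level=0).mean().rename("soil_moisture")

# Alternative with the same result here: align side by side and take
# the row-wise mean, which skips missing values by default.
mean_ts_alt = pd.concat([s1, s2], axis=1).mean(axis=1)

print(mean_ts)
```

In this toy example the two overlapping timestamps receive the average of both sensors, while the first and last timestamps keep the single value that exists for them.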