Pandas: filling NaN values with values from another dataframe after a left outer join when there are no common rows
Consider the following two dataframes:
df1:
time p_value
2022-03-12 00:00:00+00:00 38743.7
2022-03-12 00:05:00+00:00 38802
2022-03-12 00:10:00+00:00 38842
2022-03-12 00:15:00+00:00 38865.9
2022-03-12 00:20:00+00:00 38882.1
2022-03-12 23:35:00+00:00 38965
2022-03-12 23:40:00+00:00 38933.2
2022-03-12 23:45:00+00:00 38878.4
2022-03-12 23:50:00+00:00 38861.3
2022-03-12 23:55:00+00:00 38803.9
df2:
time b_value
2022-03-12 00:27:12+00:00 13
2022-03-12 00:29:26+00:00 14
2022-03-12 00:31:37+00:00 15
2022-03-12 23:19:38+00:00 16
2022-03-12 23:33:10+00:00 17
2022-03-12 23:41:52+00:00 18
df1 has a complete datetime index for one specific day at 5-minute resolution, i.e. it has 24*60/5 = 288 records (only the first and last rows are shown above).
df2 has fewer rows than df1 (< 288), and its timestamps may all differ from the timestamps in df1.
The question is how to merge these two dataframes so that, for each timestamp in df2, the value from the nearest timestamp in df1 is used. A plain left outer join matches only on exactly equal timestamps:
# Left outer join
desired_df = df2.merge(df1, on='time', how='left')
In other words, how can we fill the NaN values that this join produces using the values from the other dataframe?
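To make the problem concrete, here is a minimal sketch using a few of the rows shown above (the subset and the variable name `joined` are chosen only for illustration). Because no timestamp in df2 equals a timestamp in df1, the exact-match left join leaves every p_value as NaN:

```python
import pandas as pd

# A few rows copied from the tables in the question.
df1 = pd.DataFrame({
    "time": pd.to_datetime(
        ["2022-03-12 23:35:00+00:00",
         "2022-03-12 23:40:00+00:00",
         "2022-03-12 23:45:00+00:00"], utc=True),
    "p_value": [38965.0, 38933.2, 38878.4],
})
df2 = pd.DataFrame({
    "time": pd.to_datetime(
        ["2022-03-12 23:33:10+00:00",
         "2022-03-12 23:41:52+00:00"], utc=True),
    "b_value": [17, 18],
})

# Left outer join on exactly equal timestamps: nothing matches.
joined = df2.merge(df1, on="time", how="left")
print(joined)  # the p_value column contains only NaN
```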
Desired df:
time new_col
2022-03-12 00:27:12+00:00 38882.1
2022-03-12 00:29:26+00:00 38882.1
2022-03-12 00:31:37+00:00 38882.1
2022-03-12 23:19:38+00:00 38965
2022-03-12 23:33:10+00:00 38965
2022-03-12 23:41:52+00:00 38933.2
Another idea: for each timestamp in df2, compute the mean of the two consecutive df1 values that surround it. For example, the value corresponding to 2022-03-12 23:41:52+00:00 would be the mean of the values at 2022-03-12 23:40:00+00:00 and 2022-03-12 23:45:00+00:00.
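The averaging idea could be sketched as follows. This is only one possible approach, not something from the original thread: it assumes two `pd.merge_asof` calls, one with `direction="backward"` and one with `direction="forward"`, and averages the two matched values. The names `prev`, `nxt` and `result` are illustrative.

```python
import pandas as pd

# Last rows of df1 and two rows of df2, copied from the question.
df1 = pd.DataFrame({
    "time": pd.to_datetime(
        ["2022-03-12 23:35:00+00:00", "2022-03-12 23:40:00+00:00",
         "2022-03-12 23:45:00+00:00", "2022-03-12 23:50:00+00:00",
         "2022-03-12 23:55:00+00:00"], utc=True),
    "p_value": [38965.0, 38933.2, 38878.4, 38861.3, 38803.9],
})
df2 = pd.DataFrame({
    "time": pd.to_datetime(
        ["2022-03-12 23:33:10+00:00",
         "2022-03-12 23:41:52+00:00"], utc=True),
    "b_value": [17, 18],
})

# Both frames must be sorted by the key for merge_asof.
prev = pd.merge_asof(df2, df1, on="time", direction="backward")  # last row at or before
nxt = pd.merge_asof(df2, df1, on="time", direction="forward")    # first row at or after

result = df2[["time"]].copy()
result["new_col"] = (prev["p_value"] + nxt["p_value"]) / 2
print(result)
```

For 23:41:52 this averages the 23:40 and 23:45 values. In this small subset the 23:33:10 row has no earlier df1 row, so its mean is NaN; with the full 288-row df1 that case would not arise.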
After JonClements's comment, here is a pretty simple solution!
pd.merge_asof(left=df2, right=df1, on='time')
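One detail to check: `pd.merge_asof` matches with `direction="backward"` by default, i.e. it takes the last df1 row at or before each df2 timestamp. If the nearest row in either direction is wanted, as in the question's wording, `direction="nearest"` can be passed. A minimal sketch using only the rows shown in the question (the full df1 has 288 rows, so results on the real data may differ):

```python
import pandas as pd

df1 = pd.DataFrame({
    "time": pd.to_datetime(
        ["2022-03-12 00:00:00+00:00", "2022-03-12 00:05:00+00:00",
         "2022-03-12 00:10:00+00:00", "2022-03-12 00:15:00+00:00",
         "2022-03-12 00:20:00+00:00", "2022-03-12 23:35:00+00:00",
         "2022-03-12 23:40:00+00:00", "2022-03-12 23:45:00+00:00",
         "2022-03-12 23:50:00+00:00", "2022-03-12 23:55:00+00:00"], utc=True),
    "p_value": [38743.7, 38802.0, 38842.0, 38865.9, 38882.1,
                38965.0, 38933.2, 38878.4, 38861.3, 38803.9],
})
df2 = pd.DataFrame({
    "time": pd.to_datetime(
        ["2022-03-12 00:27:12+00:00", "2022-03-12 00:29:26+00:00",
         "2022-03-12 00:31:37+00:00", "2022-03-12 23:19:38+00:00",
         "2022-03-12 23:33:10+00:00", "2022-03-12 23:41:52+00:00"], utc=True),
    "b_value": [13, 14, 15, 16, 17, 18],
})

# Both frames must be sorted by "time"; they already are here.
nearest = pd.merge_asof(left=df2, right=df1, on="time", direction="nearest")
print(nearest[["time", "p_value"]])
```

On these rows, the p_value column reproduces the new_col values of the desired table above.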
Comments (1)
You could try pandas interpolate.
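The comment does not say how interpolation would be applied. One possible reading, offered only as a sketch: index df1 by time, reindex onto the union of both sets of timestamps, and call `Series.interpolate(method="time")` so values at df2's timestamps are linearly interpolated between the surrounding df1 rows. The names `s`, `union`, `interp` and `out` are illustrative.

```python
import pandas as pd

# Last rows of df1 and the last row of df2, copied from the question.
df1 = pd.DataFrame({
    "time": pd.to_datetime(
        ["2022-03-12 23:35:00+00:00", "2022-03-12 23:40:00+00:00",
         "2022-03-12 23:45:00+00:00", "2022-03-12 23:50:00+00:00",
         "2022-03-12 23:55:00+00:00"], utc=True),
    "p_value": [38965.0, 38933.2, 38878.4, 38861.3, 38803.9],
})
df2 = pd.DataFrame({
    "time": pd.to_datetime(["2022-03-12 23:41:52+00:00"], utc=True),
    "b_value": [18],
})

s = df1.set_index("time")["p_value"]
# Sorted union of both sets of timestamps; df2's timestamps start as NaN.
union = s.index.union(pd.DatetimeIndex(df2["time"]))
interp = s.reindex(union).interpolate(method="time")

# Pick the interpolated values back out at df2's timestamps.
out = df2.assign(new_col=interp.reindex(df2["time"]).to_numpy())
print(out)
```

Unlike the nearest-match result, this produces a time-weighted value between the two surrounding df1 rows, so it is closer in spirit to the averaging idea than to the desired table.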