How to filter a dataframe by the closest times in another dataframe?

Posted on 2025-02-09 08:36:53

For this project, I have two dataframes, one called df1 and another called df2. They are not the same size (I don't think that matters).

Each of them has a datetime in the first column. What I am trying to do is:

I want to make a new dataframe df3 that contains the remaining data from df2, but only for the rows whose times in the first column are closest to the times in df1.

Here is an example of what the dataframes might look like:

 print (df1)
 Output:
                 Date       Val
 0 2015-02-24 00:00:02  1.764052
 1 2015-02-24 00:01:15  0.400157
 2 2015-02-24 00:02:22  0.978738
 3 2015-02-24 00:03:39  2.240893
 4 2015-02-24 00:04:00  1.867558

 print (df2)
 Output:
                 Date       Val      Name  
 0 2015-02-24 00:00:00  -0.977278    John
 1 2015-02-24 00:01:00   0.950088    Robert
 2 2015-02-24 00:02:00  -0.103219    Sam
 3 2015-02-24 00:03:00   0.151357    Tim
 4 2015-02-24 00:04:00   0.410599    Hector 
 5 2015-02-24 00:05:00   0.673247    Melissa
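
For anyone who wants to reproduce this, the two frames could be rebuilt roughly as follows. This is only a sketch: the values are copied from the printed output above, and it assumes the Date columns are real datetimes rather than strings.

```python
import pandas as pd

# Values copied from the printed output in the question.
df1 = pd.DataFrame({
    "Date": pd.to_datetime([
        "2015-02-24 00:00:02", "2015-02-24 00:01:15", "2015-02-24 00:02:22",
        "2015-02-24 00:03:39", "2015-02-24 00:04:00",
    ]),
    "Val": [1.764052, 0.400157, 0.978738, 2.240893, 1.867558],
})

df2 = pd.DataFrame({
    "Date": pd.to_datetime([
        "2015-02-24 00:00:00", "2015-02-24 00:01:00", "2015-02-24 00:02:00",
        "2015-02-24 00:03:00", "2015-02-24 00:04:00", "2015-02-24 00:05:00",
    ]),
    "Val": [-0.977278, 0.950088, -0.103219, 0.151357, 0.410599, 0.673247],
    "Name": ["John", "Robert", "Sam", "Tim", "Hector", "Melissa"],
})

print(df1)
print(df2)
```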

Then what I want to be able to retrieve is something like this:

 print (df3)
 Output:
                 Date       Val      Name  
 0 2015-02-24 00:00:00  -0.977278    John
 1 2015-02-24 00:01:00   0.950088    Robert
 2 2015-02-24 00:02:00  -0.103219    Sam
 3 2015-02-24 00:04:00   0.410599    Hector
 4 2015-02-24 00:04:00   0.410599    Hector 
 5 2015-02-24 00:05:00   0.673247    Melissa

I have searched around a bit and found two similar posts on here (example-1, example-2), but the difference is that they only want a single value or a single row returned. For my purposes I want df2 to be 'filtered', so to speak.

If anyone can provide any insight, that would be greatly appreciated, thank you.

橘亓 2025-02-16 08:36:53

If I understand correctly, I believe this gives you what you're looking for.

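# For each timestamp in df1, find the index label of the closest row in df2
df1['df2_idx'] = df1['Date'].apply(lambda x: (df2['Date'] - x).abs().idxmin())
# Select those rows from df2 (repeats allowed) and renumber the result
df3 = df2.loc[df1['df2_idx']].reset_index(drop=True)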

The first line finds, for each row in df1, the row in df2 whose Date is closest, and stores that row's index in a new column of df1, like this:

                 Date       Val  df2_idx
0 2015-02-24 00:00:02  1.764052        0
1 2015-02-24 00:01:15  0.400157        1
2 2015-02-24 00:02:22  0.978738        2
3 2015-02-24 00:03:39  2.240893        4
4 2015-02-24 00:04:00  1.867558        4
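
The lookup in the first line can be checked by hand for a single timestamp. The sketch below takes row 3 of df1 (00:03:39) and shows why it maps to index 4 of df2 rather than index 3; the variable names `df2_dates`, `distance`, and `closest` are only for illustration.

```python
import pandas as pd

# df2's timestamps, copied from the question.
df2_dates = pd.Series(pd.to_datetime([
    "2015-02-24 00:00:00", "2015-02-24 00:01:00", "2015-02-24 00:02:00",
    "2015-02-24 00:03:00", "2015-02-24 00:04:00", "2015-02-24 00:05:00",
]))

x = pd.Timestamp("2015-02-24 00:03:39")  # row 3 of df1

# Absolute distance from x to every timestamp in df2, in seconds.
distance = (df2_dates - x).abs()
print(distance.dt.total_seconds().tolist())
# [219.0, 159.0, 99.0, 39.0, 21.0, 81.0]

# idxmin returns the index label of the smallest distance.
closest = distance.idxmin()
print(closest)  # 4, because 00:04:00 is 21 s away and 00:03:00 is 39 s away
```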

The second line then selects those rows from df2 using the column of indices. The final output is:

                 Date       Val    Name
0 2015-02-24 00:00:00 -0.977278    John
1 2015-02-24 00:01:00  0.950088  Robert
2 2015-02-24 00:02:00 -0.103219     Sam
3 2015-02-24 00:04:00  0.410599  Hector
4 2015-02-24 00:04:00  0.410599  Hector
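
As a possible alternative to the row-by-row apply, pandas also provides `pd.merge_asof`, which with `direction="nearest"` matches each left row to the closest key on the right. This is a different technique from the one above, sketched under the assumption that both frames can be sorted by Date (`merge_asof` requires sorted keys). The helper column name `Date_df2` is made up for this illustration.

```python
import pandas as pd

# Data copied from the question.
df1 = pd.DataFrame({
    "Date": pd.to_datetime([
        "2015-02-24 00:00:02", "2015-02-24 00:01:15", "2015-02-24 00:02:22",
        "2015-02-24 00:03:39", "2015-02-24 00:04:00",
    ]),
    "Val": [1.764052, 0.400157, 0.978738, 2.240893, 1.867558],
})
df2 = pd.DataFrame({
    "Date": pd.to_datetime([
        "2015-02-24 00:00:00", "2015-02-24 00:01:00", "2015-02-24 00:02:00",
        "2015-02-24 00:03:00", "2015-02-24 00:04:00", "2015-02-24 00:05:00",
    ]),
    "Val": [-0.977278, 0.950088, -0.103219, 0.151357, 0.410599, 0.673247],
    "Name": ["John", "Robert", "Sam", "Tim", "Hector", "Melissa"],
})

# Rename df2's key so its own timestamp survives the merge
# ("Date_df2" is a hypothetical helper name).
right = df2.rename(columns={"Date": "Date_df2"}).sort_values("Date_df2")
# Only df1's timestamps are needed on the left side.
left = df1[["Date"]].sort_values("Date")

# For every df1 timestamp, pick the df2 row with the nearest timestamp.
matched = pd.merge_asof(
    left, right, left_on="Date", right_on="Date_df2", direction="nearest"
)

# Keep df2's columns only, restoring the original column name.
df3 = matched.drop(columns="Date").rename(columns={"Date_df2": "Date"})
print(df3)
```

On large frames this tends to scale better than calling `idxmin` once per row, since the apply version compares every df1 timestamp against all of df2.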
