Python, pandas: group one dataframe by the results of another dataframe of a different size
I have two dataframes: df_DD holds all my data, and df_GS holds the ranges I want to break df_DD into. df_GS is much shorter than df_DD, and I want to group the rows of df_DD by the df_GS range they fall into.
A small sample of df_GS:
From To DHID
0 69.0 88.5 CR22-200
1 88.5 90.0 CR22-200
2 90.0 99.0 CR22-200
3 99.0 100.5 CR22-200
4 100.5 112.5 CR22-200
5 112.5 114.0 CR22-200
6 114.0 165.0 CR22-200
for i in range(len(df_GS)):
    gs = df_GS.iloc[i]  # current range row
    mask = (df_DD['From'] >= gs['From']) & (df_DD['To'] <= gs['To']) & (df_DD['DHID'] == gs['DHID'])
    df_DD.loc[mask, 'Samples'] = i + 1  # 1-based sample index
Here is the resulting df_DD:
Samples From To DHID
0 1 69.0 70.5 CR22-200
1 1 70.5 72.0 CR22-200
2 1 72.0 73.5 CR22-200
3 1 73.5 75.0 CR22-200
4 1 75.0 76.5 CR22-200
5 1 76.5 78.0 CR22-200
6 1 78.0 79.5 CR22-200
7 1 79.5 81.0 CR22-200
8 1 81.0 82.5 CR22-200
9 1 82.5 84.0 CR22-200
10 1 84.0 85.5 CR22-200
11 1 85.5 87.0 CR22-200
12 1 87.0 88.5 CR22-200
13 2 88.5 90.0 CR22-200
14 3 90.0 91.5 CR22-200
15 3 91.5 93.0 CR22-200
The code above does what I want: it creates a new column named Samples that gives each row a sample index, after which I can use groupby. Is there a better way to do this? The loop is quite cumbersome.
Comments (1)
With merge_asof you can merge on the closest key instead of an exact key.
The default direction is backward, so each row of df_DD is matched with the df_GS row whose From is the closest value less than or equal to that row's From.
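A minimal sketch of what such a merge_asof call could look like, using a few rows modelled on the question's data. The helper frame name ranges and the way Samples is numbered are assumptions made for illustration; this is not the answerer's original code.

```python
import pandas as pd

# Toy data modelled on the question; only a few rows are reproduced.
df_GS = pd.DataFrame({
    "From": [69.0, 88.5, 90.0, 99.0],
    "To": [88.5, 90.0, 99.0, 100.5],
    "DHID": ["CR22-200"] * 4,
})
df_DD = pd.DataFrame({
    "From": [69.0, 70.5, 87.0, 88.5, 90.0, 91.5],
    "To": [70.5, 72.0, 88.5, 90.0, 91.5, 93.0],
    "DHID": ["CR22-200"] * 6,
})

# Assumption: number the ranges 1..n in df_GS row order, like i + 1 in the loop.
ranges = df_GS[["From", "DHID"]].copy()
ranges["Samples"] = list(range(1, len(ranges) + 1))

# merge_asof requires both frames to be sorted by the "on" key.
result = pd.merge_asof(
    df_DD.sort_values("From"),
    ranges.sort_values("From"),
    on="From",            # match on the interval start
    by="DHID",            # only match rows from the same drill hole
    direction="backward", # closest df_GS From that is <= the df_DD From
)

# The new column can then be used for grouping.
counts = result.groupby("Samples").size()
```

Note that this matches on From only. Unlike the loop in the question, it does not check that To also falls inside the range, so it relies on every df_DD interval sitting entirely within one df_GS interval.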
Optional notes
You can keep the original To column or add a new one, and you can also set the column order you want at the end.
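One possible reading of this note, sketched under assumptions: the range bounds from df_GS are carried into the result under the hypothetical names GS_From and GS_To, and the columns are reordered at the end by plain column selection. The names ranges, merged, and inside are also illustrative.

```python
import pandas as pd

# Toy data modelled on the question; only a few rows are reproduced.
df_GS = pd.DataFrame({
    "From": [69.0, 88.5, 90.0, 99.0],
    "To": [88.5, 90.0, 99.0, 100.5],
    "DHID": ["CR22-200"] * 4,
})
df_DD = pd.DataFrame({
    "From": [69.0, 70.5, 87.0, 88.5, 90.0, 91.5],
    "To": [70.5, 72.0, 88.5, 90.0, 91.5, 93.0],
    "DHID": ["CR22-200"] * 6,
})

# Rename the range's To so it does not collide with df_DD's own To column.
ranges = df_GS.rename(columns={"To": "GS_To"})
# The "on" key appears only once in the result, so keep a copy of the range start.
ranges["GS_From"] = ranges["From"]
ranges["Samples"] = list(range(1, len(ranges) + 1))

merged = pd.merge_asof(
    df_DD.sort_values("From"),
    ranges.sort_values("From"),
    on="From",
    by="DHID",
)

# With the range end available, the question's To <= range-To condition can be checked.
inside = merged["To"] <= merged["GS_To"]

# Set the column order at the end by selecting the columns explicitly.
merged = merged[["Samples", "From", "To", "GS_From", "GS_To", "DHID"]]
```

Keeping GS_From and GS_To makes it easy to spot any df_DD row that straddles two ranges, since inside would be False for it.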