Get the row index after groupby and nlargest
I have a large dataframe where I want to use groupby and nlargest to look for the second, third, fourth and fifth largest value of each group. I have over 500 groups and each group has over 1000 values. I also have other columns in the dataframe which I want to keep after applying groupby and nlargest. My dataframe looks like this:
import pandas as pd

df = pd.DataFrame({
    'group': [1, 2, 3, 3, 4, 5, 6, 7, 7, 8],
    'a': [4, 5, 3, 1, 2, 20, 10, 40, 50, 30],
    'b': [20, 10, 40, 50, 30, 4, 5, 3, 1, 2],
    'c': [25, 20, 5, 15, 10, 25, 20, 5, 15, 10]
})
To look for the second, third, fourth largest value and so on in each group for column a, I use:
secondlargest = df.groupby(['group'], as_index=False)['a'].apply(lambda grp: grp.nlargest(2).min())
which returns
   group   a
0      1   4
1      2   5
2      3   1
3      4   2
4      5  20
5      6  10
6      7  40
7      8  30
I need columns b and c present in this resulting dataframe. I use the following to subset the original dataframe but it returns an empty dataframe. How should I modify the code?
secondsubset = df[df.groupby(['group'])['a'].apply(lambda grp: grp.nlargest(2).min())]
Comments (2)
If I understand your goal correctly, you should be able to just drop as_index=False, use idxmin instead of min, and pass the result to df.loc:
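The code that went with the first comment did not survive on this page. The following is only a sketch of what the suggestion (drop as_index=False, use idxmin instead of min, pass the result to df.loc) might look like on the question's sample data. The variable name idx is an assumption made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'group': [1, 2, 3, 3, 4, 5, 6, 7, 7, 8],
    'a': [4, 5, 3, 1, 2, 20, 10, 40, 50, 30],
    'b': [20, 10, 40, 50, 30, 4, 5, 3, 1, 2],
    'c': [25, 20, 5, 15, 10, 25, 20, 5, 15, 10],
})

# nlargest(2) keeps the original row labels, so idxmin() returns the
# row label of the smaller of the two largest values in each group.
idx = df.groupby('group')['a'].apply(lambda grp: grp.nlargest(2).idxmin())

# Selecting by row label brings back every column, including b and c.
secondsubset = df.loc[idx]
print(secondsubset)
```

As with the original min() version, a group that has only one row simply returns that row.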
You can use agg with a lambda. It is neater.
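The second comment's code is also missing here, so this is a guess at the idea rather than the commenter's actual answer. It assumes agg is used to collect one row label per group, wrapped in a hypothetical helper kth_largest_rows so the same call covers the second through fifth largest values.

```python
import pandas as pd

df = pd.DataFrame({
    'group': [1, 2, 3, 3, 4, 5, 6, 7, 7, 8],
    'a': [4, 5, 3, 1, 2, 20, 10, 40, 50, 30],
    'b': [20, 10, 40, 50, 30, 4, 5, 3, 1, 2],
    'c': [25, 20, 5, 15, 10, 25, 20, 5, 15, 10],
})

def kth_largest_rows(frame, k, by='group', col='a'):
    """Hypothetical helper: full rows holding the k-th largest `col` per group."""
    # agg calls the lambda once per group; idxmin() on the k largest values
    # gives the row label of the k-th largest. Groups with fewer than k rows
    # fall back to their smallest value, as in the question's min() version.
    idx = frame.groupby(by)[col].agg(lambda s: s.nlargest(k).idxmin())
    return frame.loc[idx]

second = kth_largest_rows(df, 2)
# One dataframe per rank, from second to fifth largest.
results = {k: kth_largest_rows(df, k) for k in range(2, 6)}
print(second)
```

On the small sample no group has more than two rows, so ranks three to five return the same rows as rank two. On the real data, with over 1000 values per group, they would differ.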