Select the second (or nth) smallest value in a column while keeping the DataFrame

Posted on 2025-02-01 04:26:53 · Word count 478 · Views 2 · Comments 0

I am using this code to select, for each group, the row with the smallest value in a column of a given df (got this approach from here):

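import pandas as pd
data = pd.DataFrame({'A': [1,1,1,2,2,2], 'B':[4,5,2,7,4,6], 'C':[3,4,10,2,4,6]})
# minimum of B within each group of A
min_value = data.groupby('A').B.min()
# attach each group's minimum to its rows as B_min, then keep the rows where B equals it
data = data.merge(min_value, on='A',suffixes=('', '_min'))
data = data[data.B==data.B_min].drop('B_min', axis=1)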

I would like to modify this so that I get the row with the 2nd (or nth) lowest value in that column.

猫九 2025-02-08 04:26:53

You can find the nth lowest B within each A group and filter the data on it.

import pandas as pd
data = pd.DataFrame({'A': [1,1,1,2,2,2], 'B':[4,5,2,7,4,6], 'C':[3,4,10,2,4,6]})
# sort so that, within each A, rows are in ascending order of B
data = data.sort_values(by=['A','B'])
# broadcast each group's 2nd lowest B (position 1, zero-based) to its rows, then filter on it
data = data[data['B'] == data.groupby('A')['B'].transform('nth', 1)]
print(data)

   A  B  C
0  1  4  3
5  2  6  6

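You can select any other rank by changing the position passed to transform as the extra argument. The position is zero-based, so 0 is the lowest, 1 is the 2nd lowest, and so on.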
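In case passing 'nth' to transform is not accepted by your pandas version (I have not checked every release), a per-group dense rank is one possible alternative. This is only a sketch: the helper name rows_with_nth_lowest is made up for illustration, and n is 1-based here. Note that a dense rank counts distinct values, so with ties it returns every row holding the nth lowest distinct B, which is not quite the same as picking position n in sorted order.

```python
import pandas as pd

def rows_with_nth_lowest(df, group_col, value_col, n):
    """Rows whose value is the n-th lowest distinct value in their group (n is 1-based)."""
    # dense rank: the lowest value gets 1.0, the next distinct value 2.0, and so on
    ranks = df.groupby(group_col)[value_col].rank(method="dense")
    return df[ranks == n]

data = pd.DataFrame({'A': [1,1,1,2,2,2], 'B':[4,5,2,7,4,6], 'C':[3,4,10,2,4,6]})
second_lowest = rows_with_nth_lowest(data, 'A', 'B', 2)
print(second_lowest)
```

The frame is not sorted, so the rows keep their original order and index.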

嘿看小鸭子会跑 2025-02-08 04:26:53

Try:

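# sort each group by B and take the row at position 1 (the 2nd lowest);
# use .iloc[n - 1] for the nth lowest
print(
    data.groupby("A", as_index=False).apply(
        lambda x: x.sort_values(by="B").iloc[1]
    )
)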

Prints:

   A  B  C
0  1  4  3
1  2  6  6
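One thing to watch with .iloc[1] is that it raises an IndexError for any group that has only one row. A possible way around that, sketched below, is to sort once and number the rows inside each group with cumcount. The helper name nth_row_per_group and the extra single-row group A=3 are my own additions for illustration, not part of the question.

```python
import pandas as pd

def nth_row_per_group(df, group_col, value_col, n):
    """Row at zero-based position n in each group after sorting by value_col.

    Groups with n or fewer rows simply contribute no row.
    """
    ordered = df.sort_values([group_col, value_col])
    # 0, 1, 2, ... within each group, following the sorted order
    position = ordered.groupby(group_col).cumcount()
    return ordered[position == n]

# the question's data plus a hypothetical group (A=3) that has a single row
data = pd.DataFrame({'A': [1,1,1,2,2,2,3], 'B':[4,5,2,7,4,6,9], 'C':[3,4,10,2,4,6,1]})
second_rows = nth_row_per_group(data, 'A', 'B', 1)
print(second_rows)
```

Unlike the apply version, this keeps the original index labels of the selected rows.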

情愿 2025-02-08 04:26:53

If your data is large, you could avoid sorting the data (which can be expensive), and instead use a combination of idxmin (as shown in the solution you referenced) and nsmallest:

grouper = data.groupby('A').B
# index label of the minimum B in each group
minimum = grouper.idxmin()
# index labels of the 2 smallest B values in each group
smallest_2 = grouper.nsmallest(2).index.droplevel(0)
# drop the minimum so only the 2nd smallest remains
# alternative is smallest_2.difference(minimum)
smallest_2 = smallest_2[~smallest_2.isin(minimum)]
data.loc[smallest_2]

   A  B  C
0  1  4  3
5  2  6  6
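The snippet above is specific to the 2nd smallest. A rough generalisation to the nth smallest, still built on nsmallest, could look like the sketch below. The helper name nth_smallest_rows is hypothetical, n is 1-based, and it assumes that nsmallest on the grouped column returns a two-level index (group key, original label), as it does in the answer above.

```python
import pandas as pd

def nth_smallest_rows(df, group_col, value_col, n):
    """Rows holding the n-th smallest value per group (n is 1-based), without sorting df."""
    # the n smallest values per group, ascending, indexed by (group key, original label)
    top_n = df.groupby(group_col)[value_col].nsmallest(n)
    # position of each value inside its group: 0, 1, ..., n - 1
    position = top_n.groupby(level=0).cumcount()
    # keep the last of the n values; groups with fewer than n rows yield nothing
    picked = top_n[position == n - 1]
    return df.loc[picked.index.droplevel(0)]

data = pd.DataFrame({'A': [1,1,1,2,2,2], 'B':[4,5,2,7,4,6], 'C':[3,4,10,2,4,6]})
second_smallest = nth_smallest_rows(data, 'A', 'B', 2)
print(second_smallest)
```

Filtering on the position rather than removing the idxmin labels also avoids picking a second copy of the minimum as the "2nd smallest" when the minimum is tied, although tied values are then counted as separate positions.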
