Creating a rank column based on multiple column-value criteria

Posted on 2025-01-19 21:21:40

Suppose I have the following DataFrame:

   Num1  Num2  Num3
   123    75    43
   123    72    32
   123    72    37
   123    73    41
   456    72    23
   456    75    25
   456    73    21
   456    73    27

I need to create another column called rank. The expected output is:

   Num1  Num2  Num3  rank
    123    75    43     1
    123    72    32     3
    123    72    37     2
    123    73    41     4
    456    72    23     6
    456    75    25     5
    456    73    21     8
    456    73    27     7

The logic is: within each Num1, check Num2. A value of 75 gets 1st priority, 72 gets 2nd, and 73 gets 3rd. Ties on Num2 are broken by Num3, with the larger value ranked first. The rank keeps counting across Num1 groups rather than restarting at 1.

My thought was to sort in descending order, but that only works for the Num3 column, not for Num2.
I have created a helper column:

df['tcolun'] = df.apply(lambda row: 1 if row['Num2'] == 75 else (2 if row['Num2'] == 72 else 3), axis = 1)

But I am unable to use it properly.

清音悠歌 2025-01-26 21:21:40

IIUC, build a mapping dictionary for Num2, apply your sorting logic, then use numpy.argsort on the index of the sorted frame:

import numpy as np

order = [75,72,73]
d = {k:v for v,k in enumerate(order, 1)}
# {75: 1, 72: 2, 73: 3}

df['rank'] = np.argsort(df.assign(Num2=df['Num2'].map(d))
                          .sort_values(by=['Num1', 'Num2', 'Num3'],
                                       ascending=[True, True, False]).index)+1

Output:

   Num1  Num2  Num3  rank
0   123    75    43     1
1   123    72    32     3
2   123    72    37     2
3   123    73    41     4
4   456    72    23     6
5   456    75    25     5
6   456    73    21     8
7   456    73    27     7
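
The argsort step works because the index of the sorted frame lists the original row labels in ranked order, and argsort of that inverts the permutation. It relies on the original index being in ascending order, as with the default RangeIndex. The following is a minimal self-contained sketch of the same idea that aligns on index labels instead, assuming the labels are unique; the names `priority`, `ordered`, and `rank` are illustrative and not part of the answer above.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Num1": [123, 123, 123, 123, 456, 456, 456, 456],
    "Num2": [75, 72, 72, 73, 72, 75, 73, 73],
    "Num3": [43, 32, 37, 41, 23, 25, 21, 27],
})

# Custom priority for Num2 (lower number = higher priority)
priority = {75: 1, 72: 2, 73: 3}

# Sort by Num1, then Num2 priority, then Num3 descending
ordered = (
    df.assign(Num2=df["Num2"].map(priority))
      .sort_values(by=["Num1", "Num2", "Num3"],
                   ascending=[True, True, False])
)

# Number the sorted rows 1..n, keyed by their original index labels
rank = pd.Series(np.arange(1, len(ordered) + 1), index=ordered.index)

# Assigning a Series aligns on index labels, so each row gets its own rank
df["rank"] = rank

print(df)
```

For the sample data this should give the same rank column as the argsort version.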

Alternative with groupby+ngroup, taking advantage of the fact that groupby sorts the groups by default:

order = [75,72,73]
d = {k:v for v,k in enumerate(order, 1)}

df['rank'] = (df.assign(Num2=df['Num2'].map(d),
                        Num3=-df['Num3'])  # negate Num3 so larger values sort first
                .groupby(['Num1', 'Num2', 'Num3']).ngroup().add(1))
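
One difference worth noting: because ngroup numbers groups rather than rows, rows with identical Num1, Num2, and Num3 values share the same rank. The sketch below illustrates this under the assumption that such shared ranks are acceptable; `priority`, `group_rank`, and `dup` are hypothetical names introduced here for illustration.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Num1": [123, 123, 123, 123, 456, 456, 456, 456],
    "Num2": [75, 72, 72, 73, 72, 75, 73, 73],
    "Num3": [43, 32, 37, 41, 23, 25, 21, 27],
})

priority = {75: 1, 72: 2, 73: 3}


def group_rank(frame: pd.DataFrame) -> pd.Series:
    """Rank rows by numbering the sorted (Num1, Num2 priority, -Num3) groups."""
    keys = frame.assign(
        Num2=frame["Num2"].map(priority),
        Num3=-frame["Num3"],  # negate so larger Num3 sorts first
    )
    # groupby sorts group keys by default; ngroup numbers them from 0
    return keys.groupby(["Num1", "Num2", "Num3"]).ngroup().add(1)


df["rank"] = group_rank(df)
print(df)

# Duplicate the first row to show that identical rows share a rank
dup = pd.concat([df.iloc[[0]], df], ignore_index=True).drop(columns="rank")
dup["rank"] = group_rank(dup)
print(dup)
```

If every row needs a distinct rank even when all three columns tie, the sort-based approach is the safer choice.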
