Find the max value of each group in a pandas DataFrame

Posted on 2025-02-07 12:34:03

I have a question and hope you can give me some support. I looked through the archive here and found a solution, but it takes a long time to run and is not "beautiful", since it works with loops.

Suppose you have the following frame:

System    Country_Key    Name        Bank_number_length    Check rule for bank acct no.
PEM       AD             Andorra     8                     2
PL1       AD             Andorra     15                    5
PPE       AD             Andorra     14                    5
P11       AD             Andorra     9                     5
P16       AD             Andorra     12                    4

PEM       AE             Emirates    3                     5
PL1       AE             Emirates    15                    4
PPE       AE             Emirates    15                    5
P11       AE             Emirates    15                    6
P16       AE             Emirates    13                    5
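
For anyone who wants to reproduce this, the frame can be built roughly like this. It is only a sketch: the values are copied from the table, and the variable name `df` is my own assumption, not something given in the question.

```python
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    "System": ["PEM", "PL1", "PPE", "P11", "P16"] * 2,
    "Country_Key": ["AD"] * 5 + ["AE"] * 5,
    "Name": ["Andorra"] * 5 + ["Emirates"] * 5,
    "Bank_number_length": [8, 15, 14, 9, 12, 3, 15, 15, 15, 13],
    "Check rule for bank acct no.": [2, 5, 5, 5, 4, 5, 4, 5, 6, 5],
})

print(df)
```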

I found the following approach for two columns: "Get the max value from each group with pandas.DataFrame.groupby".
However, in my case I have many columns and need to set the index on the first three columns, "System", "Country_Key" and "Name".

My desired output would be the following:

System    Country_Key    Name        Bank_number_length    Check rule for bank acct no.
PEM       AD             Andorra
PL1                                  15                    5
PPE                                                        5
P11                                                        5
P16

PEM       AE             Emirates
PL1                                  15
PPE                                  15
P11                                  15                    6
P16

So in effect, within each group every value except the maximum is dropped. Any kind of hint would be really beneficial.


梦境 2025-02-14 12:34:03

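You can try masking the non-max values with an empty string, and then masking the duplicated key values with an empty string as well:

# df is the frame from the question
keys = ['Country_Key', 'Name']
cols = ['Bank_number_length', 'Check rule for bank acct no.']
# Blank out every value that differs from its group's maximum
# (the string 'max' avoids the warning recent pandas versions raise for the builtin max)
df[cols] = df[cols].mask(df[cols].ne(df.groupby(keys)[cols].transform('max')), '')
# Show Country_Key and Name only on the first row of each group
df.loc[df.duplicated(keys), keys] = ''

print(df)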

  System Country_Key      Name Bank_number_length Check rule for bank acct no.
0    PEM          AD   Andorra
1    PL1                                       15                            5
2    PPE                                                                     5
3    P11                                                                     5
4    P16
5    PEM          AE  Emirates
6    PL1                                       15
7    PPE                                       15
8    P11                                       15                            6
9    P16
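
Since the question also asks for the first three columns as the index, here is a possible variant of the masking approach above. It is a sketch under my own assumptions: it swaps the empty strings for NaN (so the value columns stay numeric), it relies on the (System, Country_Key, Name) combinations being unique as they are in the sample, and the names `indexed`, `group_max` and `result` are only illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "System": ["PEM", "PL1", "PPE", "P11", "P16"] * 2,
    "Country_Key": ["AD"] * 5 + ["AE"] * 5,
    "Name": ["Andorra"] * 5 + ["Emirates"] * 5,
    "Bank_number_length": [8, 15, 14, 9, 12, 3, 15, 15, 15, 13],
    "Check rule for bank acct no.": [2, 5, 5, 5, 4, 5, 4, 5, 6, 5],
})

keys = ["Country_Key", "Name"]

# Move the three identifying columns into the index, as the question requests
indexed = df.set_index(["System", "Country_Key", "Name"])

# Per-group maximum of every remaining column, broadcast back to each row
group_max = indexed.groupby(level=keys).transform("max")

# Keep a value only where it equals its group's maximum; everything else becomes NaN
result = indexed.where(indexed.eq(group_max))

print(result)
```

Because every non-index column is compared against its own group maximum, this should carry over to frames with many value columns without listing them one by one. If the blank look is still wanted for display, `result.fillna('')` would turn the NaN cells into empty strings, at the cost of the numeric dtype.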
