Adding duplicate rows together, with different conditions for different columns?

Posted 2025-02-10 23:36:11

My df looks something like this (very simplified):

Name   Age   A   B   C
John    27  12  17  13
David   23  14  50  10
John    27   4  19   7
David   23  10   8  12

Essentially the problem I have is that I want to merge the rows with duplicate names (i.e. same person). The age would stay the same, columns A and B need to be added together but for column C I must average the two values.

I have tried:

df.agg({'A' : ['sum'], 'B' : ['sum'], 'C': ['mean']}), but this just creates a new df with those column values.

I'm quite inexperienced with pandas, so I have only tried a limited number of things.

I would like the result to be like so:

Name   Age   A   B   C
John    27  16  36  10
David   23  24  58  11

In reality I have many more columns (over 100). I have created lists of the column names that need to be added, averaged, or kept the same.

My main idea was to do something such as:

do_nothing = []  # the lists already contain the column names
add_cols = []
avg_cols = []

for col in df.columns:
    if col in do_nothing:
        pass  # don't do anything
    elif col in add_cols:
        pass  # add cols
    elif col in avg_cols:
        pass  # get mean

If I only needed one operation e.g. 'sum' I know I could just do:
print(df.groupby(["Name", "Age"], as_index=False).sum()), but I am unsure how to do this with multiple operations using the column lists described above.

Any suggestions would be very appreciated!

Comments (1)

时光是把杀猪刀 2025-02-17 23:36:11

You should group your data by name and then add aggregation for different columns:

(df.groupby('Name', as_index=False, sort=False)
   .agg({'Age': 'first', 'A': 'sum', 'B': 'sum', 'C': 'mean'})
)

Output:

     Name  Age   A   B     C
0    John   27  16  36  10.0
1   David   23  24  58  11.0
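
For the real case with over 100 columns, one possible approach is to build the dictionary passed to agg from the three lists described in the question, instead of typing it out by hand. The following is only a sketch: the DataFrame is rebuilt from the sample above, the list contents are assumptions chosen to match that sample, and agg_map is a name made up for illustration.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Name": ["John", "David", "John", "David"],
    "Age": [27, 23, 27, 23],
    "A": [12, 14, 4, 10],
    "B": [17, 50, 19, 8],
    "C": [13, 10, 7, 12],
})

# Column lists as described in the question (contents assumed for this sample)
do_nothing = ["Age"]
add_cols = ["A", "B"]
avg_cols = ["C"]

# Map every column to its aggregation: keep the first value, sum, or mean
agg_map = {
    **{col: "first" for col in do_nothing},
    **{col: "sum" for col in add_cols},
    **{col: "mean" for col in avg_cols},
}

result = df.groupby("Name", as_index=False, sort=False).agg(agg_map)
print(result)
```

On the sample data this gives the same result as the output above. Using 'first' for the do_nothing columns assumes their values really are identical within each name; if they can differ, you would need to group by those columns as well or pick another rule.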