Adding duplicate rows together, with different conditions for different columns?
My df looks something like this (very simplified):
Name | Age | A | B | C |
---|---|---|---|---|
John | 27 | 12 | 17 | 13 |
David | 23 | 14 | 50 | 10 |
John | 27 | 4 | 19 | 7 |
David | 23 | 10 | 8 | 12 |
Essentially the problem I have is that I want to merge the rows with duplicate names (i.e. same person). The age would stay the same, columns A and B need to be added together but for column C I must average the two values.
I have tried:

    df.agg({'A': ['sum'], 'B': ['sum'], 'C': ['mean']})

but this just creates a new df with those column values.
I'm quite inexperienced with pandas, so I have only tried a limited number of things.
I would like the result to be like so:
Name | Age | A | B | C |
---|---|---|---|---|
John | 27 | 16 | 36 | 10 |
David | 23 | 24 | 58 | 11 |
In reality I have many more columns (over 100). I have created lists of the column names that need to be added, averaged, or kept the same.
My main idea was to do something such as:
    do_nothing = []  # lists contain column names already
    add_cols = []
    avg_cols = []
    for i in df.columns:
        if i in do_nothing:
            pass  # don't do anything
        if i in add_cols:
            pass  # add cols
        if i in avg_cols:
            pass  # get mean
If I only needed one operation, e.g. 'sum', I know I could just do:

    print(df.groupby(["Name", "Age"], as_index=False).sum())

but I am unsure how to do this with multiple operations using the column lists described above.
Any suggestions would be very appreciated!
You should group your data by name and then specify a separate aggregation for each column.
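A minimal sketch of that approach, not necessarily the code originally posted: it assumes the sample data from the question and that the three lists `do_nothing`, `add_cols` and `avg_cols` are filled in as shown. Using `"first"` for the unchanged columns is an assumption that their values are identical within each name.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Name": ["John", "David", "John", "David"],
    "Age": [27, 23, 27, 23],
    "A": [12, 14, 4, 10],
    "B": [17, 50, 19, 8],
    "C": [13, 10, 7, 12],
})

# Column lists as described in the question (filled in here for the sample).
do_nothing = ["Age"]   # assumed identical within each name: keep the first value
add_cols = ["A", "B"]  # summed per name
avg_cols = ["C"]       # averaged per name

# Map every column to the aggregation it should receive.
agg_map = {col: "first" for col in do_nothing}
agg_map.update({col: "sum" for col in add_cols})
agg_map.update({col: "mean" for col in avg_cols})

# sort=False keeps the names in order of first appearance.
result = df.groupby("Name", as_index=False, sort=False).agg(agg_map)
print(result)
```

For the sample data this gives John with A = 16, B = 36, C = 10.0 and David with A = 24, B = 58, C = 11.0, which matches the desired result. Because the mapping is built from the lists, the same code should carry over to 100+ columns without naming each one by hand.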