Keep other columns when using sum() with groupby
I have a pandas dataframe below:
df
name value1 value2 otherstuff1 otherstuff2
0 Jack 1 1 1.19 2.39
1 Jack 1 2 1.19 2.39
2 Luke 0 1 1.08 1.08
3 Mark 0 1 3.45 3.45
4 Luke 1 0 1.08 1.08
Rows with the same name will have the same values for otherstuff1 and otherstuff2.
I'm trying to group by the name column and sum the value1 and value2 columns. (I don't want to add value1 to value2; I want to sum each column separately.)
I expect to get the result below:
newdf
name value1 value2 otherstuff1 otherstuff2
0 Jack 2 3 1.19 2.39
1 Luke 1 1 1.08 1.08
2 Mark 0 1 3.45 3.45
I've tried
newdf = df.groupby(['name'], as_index=False).sum()
which groups by name and sums the value1 and value2 columns correctly, but drops the otherstuff1 and otherstuff2 columns.
You should specify what pandas must do with the other columns. In your case, I think you want to keep one row, regardless of its position within the group.
This can be done with agg on the grouped object. agg accepts a parameter that specifies which operation should be performed on each column.
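The code for this answer did not survive on this page. A minimal sketch of the agg approach it describes, assuming the column names from the question and assuming 'first' is an acceptable way to keep one value per group for the otherstuff columns (any row would do, since they are identical within a name):

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "name": ["Jack", "Jack", "Luke", "Mark", "Luke"],
    "value1": [1, 1, 0, 0, 1],
    "value2": [1, 2, 1, 1, 0],
    "otherstuff1": [1.19, 1.19, 1.08, 3.45, 1.08],
    "otherstuff2": [2.39, 2.39, 1.08, 3.45, 1.08],
})

# One operation per column: sum the value columns and keep the
# first value of the columns that are constant within a name
newdf = df.groupby("name", as_index=False).agg({
    "value1": "sum",
    "value2": "sum",
    "otherstuff1": "first",
    "otherstuff2": "first",
})
print(newdf)
```

'first' takes the first non-null value in each group. If those columns could ever differ within a name, a reducer such as 'max' or 'mean' would have to be chosen deliberately instead.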
Something like this? (Assuming rows with the same name have the same otherstuff1 and otherstuff2.)
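The code for this answer is also missing here. A sketch of one approach that matches its stated assumption (not necessarily the answerer's exact code): include the constant columns in the grouping keys so they survive the aggregation. If otherstuff1 or otherstuff2 ever varied within a name, that name would split into several rows.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "name": ["Jack", "Jack", "Luke", "Mark", "Luke"],
    "value1": [1, 1, 0, 0, 1],
    "value2": [1, 2, 1, 1, 0],
    "otherstuff1": [1.19, 1.19, 1.08, 3.45, 1.08],
    "otherstuff2": [2.39, 2.39, 1.08, 3.45, 1.08],
})

# The otherstuff columns are constant per name, so grouping by them too
# creates no extra groups; as_index=False keeps them as ordinary columns
keys = ["name", "otherstuff1", "otherstuff2"]
newdf = df.groupby(keys, as_index=False).sum()

# groupby puts the key columns first; restore the original column order
newdf = newdf[df.columns]
print(newdf)
```

Rows with NaN in any key column are dropped by default; passing dropna=False to groupby (pandas 1.1 and later) keeps them.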
These solutions are great, but when you have many columns, you do not want to type out all of the column names. So here is what I came up with:
Now you can simply do:
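The answerer's helper is not shown on this page. One way to avoid typing every column name, sketched with a hypothetical helper sum_keep_rest (not from the original answer), is to build the agg mapping from df.columns: sum the listed columns and keep the first value of everything else.

```python
import pandas as pd


def sum_keep_rest(df, key, sum_cols):
    """Hypothetical helper: sum `sum_cols` per `key` and keep the
    first value of every other non-key column."""
    spec = {
        col: ("sum" if col in sum_cols else "first")
        for col in df.columns
        if col != key
    }
    return df.groupby(key, as_index=False).agg(spec)


# Sample data from the question
df = pd.DataFrame({
    "name": ["Jack", "Jack", "Luke", "Mark", "Luke"],
    "value1": [1, 1, 0, 0, 1],
    "value2": [1, 2, 1, 1, 0],
    "otherstuff1": [1.19, 1.19, 1.08, 3.45, 1.08],
    "otherstuff2": [2.39, 2.39, 1.08, 3.45, 1.08],
})

newdf = sum_keep_rest(df, "name", ["value1", "value2"])
print(newdf)
```

Because the mapping is built in df.columns order, the result keeps the original column order without any extra reindexing.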
The key in the answer above is actually as_index=False; otherwise all the columns in the list are used as the index.
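To see what as_index=False changes, here is a small comparison that reuses the multi-key grouping (the key list is an assumption carried over from that approach). Without it, the keys become a MultiIndex, and reset_index() turns them back into columns.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "name": ["Jack", "Jack", "Luke", "Mark", "Luke"],
    "value1": [1, 1, 0, 0, 1],
    "value2": [1, 2, 1, 1, 0],
    "otherstuff1": [1.19, 1.19, 1.08, 3.45, 1.08],
    "otherstuff2": [2.39, 2.39, 1.08, 3.45, 1.08],
})
keys = ["name", "otherstuff1", "otherstuff2"]

# Default as_index=True: the key columns move into a MultiIndex
by_index = df.groupby(keys).sum()

# as_index=False: the key columns stay as regular columns
flat = df.groupby(keys, as_index=False).sum()

print(list(by_index.index.names))
print(list(flat.columns))
```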