Keep other columns when using sum() with groupby

Posted on 2025-02-13 09:53:41

I have the following pandas DataFrame:

    df

    name    value1    value2  otherstuff1 otherstuff2 
0   Jack       1         1       1.19        2.39     
1   Jack       1         2       1.19        2.39
2   Luke       0         1       1.08        1.08  
3   Mark       0         1       3.45        3.45
4   Luke       1         0       1.08        1.08

Rows with the same name always have the same values for otherstuff1 and otherstuff2.

I'm trying to group by the column name and sum the columns value1 and value2. (Not add value1 to value2, but sum each column separately within each group.)

I expect to get the result below:

    newdf

    name    value1    value2  otherstuff1 otherstuff2 
0   Jack       2         3       1.19        2.39     
1   Luke       1         1       1.08        1.08  
2   Mark       0         1       3.45        3.45

I've tried

newdf = df.groupby(['name'], as_index=False).sum()

which groups by name and sums value1 and value2 correctly, but drops the otherstuff1 and otherstuff2 columns.

萌逼全场 2025-02-20 09:53:41

You should specify what pandas must do with the other columns. In your case, I think you want to keep one row, regardless of its position within the group.

This could be done with agg on a group. agg accepts a parameter that specifies what operation should be performed for each column.

df.groupby(['name'], as_index=False).agg({'value1': 'sum', 'value2': 'sum', 'otherstuff1': 'first', 'otherstuff2': 'first'})
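As a minimal sketch of this approach, the snippet below rebuilds the sample data from the question and applies the per-column aggregation. It assumes, as the question states, that otherstuff1 and otherstuff2 are constant within each name, so taking 'first' loses nothing.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "name": ["Jack", "Jack", "Luke", "Mark", "Luke"],
    "value1": [1, 1, 0, 0, 1],
    "value2": [1, 2, 1, 1, 0],
    "otherstuff1": [1.19, 1.19, 1.08, 3.45, 1.08],
    "otherstuff2": [2.39, 2.39, 1.08, 3.45, 1.08],
})

# Sum the value columns; keep the first entry of the columns that are
# assumed to be constant within each group.
newdf = df.groupby(["name"], as_index=False).agg(
    {
        "value1": "sum",
        "value2": "sum",
        "otherstuff1": "first",
        "otherstuff2": "first",
    }
)
print(newdf)
```

Note that 'first' returns the first non-null value in each group, so a missing value in the first row of a group does not propagate as long as a later row has the value.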

痴情 2025-02-20 09:53:41

Something like this? (Assuming otherstuff1 and otherstuff2 are the same for every row with the same name.)

df.groupby(['name','otherstuff1','otherstuff2'],as_index=False).sum()
Out[121]: 
   name  otherstuff1  otherstuff2  value1  value2
0  Jack         1.19         2.39       2       3
1  Luke         1.08         1.08       1       1
2  Mark         3.45         3.45       0       1
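Since this approach depends on that assumption, it may be worth checking it before grouping on the extra columns. The sketch below is one possible way to do that with the question's sample data; the variable names extra_cols and is_constant are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "name": ["Jack", "Jack", "Luke", "Mark", "Luke"],
    "value1": [1, 1, 0, 0, 1],
    "value2": [1, 2, 1, 1, 0],
    "otherstuff1": [1.19, 1.19, 1.08, 3.45, 1.08],
    "otherstuff2": [2.39, 2.39, 1.08, 3.45, 1.08],
})

extra_cols = ["otherstuff1", "otherstuff2"]

# True only if every name maps to exactly one distinct value
# in each of the extra columns.
is_constant = bool(df.groupby("name")[extra_cols].nunique().eq(1).all().all())

if is_constant:
    # Safe to treat the extra columns as additional grouping keys.
    newdf = df.groupby(["name"] + extra_cols, as_index=False).sum()
    # Optional: restore the original column order.
    newdf = newdf[list(df.columns)]
    print(newdf)
```

If the check fails, grouping on the extra columns would split a name into several rows. Also note that rows with a missing value in any grouping key are dropped by default; groupby accepts dropna=False (pandas 1.1 and later) to keep them.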

就此别过 2025-02-20 09:53:41

These solutions are great, but when you have too many columns, you do not want to type all of the column names. So here is what I came up with:

column_map = {col: "first" for col in df.columns}
column_map["col_name1"] = "sum"
column_map["col_name2"] = lambda x: set(x) # it can also be a function or lambda

Now you can simply do:

df.groupby(["col_to_group"], as_index=False).aggregate(column_map)
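Applied to the sample data from the question, the same idea might look like the sketch below. The names group_col and sum_cols are illustrative, and the grouping column is deliberately left out of the map so that it is not requested both as the key and as an aggregated column.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "name": ["Jack", "Jack", "Luke", "Mark", "Luke"],
    "value1": [1, 1, 0, 0, 1],
    "value2": [1, 2, 1, 1, 0],
    "otherstuff1": [1.19, 1.19, 1.08, 3.45, 1.08],
    "otherstuff2": [2.39, 2.39, 1.08, 3.45, 1.08],
})

group_col = "name"
sum_cols = ["value1", "value2"]

# Default every non-key column to "first", then override the columns to sum.
column_map = {col: "first" for col in df.columns if col != group_col}
column_map.update({col: "sum" for col in sum_cols})

newdf = df.groupby(group_col, as_index=False).agg(column_map)
print(newdf)
```

Because the dictionary is built from df.columns, the result keeps the original column order, and adding new "keep as-is" columns to the frame requires no change to the aggregation code.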

魄砕の薆 2025-02-20 09:53:41

The key part of the answer above is actually as_index=False; without it, all the columns in the group-by list end up in the index of the result.

p_summ = p.groupby(attributes_list, as_index=False).agg({'AMT': 'sum'})
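To illustrate the difference, the sketch below runs the same aggregation with and without as_index=False on a small frame modeled on the question's data (it does not use the p, attributes_list, or AMT names from this answer, whose contents are not shown).

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Jack", "Jack", "Luke", "Mark", "Luke"],
    "value1": [1, 1, 0, 0, 1],
})

# Default behaviour: the grouping key becomes the index of the result.
indexed = df.groupby("name").agg({"value1": "sum"})

# With as_index=False the key stays a regular column.
flat = df.groupby("name", as_index=False).agg({"value1": "sum"})

print(indexed)
print(flat)

# reset_index() turns the indexed result into the flat layout after the fact.
print(indexed.reset_index())
```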
