How to retain nulls in a pandas groupby?

Posted 2025-01-13 01:47:23

I need to group my DataFrame in pandas, but when I do, the null values are converted to zeros. I want to retain the nulls instead, and I am not sure how to do that in pandas.

Input:

Id  Country  Product  sales  qty  price
1   Germany  shoes    32      1   NaN
1   Germany  shoes    32      1    2
2   England  Shoes    22      1   NaN
2   England  Shoes    22      1   NaN
3   Austria  Shoes    0       3   NaN
3   Austria  Shoes    NaN     NaN NaN

Desired output:

Id  Country  Product  sales  qty  price
1   Germany  shoes    64      2   2
2   England  Shoes    44      2   NaN
3   Austria  Shoes    0       3   NaN
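For reproducibility, the input above can be rebuilt as a DataFrame (column names taken from the question), and a plain groupby-sum shows the problem being asked about:

```python
import numpy as np
import pandas as pd

# Rebuild the input table from the question; NaN marks the missing cells
df = pd.DataFrame({
    'Id':      [1, 1, 2, 2, 3, 3],
    'Country': ['Germany', 'Germany', 'England', 'England', 'Austria', 'Austria'],
    'Product': ['shoes', 'shoes', 'Shoes', 'Shoes', 'Shoes', 'Shoes'],
    'sales':   [32, 32, 22, 22, 0, np.nan],
    'qty':     [1, 1, 1, 1, 3, np.nan],
    'price':   [np.nan, 2, np.nan, np.nan, np.nan, np.nan],
})

# A plain groupby-sum silently turns all-NaN groups into 0.0,
# which is exactly the behavior the question wants to avoid
print(df.groupby(['Id', 'Country', 'Product'], as_index=False).sum())
```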

Comments (2)

夜访吸血鬼 2025-01-20 01:47:23

Use the parameter min_count=1 in sum:

df = df.groupby(['Id','Country','Product'], as_index=False).sum(min_count=1)
print (df)
   Id  Country Product  sales  qty  price
0   1  Germany   shoes   64.0  2.0    2.0
1   2  England   Shoes   44.0  2.0    NaN
2   3  Austria   Shoes    0.0  3.0    NaN
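min_count is the minimum number of non-NA values required for the sum to be non-NA; a quick Series-level check illustrates the difference:

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, np.nan])

# With the default min_count=0, an all-NaN sum collapses to 0.0
print(s.sum())             # 0.0
# Requiring at least one non-NA value keeps the result as NaN
print(s.sum(min_count=1))  # nan
```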

爱你是孤单的心事 2025-01-20 01:47:23

You could mask it using isna + groupby + all:

out = (df.groupby(['Id','Country','Product']).sum()
       .mask(df[['sales','qty','price']].isna()
             .groupby([df['Id'], df['Country'], df['Product']]).all())
       .reset_index())

The same idea, written differently:

cols = ['Id','Country','Product']
g = df.groupby(cols)
out = (g.sum()
       .mask(g.apply(lambda x: x.drop(columns=cols).isna().all()))
       .reset_index())

Output:

   Id  Country Product  sales  qty  price
0   1  Germany   shoes   64.0  2.0    2.0
1   2  England   Shoes   44.0  2.0    NaN
2   3  Austria   Shoes    0.0  3.0    NaN
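Assuming `df` holds the question's input, the same result can also be reached column-by-column with `agg`, passing min_count=1 through a lambda (a sketch, not this answer's original code):

```python
import numpy as np
import pandas as pd

# Sample input as in the question
df = pd.DataFrame({
    'Id':      [1, 1, 2, 2, 3, 3],
    'Country': ['Germany', 'Germany', 'England', 'England', 'Austria', 'Austria'],
    'Product': ['shoes', 'shoes', 'Shoes', 'Shoes', 'Shoes', 'Shoes'],
    'sales':   [32, 32, 22, 22, 0, np.nan],
    'qty':     [1, 1, 1, 1, 3, np.nan],
    'price':   [np.nan, 2, np.nan, np.nan, np.nan, np.nan],
})

# Each numeric column is summed per group, but an all-NaN group stays NaN
out = (df.groupby(['Id', 'Country', 'Product'], as_index=False)
         .agg(lambda s: s.sum(min_count=1)))
print(out)
```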
