How to retain nulls in a pandas groupby?

Posted 2025-01-13 01:47:23

I need to group my DataFrame in pandas, but when I do, the null values are converted to zeros. I want to retain the nulls instead, and I am not sure how to do that in pandas.

Input:

Id  Country  Product  sales  qty  price
1   Germany  shoes    32      1   NaN
1   Germany  shoes    32      1    2
2   England  Shoes    22      1   NaN
2   England  Shoes    22      1   NaN
3   Austria  Shoes    0       3   NaN
3   Austria  Shoes    NaN     NaN NaN

Desired output:

Id  Country  Product  sales  qty  price
1   Germany  shoes    64      2   2
2   England  Shoes    44      2   NaN
3   Austria  Shoes    0       3   NaN
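For reproducibility, the input above can be rebuilt as a DataFrame (column names taken from the question), and a plain groupby-sum shows the problem being asked about:

```python
import numpy as np
import pandas as pd

# Rebuild the input table from the question; NaN marks the missing cells
df = pd.DataFrame({
    'Id':      [1, 1, 2, 2, 3, 3],
    'Country': ['Germany', 'Germany', 'England', 'England', 'Austria', 'Austria'],
    'Product': ['shoes', 'shoes', 'Shoes', 'Shoes', 'Shoes', 'Shoes'],
    'sales':   [32, 32, 22, 22, 0, np.nan],
    'qty':     [1, 1, 1, 1, 3, np.nan],
    'price':   [np.nan, 2, np.nan, np.nan, np.nan, np.nan],
})

# A plain groupby-sum silently turns all-NaN groups into 0.0,
# which is exactly the behavior the question wants to avoid
print(df.groupby(['Id', 'Country', 'Product'], as_index=False).sum())
```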

Comments (2)

夜访吸血鬼 2025-01-20 01:47:23

Use the parameter min_count=1 in sum:

df = df.groupby(['Id','Country','Product'], as_index=False).sum(min_count=1)
print (df)
   Id  Country Product  sales  qty  price
0   1  Germany   shoes   64.0  2.0    2.0
1   2  England   Shoes   44.0  2.0    NaN
2   3  Austria   Shoes    0.0  3.0    NaN
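min_count is the minimum number of non-NA values required for the sum to be non-NA; a quick Series-level check illustrates the difference:

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, np.nan])

# With the default min_count=0, an all-NaN sum collapses to 0.0
print(s.sum())             # 0.0
# Requiring at least one non-NA value keeps the result as NaN
print(s.sum(min_count=1))  # nan
```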

爱你是孤单的心事 2025-01-20 01:47:23

You could mask it using isna + groupby + all:

out = (df.groupby(['Id','Country','Product']).sum()
       .mask(df[['sales','qty','price']].isna()
             .groupby([df['Id'], df['Country'], df['Product']]).all())
       .reset_index())

The same idea, written differently:

cols = ['Id','Country','Product']
g = df.groupby(cols)
out = (g.sum()
       .mask(g.apply(lambda x: x.drop(columns=cols).isna().all()))
       .reset_index())

Output:

   Id  Country Product  sales  qty  price
0   1  Germany   shoes   64.0  2.0    2.0
1   2  England   Shoes   44.0  2.0    NaN
2   3  Austria   Shoes    0.0  3.0    NaN
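Assuming `df` holds the question's input, the same result can also be reached column-by-column with `agg`, passing min_count=1 through a lambda (a sketch, not this answer's original code):

```python
import numpy as np
import pandas as pd

# Sample input as in the question
df = pd.DataFrame({
    'Id':      [1, 1, 2, 2, 3, 3],
    'Country': ['Germany', 'Germany', 'England', 'England', 'Austria', 'Austria'],
    'Product': ['shoes', 'shoes', 'Shoes', 'Shoes', 'Shoes', 'Shoes'],
    'sales':   [32, 32, 22, 22, 0, np.nan],
    'qty':     [1, 1, 1, 1, 3, np.nan],
    'price':   [np.nan, 2, np.nan, np.nan, np.nan, np.nan],
})

# Each numeric column is summed per group, but an all-NaN group stays NaN
out = (df.groupby(['Id', 'Country', 'Product'], as_index=False)
         .agg(lambda s: s.sum(min_count=1)))
print(out)
```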
