Confusion about Pandas groupby() and agg() on columns

Posted on 2025-01-10 10:10:21

What is the difference between

df[['column1', 'column2']].groupby('column1').agg(['mean', 'count'])

and

df[['column1', 'column2']].groupby('column1').agg({'column2': 'mean', 'column2': 'count'})

In the first example, mean and count are computed on column2, which is not the groupby key.

In the second example the logic is the same, but I explicitly name column2 in agg.

Why do the two not give the same result?

Answer by 烂人, 2025-01-17 10:10:21

TL;DR

The problem with the second statement has to do with the 'size' entry being overwritten.

There are at least three ways to write this aggregation correctly.

First, let's build a test dataset:

import pandas as pd
from seaborn import load_dataset

df_tips = load_dataset('tips')

df_tips.head()

Statement 1: Same as your first way

df_tips[['sex','size']].groupby(['sex']).agg(['mean','count'])

Output:

            size      
            mean count
sex                   
Male    2.630573   157
Female  2.459770    87

The result is a DataFrame with MultiIndex columns: level 0 is the column name size, and level 1 holds the two aggregations, mean and count.
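
To make the two column levels concrete, here is a minimal sketch of how such a result can be inspected and, if wanted, flattened. The small frame below is made up for illustration and only stands in for the tips data; the variable names are hypothetical.

```python
import pandas as pd

# Made-up stand-in for df_tips[['sex', 'size']] (values are illustrative).
df = pd.DataFrame({
    'sex': ['Male', 'Male', 'Female', 'Female', 'Female'],
    'size': [2, 4, 3, 2, 1],
})

result = df.groupby('sex').agg(['mean', 'count'])

# Each column label is a (level 0, level 1) tuple.
print(result.columns.tolist())  # [('size', 'mean'), ('size', 'count')]

# Select a single aggregation with a tuple key.
mean_size = result[('size', 'mean')]

# Optionally join the two levels into plain string labels.
flat = result.copy()
flat.columns = ['_'.join(col) for col in flat.columns]
print(flat.columns.tolist())  # ['size_mean', 'size_count']
```

Flattening is optional; it is mostly a convenience when the result is exported or merged with other frames.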

Statement 2: Using a list of aggregations for 'size' in a dictionary

df_tips[['sex','size']].groupby(['sex']).agg({'size':['mean','count']})

Output (same as above):

            size      
            mean count
sex                   
Male    2.630573   157
Female  2.459770    87

Statement 3: Using named aggregations

df_tips[['sex','size']].groupby(['sex']).agg(mean_size=('size','mean'),count_size=('size','count'))

Output:

        mean_size  count_size
sex                          
Male     2.630573         157
Female   2.459770          87

This gives a DataFrame with a flat column header that you name yourself. Because the names are passed as keyword arguments, they must be valid Python identifiers, so they cannot contain spaces or special characters.
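
If column names with spaces are needed, one option is to build a dictionary and unpack it with ** instead of writing keyword arguments directly. The sketch below assumes a small made-up frame in place of the tips data, and the output names 'mean size' and 'count size' are only examples.

```python
import pandas as pd

# Made-up stand-in for df_tips[['sex', 'size']] (values are illustrative).
df = pd.DataFrame({
    'sex': ['Male', 'Male', 'Female', 'Female', 'Female'],
    'size': [2, 4, 3, 2, 1],
})

# Keyword form: output names must be valid Python identifiers.
named = df.groupby('sex').agg(
    mean_size=('size', 'mean'),
    count_size=('size', 'count'),
)

# Dict-unpacking form: output names are plain strings and may contain spaces.
spaced = df.groupby('sex').agg(**{
    'mean size': ('size', 'mean'),
    'count size': ('size', 'count'),
})
print(spaced.columns.tolist())  # ['mean size', 'count size']
```

Both calls compute the same values; only the column labels differ.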

The incorrect way is your second method:

df_tips[['sex','size']].groupby(['sex']).agg({'size':'mean','size':'count'})

Output:

        size
sex         
Male     157
Female    87

What is happening here is that a Python dict cannot hold the same key twice. In {'size':'mean','size':'count'} the second 'size' entry replaces the first, so agg() only ever receives {'size':'count'}. That is why the result has a single size column containing the count, and the mean is missing.
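
The overwriting can be checked without the tips data. The following sketch uses a small made-up frame and hypothetical variable names to show that the duplicate-key dictionary behaves exactly like a dictionary that only asks for the count.

```python
import pandas as pd

# A dict literal with a repeated key keeps only the last value,
# so the 'mean' entry is gone before pandas is called.
spec = {'size': 'mean', 'size': 'count'}
print(spec)  # {'size': 'count'}

# Made-up stand-in for df_tips[['sex', 'size']] (values are illustrative).
df = pd.DataFrame({
    'sex': ['Male', 'Male', 'Female', 'Female', 'Female'],
    'size': [2, 4, 3, 2, 1],
})

collapsed = df.groupby('sex').agg(spec)
count_only = df.groupby('sex').agg({'size': 'count'})

# Both calls receive the same dictionary, so the results match.
print(collapsed.equals(count_only))  # True
```

Python does not raise an error for the repeated key, which is why the mistake is easy to miss.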

About the author: む无字情书
