Incomprehensible Pandas groupby result

Posted on 2025-01-18 01:03:06

Coming from R, where I have mostly worked with the tidyverse, I wonder how pandas groupby and aggregation work. I have this code, and the results are heartbreaking to me.

import pandas as pd
df = pd.read_csv('https://gist.githubusercontent.com/ZeccaLehn/4e06d2575eb9589dbe8c365d61cb056c/raw/64f1660f38ef523b2a1a13be77b002b98665cdfe/mtcars.csv')
df.rename(columns={'Unnamed: 0':'brand'}, inplace=True)

Now I would like to calculate the average displacement (disp) for each number of cylinders (cyl), like this:

df['avg_disp'] = df.groupby('cyl').disp.mean()

This results in something like:

    cyl disp    avg_disp
31  4   121.0   NaN
2   4   108.0   NaN
27  4   95.1    NaN
26  4   120.3   NaN
25  4   79.0    NaN
20  4   120.1   NaN
7   4   146.7   NaN
8   4   140.8   353.100000
19  4   71.1    NaN
18  4   75.7    NaN
17  4   78.7    NaN
29  6   145.0   NaN
0   6   160.0   NaN
1   6   160.0   NaN
3   6   258.0   NaN
10  6   167.6   NaN
9   6   167.6   NaN
5   6   225.0   NaN
13  8   275.8   NaN
28  8   351.0   NaN
4   8   360.0   105.136364
24  8   400.0   NaN
23  8   350.0   NaN
22  8   304.0   NaN
21  8   318.0   NaN
6   8   360.0   183.314286
11  8   275.8   NaN
16  8   440.0   NaN
30  8   301.0   NaN
14  8   472.0   NaN
12  8   275.8   NaN
15  8   460.0   NaN

After searching for a while, I discovered the transform function, which produces the correct value for avg_disp by assigning each row the mean of its cyl group.
My question is: why can't this be done simply with the mean function, instead of calling .transform('mean') on the grouped data frame?

Comments (1)

无人问我粥可暖 2025-01-25 01:03:06

If you want to add the results back to the ungrouped DataFrame, you can use .transform:

... and return a DataFrame having the same indexes as the original object filled with the transformed values.

df['avg_disp'] = df.groupby('cyl').disp.transform('mean')
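
As for why plain mean() fills the column with NaN: the sketch below is one way to see it, not a definitive account of pandas internals. It assumes a small stand-in frame built from the rows labelled 0 to 8 in the question's table (so the group means differ from those of the full dataset), and the column names aligned and avg_disp_map are made up for illustration. mean() reduces each group to a single value and returns a Series indexed by the group key (4, 6, 8), roughly like summarise() in dplyr. Assigning that Series to a column aligns on index labels, so only the rows whose labels happen to be 4, 6 and 8 receive a value. transform('mean') instead returns a result with the original row index, closer to mutate() after group_by().

```python
import pandas as pd

# Stand-in data: the rows labelled 0..8 from the table in the question.
df = pd.DataFrame({
    "cyl":  [6, 6, 4, 6, 8, 6, 8, 4, 4],
    "disp": [160.0, 160.0, 108.0, 258.0, 360.0, 225.0, 360.0, 146.7, 140.8],
})

# mean() reduces: one value per group, indexed by the group key (4, 6, 8).
group_means = df.groupby("cyl")["disp"].mean()
print(group_means)

# Column assignment aligns on index labels, not on position or on cyl.
# The frame's row labels are 0..8, so only rows 4, 6 and 8 find a match;
# every other row becomes NaN.
df["aligned"] = group_means

# transform() broadcasts each group's mean back onto the original row index.
df["avg_disp"] = df.groupby("cyl")["disp"].transform("mean")

# An alternative with the same effect: look up each row's cyl value
# in the reduced Series.
df["avg_disp_map"] = df["cyl"].map(group_means)

print(df)
```

The aligned column reproduces the pattern in the question: the value stored in row 4 is the mean for cyl == 4, even though row 4 itself is an 8-cylinder car. The map variant is handy when the reduced Series is needed for something else anyway.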