After a group by in Python, how do I aggregate only the groups with at least 5 members?

Posted on 2025-01-17 04:05:07

I have the Value of Energy Cost Saving Program dataset, and I want to answer this question:
Considering only NTAs with at least 5 listed businesses, what is the average total savings and the total jobs created for each NTA?

So I used the following code for the first part:

df['NTA_mod']=df['NTA'].str.split('-')
df=df.explode('NTA_mod').reset_index(drop=True)

df_NTA_grp=df.groupby(['NTA_mod'])

Now I have to pick the NTAs with at least 5 businesses, and I used the following code:

df.groupby('NTA_mod').filter(lambda x: len(x) >= 5)

However, I don't get any result, and I don't know how to continue to answer the question. How should I pick the NTAs with at least 5 businesses?
Is my approach correct?
If so, how should I aggregate to get the mean and sum in the next step?

屋顶上的小猫咪 2025-01-24 04:05:07

You are heading in the right direction. Use the aggregate method (agg is an alias) to get the mean and the sum; it can apply a different operation to each column.

# Keep only the rows whose NTA_mod group has at least 5 members
df_ge_5 = df_NTA_grp.filter(lambda x: len(x) >= 5)

# Mean of the savings and sum of the jobs for each remaining NTA
df_ge_5.groupby('NTA_mod').agg({
    'Total Savings': 'mean',
    'Job created': 'sum',
})
# Or, renaming columns with named aggregation
aggn = {
    'Average Total Savings': ('Total Savings', 'mean'),
    'Total Jobs Created': ('Job created', 'sum'),
}
df_ge_5.groupby('NTA_mod').agg(**aggn)
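
For anyone who wants to try this without the real dataset, here is a minimal self-contained sketch of the same idea. The toy values are made up, and the column names are only assumed to match the ones used above. It also shows why the filter call in the question seemed to do nothing: filter returns a new DataFrame, so the result has to be assigned. The transform('size') mask at the end is an alternative technique, not part of the answer above.

```python
import pandas as pd

# Hypothetical toy data: column names follow the answer above,
# the values are invented purely for illustration.
df = pd.DataFrame({
    "NTA_mod": ["A"] * 5 + ["B"] * 2 + ["C"] * 6,
    "Total Savings": [10.0, 20.0, 30.0, 40.0, 50.0,
                      5.0, 15.0,
                      1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "Job created": [1, 2, 3, 4, 5,
                    10, 20,
                    1, 1, 1, 1, 1, 1],
})

# filter() returns a new DataFrame and leaves df untouched,
# so the result must be assigned to a name to be used later.
df_ge_5 = df.groupby("NTA_mod").filter(lambda g: len(g) >= 5)

# Named aggregation: output column name -> (input column, function).
summary = df_ge_5.groupby("NTA_mod").agg(
    **{
        "Average Total Savings": ("Total Savings", "mean"),
        "Total Jobs Created": ("Job created", "sum"),
    }
)

# Alternative (an assumption, not from the answer): build a boolean mask
# from the group sizes instead of calling filter() with a lambda.
sizes = df.groupby("NTA_mod")["NTA_mod"].transform("size")
df_ge_5_mask = df[sizes >= 5]

print(summary)
```

Group B has only two rows, so it is dropped before aggregation; A and C remain. On large frames the transform-based mask tends to be faster than a Python lambda, but both select the same rows.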
