Pandas Groupby Python

Posted 2025-02-03 04:03:39

I have a dataset with country names and some other information, such as salaries. I need to find the mean salary of the employees in the file, grouped by country_id and city, within the ranges (0, 5000], (5000, 10000] and (10000, 15000].
I was using the method below, but the resulting table is not what I want. Can you help me with that?

import pandas as pd

# `file` is the DataFrame loaded from the data file
df = file.groupby(['country_id', 'city'])['salary'].mean().reset_index(name="mean")
bins = [0, 5000]
df['binned'] = pd.cut(df['mean'], bins)
print(df)
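With bins = [0, 5000], pd.cut builds only the single interval (0, 5000], so every mean above 5000 is mapped to NaN; covering all three requested ranges takes the four edges 0, 5000, 10000 and 15000. A small demonstration on hypothetical mean values (the Series below is made up for illustration):

```python
import pandas as pd

# Hypothetical per-city mean salaries
means = pd.Series([3000.0, 7000.0, 12000.0])

# Two edges -> one interval, (0, 5000]; values outside it become NaN
one_bin = pd.cut(means, [0, 5000])

# Four edges -> the three ranges (0, 5000], (5000, 10000], (10000, 15000]
all_bins = pd.cut(means, [0, 5000, 10000, 15000])

print(pd.DataFrame({'mean': means, 'one_bin': one_bin, 'all_bins': all_bins}))
```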

殊姿 2025-02-10 04:03:39

If your salary bins always have a width of 5000, you can compute each row's bin number by dividing the salary by 5000 and rounding up with math.ceil:

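import math
import pandas as pd

# Sample data: individual salaries for two countries
df = pd.DataFrame({
    'salary':[
        1231, 5000, 7304, 10000, 12302, 15000,
        1001, 4900, 6012, 9123, 11231, 14923
    ],
    'country': [
        'USA','USA','USA','USA','USA','USA',
        'UK','UK','UK','UK','UK','UK'
    ]
})
# Bin number = ceil(salary / 5000): (0, 5000] -> 1, (5000, 10000] -> 2, (10000, 15000] -> 3
df['salary_bin_number'] = (df['salary'] / 5000).apply(math.ceil)
df.head()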
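Calling .apply(math.ceil) runs a Python function once per row. A vectorized alternative, sketched here under the same assumption of 5000-wide bins, applies numpy.ceil to the whole column at once; because ceiling rounds up, salaries that sit exactly on an edge (5000, 10000, 15000) land in the lower bin, which matches the right-closed ranges in the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'salary': [1231, 5000, 7304, 10000, 12302, 15000,
               1001, 4900, 6012, 9123, 11231, 14923],
})

# Vectorized: divide the whole column, round up, convert back to integers
df['salary_bin_number'] = np.ceil(df['salary'] / 5000).astype(int)
print(df['salary_bin_number'].tolist())  # [1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 3, 3]
```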

salary	country	salary_bin_number
1231	USA	1
5000	USA	1
7304	USA	2
10000	USA	2
12302	USA	3

With salary_bin_number in place, you can build a label for each bin with the following code:

# Build a right-closed label such as '(0-5000]' from each bin number
df['salary_range_str'] = df['salary_bin_number'].apply(
    lambda bin_number: f'({(bin_number - 1) * 5000}-{bin_number * 5000}]'
)
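The label expression above hard-codes a bin width of 5000. If the width may change, a small helper keeps the arithmetic in one place; bin_label and its width parameter are hypothetical names, not part of the original answer:

```python
import pandas as pd


def bin_label(bin_number: int, width: int = 5000) -> str:
    """Return a right-closed range label such as '(0-5000]' for a 1-based bin number."""
    lower = (bin_number - 1) * width
    upper = bin_number * width
    return f'({lower}-{upper}]'


bin_numbers = pd.Series([1, 2, 3])
labels = bin_numbers.apply(bin_label)
print(labels.tolist())  # ['(0-5000]', '(5000-10000]', '(10000-15000]']
```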

Then group by country and salary_range_str to calculate the average salary for each (country, salary range) pair:

# Long format: one row per (country, salary range) pair
df = df.groupby(['country', 'salary_range_str'])['salary'].mean().reset_index()

Finally, pivot salary_range_str so that each range becomes its own column:

# Wide format: one row per country, one column per salary range
df = pd.pivot_table(df, index='country', columns='salary_range_str', values='salary')
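Since pivot_table averages by default (aggfunc='mean'), the separate groupby step can be folded into the pivot. A self-contained sketch on the answer's sample data; the bin-number and label steps are repeated so the block runs on its own:

```python
import math
import pandas as pd

df = pd.DataFrame({
    'salary': [1231, 5000, 7304, 10000, 12302, 15000,
               1001, 4900, 6012, 9123, 11231, 14923],
    'country': ['USA'] * 6 + ['UK'] * 6,
})
df['salary_bin_number'] = (df['salary'] / 5000).apply(math.ceil)
df['salary_range_str'] = df['salary_bin_number'].apply(
    lambda n: f'({(n - 1) * 5000}-{n * 5000}]'
)

# aggfunc='mean' is the default; spelled out here for clarity.
# pivot_table groups the raw rows and averages them in one call.
result = pd.pivot_table(df, index='country', columns='salary_range_str',
                        values='salary', aggfunc='mean')
print(result)
```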

Output

country	(0-5000]	(10000-15000]	(5000-10000]
UK	2950.5	13077	7567.5
USA	3115.5	13651	8652
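The answer groups by country only, while the question asks for country_id and city, and its range columns come out in string order, so (10000-15000] appears before (5000-10000]. A sketch that handles both, assuming the question's column names (the sample rows and city names are hypothetical): it bins each individual salary with pd.cut, whose categorical ranges keep their numeric order, then averages per country_id, city and range.

```python
import pandas as pd

# Hypothetical rows using the question's column names
file = pd.DataFrame({
    'country_id': ['US', 'US', 'US', 'US', 'UK', 'UK'],
    'city': ['NYC', 'NYC', 'NYC', 'Boston', 'London', 'London'],
    'salary': [1200, 4800, 7300, 12000, 3000, 9000],
})

# Bin each individual salary into (0, 5000], (5000, 10000], (10000, 15000]
file['salary_range'] = pd.cut(file['salary'], [0, 5000, 10000, 15000])

# Mean salary per (country_id, city) row and salary-range column;
# the categorical ranges keep their numeric order as columns
result = (
    file.groupby(['country_id', 'city', 'salary_range'], observed=True)['salary']
    .mean()
    .unstack('salary_range')
)
print(result)
```

City/range combinations with no salaries, such as NYC above 10000 in this sample, show up as NaN.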

About the author

甜味拾荒者
