如何计算由两个变量分组的过去n年的平均值
首先,我想说我查看了[这个答案][1],但我无法继续处理那里的信息。
所以我有一个这样的数据集,
df = pd.DataFrame({'ID': [10010,10010,10010,10010,10010,10010,10010,10010,10010,10010,10010,10010,10010,10010,10010,10010,10013,10013,10013,10013,10013,10013,10013,10013,10013,10013,10013,10013,10013,10013,10013,10013],
'Type': ['Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue'],
'Year': [2018,2018,2019,2019,2020,2020,2021,2021,2022,2022,2018,2018,2019,2019,2021,2021,2018,2018,2019,2019,2020,2020,2021,2021,2022,2022,2018,2018,2019,2019,2021,2021],
'Score': [0,0,0,0,0,0,0,0,0,0,14,24,16,5,87,33,0,0,0,0,0,0,0,0,0,0,11,13,3,16,37,49]})
我不知道如何输入 NaN,所以我这样做:
df.replace(0, np.nan, inplace=True)
df = df.dropna(axis=0, subset=['Score'])
我想按 ID 和主题获取滚动 N(在本例中为 3)年平均分。
我可以通过 ID 获取 3 年滚动平均分数:
df['average_past_3_years'] = df.groupby(['ID'], as_index = False).rolling(3).agg( {'分数':'平均值', '年份': '最大值'}).reset_index(level=0).groupby( '年份').transform('shift')['分数'] df = df.sort_values(['ID', 'Year'])
这给了我滚动平均值,但只是按 ID 的年份,也不是主题:
ID Type Year Score average_past_3_years
10010 Red 2018 14.0 NaN
10010 Blue 2018 24.0 NaN
10010 Red 2019 16.0 NaN
10010 Blue 2019 5.0 18.000000
10010 Red 2021 87.0 NaN
10010 Blue 2021 33.0 36.000000
10013 Red 2018 11.0 NaN
10013 Blue 2018 13.0 NaN
10013 Red 2019 3.0 15.000000
10013 Blue 2019 16.0 9.000000
10013 Red 2021 37.0 41.666667
10013 Blue 2021 49.0 18.666667
我试图输出红色的 10013,2021 年将是 17,不是 41,因为它只会计算红色分数。
我尝试过:
df['average_past_3_years'] = df.groupby([['ID', 'Type']], as_index = False).rolling(3).agg(
{'Score':'mean', 'Year': 'max'}).reset_index(level=0).groupby(
'Year').transform('shift')['Score']
但遇到了这个错误:
Grouper and axis must be same length
并且有点卡在那里
我也不确定是否需要事先排序
[1]: https://stackoverflow.com/questions/61837481/pandas-how-to-calculate-mean-values-of-the-past-n-years-for-every-month
First off I would like to say I checked out [this answer][1] and I couldn't progress with the information there.
So I have a dataset as such
df = pd.DataFrame({'ID': [10010,10010,10010,10010,10010,10010,10010,10010,10010,10010,10010,10010,10010,10010,10010,10010,10013,10013,10013,10013,10013,10013,10013,10013,10013,10013,10013,10013,10013,10013,10013,10013],
'Type': ['Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue','Red','Blue'],
'Year': [2018,2018,2019,2019,2020,2020,2021,2021,2022,2022,2018,2018,2019,2019,2021,2021,2018,2018,2019,2019,2020,2020,2021,2021,2022,2022,2018,2018,2019,2019,2021,2021],
'Score': [0,0,0,0,0,0,0,0,0,0,14,24,16,5,87,33,0,0,0,0,0,0,0,0,0,0,11,13,3,16,37,49]})
I don't know how to type in NaN so I make it like this:
df.replace(0, np.nan, inplace=True)
df = df.dropna(axis=0, subset=['Score'])
I want to take the rolling N (In this case 3) Year average of score by ID and Subject.
I can get the 3 year rolling average score by ID:
df['average_past_3_years'] = df.groupby(['ID'], as_index = False).rolling(3).agg(
{'Score':'mean', 'Year': 'max'}).reset_index(level=0).groupby(
'Year').transform('shift')['Score']
df = df.sort_values(['ID', 'Year'])
That gives me the rolling average but only by year on ID, not subject as well:
ID Type Year Score average_past_3_years
10010 Red 2018 14.0 NaN
10010 Blue 2018 24.0 NaN
10010 Red 2019 16.0 NaN
10010 Blue 2019 5.0 18.000000
10010 Red 2021 87.0 NaN
10010 Blue 2021 33.0 36.000000
10013 Red 2018 11.0 NaN
10013 Blue 2018 13.0 NaN
10013 Red 2019 3.0 15.000000
10013 Blue 2019 16.0 9.000000
10013 Red 2021 37.0 41.666667
10013 Blue 2021 49.0 18.666667
I am trying to out put say 10013 for red the 2021 would be 17, not 41 as it would only be calculating the red scores.
I tried:
df['average_past_3_years'] = df.groupby([['ID', 'Type']], as_index = False).rolling(3).agg(
{'Score':'mean', 'Year': 'max'}).reset_index(level=0).groupby(
'Year').transform('shift')['Score']
but got this error:
Grouper and axis must be same length
And kinda got stuck there
I'm also unsure if I need to sort before hand
[1]: https://stackoverflow.com/questions/61837481/pandas-how-to-calculate-mean-values-of-the-past-n-years-for-every-month
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
IIUC 用途:
IIUC use: