基于完整过去和窗口合并和计算移动平均值
我有 2 个数据帧,如下所示,
df = pd.DataFrame(
{'stud_name' : ['ABC', 'ABC','ABC','ABC',
'DEF'],
'ques_date' : ['13/11/2020', '10/1/2018','11/11/2017', '27/03/2016',
'13/05/2010']})
df_score = pd.DataFrame(
{'stud_name' : ['ABC', 'ABC','ABC','ABC','ABC','ABC','ABC','DEF','DEF','DEF','DEF'],
'qtr':['Q1','Q2','Q3','Q4','Q1','Q2','Q3','Q3','Q4','Q2','Q4'],
'year' : [2015,2015,2015,2015,2016,2017,2017,2017,2017,2018,2017],
't_score':[11,13,15,17,12,312,14,15,18,43,32],
'p_score':[32,45,32,21,56,87,32,786,213,32,11]})
我想执行以下操作
a) 对于每个 stud_name
,计算两个移动平均值(t_score)列作为输出
mov_avg_full
= 使用全部stud_name
的过去数据。 (过去所有季度的信息均来自 df_score) mov_avg_2qtr
= 使用过去 2 个季度的数据(仅来自 df_score 的过去 2 个季度的信息)
例如:如果年份是 2020 年并且是第三个季度,我想计算所有过去数据的移动平均值(之前2020 年第 3 季度)和最近 2 个季度的移动平均值(2020 年第 1 季度和 2020 年第 2 季度)
如果没有特定的过去数据stud_name
,我们只是输入NA
(例如:DEF在df_score中没有过去的数据)
我尝试了下面的内容
df['ques_date'] = pd.to_datetime(df['ques_date'], dayfirst=True)
df.sort_values(by=['stud_name','ques_date'],inplace=True)
df['act_qtr'] = df['ques_date'].dt.to_period('Q').dt.strftime('Q%q')
df['year'] = df['ques_date'].dt.year
df_score.sort_values(by=['year','qtr'],inplace=True)
df_full = df.merge(df_score,on=['stud_name'])
df_full['mov_avg_2qtr'] = df_full['t_score'].rolling(2).mean() # this is incorrect
,我希望我的输出如下所示
I have 2 dataframes like as shown below
df = pd.DataFrame(
{'stud_name' : ['ABC', 'ABC','ABC','ABC',
'DEF'],
'ques_date' : ['13/11/2020', '10/1/2018','11/11/2017', '27/03/2016',
'13/05/2010']})
df_score = pd.DataFrame(
{'stud_name' : ['ABC', 'ABC','ABC','ABC','ABC','ABC','ABC','DEF','DEF','DEF','DEF'],
'qtr':['Q1','Q2','Q3','Q4','Q1','Q2','Q3','Q3','Q4','Q2','Q4'],
'year' : [2015,2015,2015,2015,2016,2017,2017,2017,2017,2018,2017],
't_score':[11,13,15,17,12,312,14,15,18,43,32],
'p_score':[32,45,32,21,56,87,32,786,213,32,11]})
I would like to do the below
a) For each stud_name
, compute two moving average (of t_score) columns as output
mov_avg_full
= use all past data of a stud_name
. (all past quarters info from df_score)mov_avg_2qtr
= use data from past 2 quarters (only past 2 quarters info from df_score)
ex: if the year is 2020 and it is 3rd qtr, I would like to compute moving average of all past data (before 2020 Q3) and moving average of last 2 quarters (2020 Q1 and 2020 Q2)
If there is no past data for a specific stud_name
, we just put NA
(ex: DEF has no past data in df_score)
I tried the below
df['ques_date'] = pd.to_datetime(df['ques_date'], dayfirst=True)
df.sort_values(by=['stud_name','ques_date'],inplace=True)
df['act_qtr'] = df['ques_date'].dt.to_period('Q').dt.strftime('Q%q')
df['year'] = df['ques_date'].dt.year
df_score.sort_values(by=['year','qtr'],inplace=True)
df_full = df.merge(df_score,on=['stud_name'])
df_full['mov_avg_2qtr'] = df_full['t_score'].rolling(2).mean() # this is incorrect
I expect my output to be like as shown below
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
您可能想要使用
滚动
和扩展
方法。获得季度指数的笛卡尔积后,您可以应用日期掩码来获取目标行。代码:
输出:
You might want to use
rolling
andexpanding
methods. After obtaining the Cartesian product of quarterly indice, you can apply a date mask to get the target rows.Code:
Output: