基于完整过去和窗口合并和计算移动平均值

发布于 2025-01-10 09:23:36 字数 1667 浏览 1 评论 0原文

我有 2 个数据帧,如下所示,

df = pd.DataFrame(
    {'stud_name' : ['ABC', 'ABC','ABC','ABC', 
                  'DEF'],
     'ques_date' : ['13/11/2020', '10/1/2018','11/11/2017', '27/03/2016', 
                '13/05/2010']})

df_score = pd.DataFrame(
    {'stud_name' : ['ABC', 'ABC','ABC','ABC','ABC','ABC','ABC','DEF','DEF','DEF','DEF'],
     'qtr':['Q1','Q2','Q3','Q4','Q1','Q2','Q3','Q3','Q4','Q2','Q4'],
     'year' : [2015,2015,2015,2015,2016,2017,2017,2017,2017,2018,2017],
     't_score':[11,13,15,17,12,312,14,15,18,43,32],
     'p_score':[32,45,32,21,56,87,32,786,213,32,11]})

我想执行以下操作

a) 对于每个 stud_name,计算两个移动平均值(t_score)列作为输出

mov_avg_full = 使用全部stud_name 的过去数据。 (过去所有季度的信息均来自 df_score) mov_avg_2qtr = 使用过去 2 个季度的数据(仅来自 df_score 的过去 2 个季度的信息)

例如:如果年份是 2020 年并且是第三个季度,我想计算所有过去数据的移动平均值(之前2020 年第 3 季度)和最近 2 个季度的移动平均值(2020 年第 1 季度和 2020 年第 2 季度)

如果没有特定的过去数据stud_name,我们只是输入NA(例如:DEF在df_score中没有过去的数据)

我尝试了下面的内容

df['ques_date'] = pd.to_datetime(df['ques_date'], dayfirst=True)
df.sort_values(by=['stud_name','ques_date'],inplace=True)
df['act_qtr'] = df['ques_date'].dt.to_period('Q').dt.strftime('Q%q')
df['year'] = df['ques_date'].dt.year
df_score.sort_values(by=['year','qtr'],inplace=True)
df_full = df.merge(df_score,on=['stud_name'])
df_full['mov_avg_2qtr'] = df_full['t_score'].rolling(2).mean() # this is incorrect

,我希望我的输出如下所示

在此处输入图像描述< /a>

I have 2 dataframes like as shown below

df = pd.DataFrame(
    {'stud_name' : ['ABC', 'ABC','ABC','ABC', 
                  'DEF'],
     'ques_date' : ['13/11/2020', '10/1/2018','11/11/2017', '27/03/2016', 
                '13/05/2010']})

df_score = pd.DataFrame(
    {'stud_name' : ['ABC', 'ABC','ABC','ABC','ABC','ABC','ABC','DEF','DEF','DEF','DEF'],
     'qtr':['Q1','Q2','Q3','Q4','Q1','Q2','Q3','Q3','Q4','Q2','Q4'],
     'year' : [2015,2015,2015,2015,2016,2017,2017,2017,2017,2018,2017],
     't_score':[11,13,15,17,12,312,14,15,18,43,32],
     'p_score':[32,45,32,21,56,87,32,786,213,32,11]})

I would like to do the below

a) For each stud_name, compute two moving average (of t_score) columns as output

mov_avg_full = use all past data of a stud_name. (all past quarters info from df_score)
mov_avg_2qtr = use data from past 2 quarters (only past 2 quarters info from df_score)

ex: if the year is 2020 and it is 3rd qtr, I would like to compute moving average of all past data (before 2020 Q3) and moving average of last 2 quarters (2020 Q1 and 2020 Q2)

If there is no past data for a specific stud_name, we just put NA (ex: DEF has no past data in df_score)

I tried the below

df['ques_date'] = pd.to_datetime(df['ques_date'], dayfirst=True)
df.sort_values(by=['stud_name','ques_date'],inplace=True)
df['act_qtr'] = df['ques_date'].dt.to_period('Q').dt.strftime('Q%q')
df['year'] = df['ques_date'].dt.year
df_score.sort_values(by=['year','qtr'],inplace=True)
df_full = df.merge(df_score,on=['stud_name'])
df_full['mov_avg_2qtr'] = df_full['t_score'].rolling(2).mean() # this is incorrect

I expect my output to be like as shown below

enter image description here

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

给不了的爱 2025-01-17 09:23:36

您可能想要使用滚动扩展方法。获得季度指数的笛卡尔积后,您可以应用日期掩码来获取目标行。

代码:

import pandas as pd

# Create sample dataframes
df = pd.DataFrame({'stud_name': ['ABC', 'ABC','ABC','ABC', 'DEF'], 'ques_date' : ['13/11/2020', '10/1/2018','11/11/2017', '27/03/2016', '13/05/2010']})
df_score = pd.DataFrame({'stud_name': ['ABC', 'ABC','ABC','ABC','ABC','ABC','ABC','DEF','DEF','DEF','DEF'], 'qtr':['Q1','Q2','Q3','Q4','Q1','Q2','Q3','Q3','Q4','Q2','Q4'], 'year' : [2015,2015,2015,2015,2016,2017,2017,2017,2017,2018,2017], 't_score':[11,13,15,17,12,312,14,15,18,43,32], 'p_score':[32,45,32,21,56,87,32,786,213,32,11]})

# Assign necessary datetime objects
df['ques_date'] = pd.to_datetime(df.ques_date, format='%d/%m/%Y')
df[['act_qtr', 'act_year', 'act_key']] = df['ques_date'].map(lambda e: [f'Q{e.quarter}', e.year, e.to_period('Q')]).apply(pd.Series)
df_score['key'] = df_score.year.astype(str) + df_score.qtr

# Calculate the two kinds of the moving average
df_score.sort_values(['year', 'qtr'], inplace=True)
df_score['mov_avg_full'] = df_score.groupby('stud_name')['t_score'].expanding().mean().values
df_score['mov_avg_2qtr'] = df_score.groupby('stud_name')['t_score'].rolling(2).mean().values

# Get a cross-joined dataframe
df_full = df.merge(df_score, on='stud_name').sort_values(['act_key', 'key'])

# Apply a datetime mask
df_full = df_full[df_full.key < df_full.act_key].groupby(['stud_name', 'act_qtr', 'act_year'], as_index=False).last()

# Deal with the missing null values and use necessary columns
df_full = df.merge(df_full, how='left', on=['stud_name', 'ques_date', 'act_qtr', 'act_year'])
df_full = df_full[['stud_name', 'ques_date', 'act_qtr', 'act_year', 'mov_avg_full', 'mov_avg_2qtr']]

print(df_full)

输出:

stud_nameques_dateact_qtract_yearmov_avg_fullmov_avg_2qtr
ABC2020-11-13 00:00:00Q4202056.2857163
ABC2018-01-10 00:00:00Q1201856.2857163
ABC2017-11-11 00:00:00Q4201756.2857163
ABC00:00:00Q120161416
DEF2010-05-13 00:00:00Q22010楠楠2016-03-27

You might want to use rolling and expanding methods. After obtaining the Cartesian product of quarterly indice, you can apply a date mask to get the target rows.

Code:

import pandas as pd

# Create sample dataframes
df = pd.DataFrame({'stud_name': ['ABC', 'ABC','ABC','ABC', 'DEF'], 'ques_date' : ['13/11/2020', '10/1/2018','11/11/2017', '27/03/2016', '13/05/2010']})
df_score = pd.DataFrame({'stud_name': ['ABC', 'ABC','ABC','ABC','ABC','ABC','ABC','DEF','DEF','DEF','DEF'], 'qtr':['Q1','Q2','Q3','Q4','Q1','Q2','Q3','Q3','Q4','Q2','Q4'], 'year' : [2015,2015,2015,2015,2016,2017,2017,2017,2017,2018,2017], 't_score':[11,13,15,17,12,312,14,15,18,43,32], 'p_score':[32,45,32,21,56,87,32,786,213,32,11]})

# Assign necessary datetime objects
df['ques_date'] = pd.to_datetime(df.ques_date, format='%d/%m/%Y')
df[['act_qtr', 'act_year', 'act_key']] = df['ques_date'].map(lambda e: [f'Q{e.quarter}', e.year, e.to_period('Q')]).apply(pd.Series)
df_score['key'] = df_score.year.astype(str) + df_score.qtr

# Calculate the two kinds of the moving average
df_score.sort_values(['year', 'qtr'], inplace=True)
df_score['mov_avg_full'] = df_score.groupby('stud_name')['t_score'].expanding().mean().values
df_score['mov_avg_2qtr'] = df_score.groupby('stud_name')['t_score'].rolling(2).mean().values

# Get a cross-joined dataframe
df_full = df.merge(df_score, on='stud_name').sort_values(['act_key', 'key'])

# Apply a datetime mask
df_full = df_full[df_full.key < df_full.act_key].groupby(['stud_name', 'act_qtr', 'act_year'], as_index=False).last()

# Deal with the missing null values and use necessary columns
df_full = df.merge(df_full, how='left', on=['stud_name', 'ques_date', 'act_qtr', 'act_year'])
df_full = df_full[['stud_name', 'ques_date', 'act_qtr', 'act_year', 'mov_avg_full', 'mov_avg_2qtr']]

print(df_full)

Output:

stud_nameques_dateact_qtract_yearmov_avg_fullmov_avg_2qtr
ABC2020-11-13 00:00:00Q4202056.2857163
ABC2018-01-10 00:00:00Q1201856.2857163
ABC2017-11-11 00:00:00Q4201756.2857163
ABC2016-03-27 00:00:00Q120161416
DEF2010-05-13 00:00:00Q22010nannan
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文