pandas fillna, but not all based on the latest record
I have a dataframe like the one shown below:
stud_name act_qtr year yr_qty qtr mov_avg_full mov_avg_2qtr_min_period
0 ABC Q2 2014 2014Q2 NaN NaN NaN
1 ABC Q1 2016 2016Q1 Q1 13.0 14.5
2 ABC Q4 2016 2016Q4 NaN NaN NaN
3 ABC Q4 2017 2017Q4 NaN NaN NaN
4 ABC Q4 2020 2020Q4 NaN NaN NaN
OP = pd.read_clipboard()
stud_name qtr year t_score p_score yr_qty mov_avg_full mov_avg_2qtr_min_period
0 ABC Q1 2014 10 11 2014Q1 10.000000 10.0
1 ABC Q1 2015 11 32 2015Q1 10.500000 10.5
2 ABC Q2 2015 13 45 2015Q2 11.333333 12.0
3 ABC Q3 2015 15 32 2015Q3 12.250000 14.0
4 ABC Q4 2015 17 21 2015Q4 13.200000 16.0
5 ABC Q1 2016 12 56 2016Q1 13.000000 14.5
6 ABC Q2 2017 312 87 2017Q2 55.714286 162.0
7 ABC Q3 2018 24 90 2018Q3 51.750000 168.0
df = pd.read_clipboard()
I would like to fillna() based on the following logic:
For example, take stud_name = ABC. He has multiple NA records. Take his NA for 2020Q4. To fill it, we pick the latest record from df for stud_name=ABC before 2020Q4 (which is 2018Q3). Similarly, his other NA record is for 2014Q2. We pick the latest (prior) record from df for stud_name=ABC before 2014Q2 (which is 2014Q1). We need to sort based on the yr_qty values to get the latest (prior) record correctly.
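Because yr_qty is a fixed-width YYYYQn string, plain string sorting happens to give chronological order, but a numeric key makes "the latest record before X" comparisons explicit. A minimal sketch, assuming every yr_qty value has exactly the form YYYYQn; the helper name qtr_key is hypothetical:

```python
import pandas as pd


def qtr_key(yr_qty: pd.Series) -> pd.Series:
    """Hypothetical helper: map 'YYYYQn' strings to consecutive integers.

    Assumes every value looks like '2014Q2'; the key is year * 4 + quarter,
    so 2015Q4 and 2016Q1 are exactly one apart.
    """
    year = yr_qty.str[:4].astype(int)
    quarter = yr_qty.str[-1].astype(int)
    return year * 4 + quarter


s = pd.Series(["2020Q4", "2014Q2", "2016Q1", "2015Q4"])
keys = qtr_key(s)
# Order the original strings by their numeric quarter key.
print(s.loc[keys.sort_values().index].tolist())
# ['2014Q2', '2015Q4', '2016Q1', '2020Q4']
```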
We need to do this for each stud_name, and for a big dataset. So, we fill the NA values in mov_avg_full and mov_avg_2qtr_min_period.
If there are no previous records to look at in the df dataframe, leave the NA as it is.
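One way to express this "latest strictly earlier record per student" lookup is pd.merge_asof, which for each left row picks the last right row with a smaller key inside the same `by` group. This is a swapped-in technique, not the question's or the answer's code. A minimal sketch, assuming yr_qty is always YYYYQn; the names qtr_key, fill_from_prior and the _key/_row helper columns are hypothetical, and only the columns needed for the fill are rebuilt from the question's data:

```python
import numpy as np
import pandas as pd

COLS = ["mov_avg_full", "mov_avg_2qtr_min_period"]


def qtr_key(yr_qty):
    # Hypothetical helper: 'YYYYQn' -> year * 4 + quarter, so quarters sort numerically.
    return yr_qty.str[:4].astype(int) * 4 + yr_qty.str[-1].astype(int)


def fill_from_prior(op, df):
    """Fill NaNs in COLS of `op` from the latest strictly earlier df row of the same stud_name."""
    left = op.assign(_key=qtr_key(op["yr_qty"]), _row=np.arange(len(op)))
    right = df[["stud_name"] + COLS].assign(_key=qtr_key(df["yr_qty"]))
    # merge_asof requires both frames to be sorted by the `on` key.
    left = left.sort_values("_key")
    right = right.sort_values("_key")
    matched = pd.merge_asof(
        left,
        right,
        on="_key",
        by="stud_name",             # only match records of the same student
        direction="backward",       # take the latest earlier key
        allow_exact_matches=False,  # "before" means strictly earlier
        suffixes=("", "_prior"),    # df's values arrive as <col>_prior
    )
    for c in COLS:
        # Only NaNs are replaced; existing OP values are kept.
        matched[c] = matched[c].fillna(matched[c + "_prior"])
    helpers = ["_key", "_row"] + [c + "_prior" for c in COLS]
    # Restore OP's original row order and drop the helper columns.
    return matched.sort_values("_row").drop(columns=helpers).reset_index(drop=True)


op = pd.DataFrame({
    "stud_name": ["ABC"] * 5,
    "yr_qty": ["2014Q2", "2016Q1", "2016Q4", "2017Q4", "2020Q4"],
    "mov_avg_full": [np.nan, 13.0, np.nan, np.nan, np.nan],
    "mov_avg_2qtr_min_period": [np.nan, 14.5, np.nan, np.nan, np.nan],
})
df = pd.DataFrame({
    "stud_name": ["ABC"] * 8,
    "yr_qty": ["2014Q1", "2015Q1", "2015Q2", "2015Q3",
               "2015Q4", "2016Q1", "2017Q2", "2018Q3"],
    "mov_avg_full": [10.0, 10.5, 11.333333, 12.25, 13.2, 13.0, 55.714286, 51.75],
    "mov_avg_2qtr_min_period": [10.0, 10.5, 12.0, 14.0, 16.0, 14.5, 162.0, 168.0],
})

print(fill_from_prior(op, df))
```

For ABC this fills 2014Q2 from 2014Q1, 2016Q4 from 2016Q1, 2017Q4 from 2017Q2 and 2020Q4 from 2018Q3, leaves the already-populated 2016Q1 row unchanged, and leaves NaN for any student with no earlier df record.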
I was trying something like the code below, but it doesn't work and is incorrect:
filled = OP.merge(df, on=['stud_name'], how='left')
filled.sort_values(['year', 'Qty'], inplace=True)
filled['mov_avg_full'].fillna(filled.groupby('stud_name')['mov_avg_full'].shift())
filled['mov_avg_2qtr_min_period'].fillna(filled.groupby('stud_name')['mov_avg_2qtr_min_period'].shift())
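The attempt above merges OP and df horizontally on stud_name alone, which pairs every OP row with every df row of the same student and renames the overlapping columns. A small sketch illustrating this, using only a few rows and columns taken from the question's data:

```python
import numpy as np
import pandas as pd

# Small stand-ins for OP and df (a few rows/columns from the question).
op = pd.DataFrame({
    "stud_name": ["ABC", "ABC"],
    "yr_qty": ["2014Q2", "2020Q4"],
    "mov_avg_full": [np.nan, np.nan],
})
df = pd.DataFrame({
    "stud_name": ["ABC", "ABC", "ABC"],
    "yr_qty": ["2014Q1", "2017Q2", "2018Q3"],
    "mov_avg_full": [10.0, 55.714286, 51.75],
})

filled = op.merge(df, on=["stud_name"], how="left")

# Every OP row is paired with every df row of the same student: 2 * 3 rows.
print(len(filled))  # 6
# Overlapping column names get the default "_x"/"_y" suffixes,
# so there is no plain "mov_avg_full" column left to fill.
print(sorted(filled.columns))
# ['mov_avg_full_x', 'mov_avg_full_y', 'stud_name', 'yr_qty_x', 'yr_qty_y']
```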
I expect my output to look like the following:
In this case, you might want to use append (or pd.concat, since DataFrame.append was removed in pandas 2.0) instead of merge. In other words, you want to concatenate vertically instead of horizontally. Then, after sorting the DataFrame by stud_name and yr_qty, you can use the groupby and fillna methods on it. Code:
Output:
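A minimal sketch of the vertical-concatenation idea described in this answer, using pd.concat because DataFrame.append no longer exists in pandas 2.x. The names qtr_key, fill_by_concat and the _src/_key/_row helper columns are hypothetical. Two details are handled explicitly as assumptions about the intended logic: only values that come from df are carried forward (so an already-filled OP row does not leak into later OP rows), and at an equal quarter the OP row sorts first so that it is filled only from strictly earlier df records.

```python
import numpy as np
import pandas as pd

COLS = ["mov_avg_full", "mov_avg_2qtr_min_period"]


def qtr_key(yr_qty):
    # Hypothetical helper: 'YYYYQn' -> year * 4 + quarter.
    return yr_qty.str[:4].astype(int) * 4 + yr_qty.str[-1].astype(int)


def fill_by_concat(op, df):
    base = ["stud_name", "yr_qty"] + COLS
    # Tag rows by origin (0 = OP, 1 = df) and remember OP's original order in _row.
    top = op[base].assign(_src=0, _key=qtr_key(op["yr_qty"]), _row=np.arange(len(op)))
    bottom = df[base].assign(_src=1, _key=qtr_key(df["yr_qty"]))
    both = pd.concat([top, bottom], ignore_index=True)
    # Chronological order per student; OP rows come first at an equal quarter.
    both = both.sort_values(["stud_name", "_key", "_src"])
    for c in COLS:
        # Keep only df values, then carry them forward within each student.
        from_df = both[c].where(both["_src"] == 1)
        prior = from_df.groupby(both["stud_name"]).ffill()
        both[c] = both[c].fillna(prior)
    filled = both[both["_src"] == 0].sort_values("_row")
    result = op.copy()
    for c in COLS:
        result[c] = filled[c].to_numpy()
    return result


op = pd.DataFrame({
    "stud_name": ["ABC"] * 5,
    "yr_qty": ["2014Q2", "2016Q1", "2016Q4", "2017Q4", "2020Q4"],
    "mov_avg_full": [np.nan, 13.0, np.nan, np.nan, np.nan],
    "mov_avg_2qtr_min_period": [np.nan, 14.5, np.nan, np.nan, np.nan],
})
df = pd.DataFrame({
    "stud_name": ["ABC"] * 8,
    "yr_qty": ["2014Q1", "2015Q1", "2015Q2", "2015Q3",
               "2015Q4", "2016Q1", "2017Q2", "2018Q3"],
    "mov_avg_full": [10.0, 10.5, 11.333333, 12.25, 13.2, 13.0, 55.714286, 51.75],
    "mov_avg_2qtr_min_period": [10.0, 10.5, 12.0, 14.0, 16.0, 14.5, 162.0, 168.0],
})

print(fill_by_concat(op, df))
```

Writing the fills back into a copy of OP keeps OP's own column dtypes, which a bare concat would otherwise upcast wherever df lacks a column.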