Pandas: dynamically replace NaN values
I have a DataFrame that looks like this:
    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'a': [1, 2, np.nan, 1, np.nan, np.nan, 4, 2, 3, np.nan],
                       'b': [4, 2, 3, np.nan, np.nan, 1, 5, np.nan, 5, 8]})
         a    b
    0  1.0  4.0
    1  2.0  2.0
    2  NaN  3.0
    3  1.0  NaN
    4  NaN  NaN
    5  NaN  1.0
    6  4.0  5.0
    7  2.0  NaN
    8  3.0  5.0
    9  NaN  8.0
I want to dynamically replace the NaN values. I have tried (df.ffill() + df.bfill()) / 2, but that does not yield the desired output, because it computes the fill values for the whole column at once rather than dynamically, one value at a time. I have also tried interpolate, but it doesn't work well for non-linear data.
I have seen this answer, but I did not fully understand it and am not sure whether it would work.
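To make the difference concrete, here is a small check of what the two attempts produce on column a. It is only a sketch and uses nothing beyond the DataFrame from the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, np.nan, 1, np.nan, np.nan, 4, 2, 3, np.nan],
    'b': [4, 2, 3, np.nan, np.nan, 1, 5, np.nan, 5, 8],
})

# Average of forward fill and backward fill: every NaN in a gap
# receives the same value, computed from the original column.
averaged = (df.ffill() + df.bfill()) / 2
print(averaged['a'].tolist())
# [1.0, 2.0, 1.5, 1.0, 2.5, 2.5, 4.0, 2.0, 3.0, nan]

# Default (linear) interpolation: the gap in rows 4 and 5 is filled
# with evenly spaced values, 2.0 and 3.0.
linear = df['a'].interpolate()
print(linear.iloc[4], linear.iloc[5])
```

With the ffill/bfill average, row 5 comes out as 2.5 rather than the 3.25 I am after, and the trailing NaN in row 9 stays unfilled because there is nothing to backfill from.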
Update on the computation of the values
I want every NaN value to be the mean of the previous and the next non-NaN value. If there is more than one NaN in a row, I want to replace them one at a time, each time using the value that was just filled in as the new previous value. For example, given 1, np.nan, np.nan, 4, the first NaN becomes the mean of 1 and 4 (2.5), giving 1, 2.5, np.nan, 4; the second NaN then becomes the mean of 2.5 and 4, giving 1, 2.5, 3.25, 4.
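The arithmetic for that example, written out step by step. The closed form at the end is just an observation that follows from the rule (each fill halves the remaining distance to the right-hand value), not something required by it:

```python
left, right = 1.0, 4.0        # known values around the gap 1, NaN, NaN, 4

first = (left + right) / 2    # first NaN: mean of 1 and 4
second = (first + right) / 2  # second NaN: mean of 2.5 and 4

print([left, first, second, right])  # [1.0, 2.5, 3.25, 4.0]

# Equivalent closed form for the j-th NaN in a gap (j = 1, 2, ...):
# fill_j = right + (left - right) / 2**j
closed = [right + (left - right) / 2 ** j for j in (1, 2)]
print(closed)  # [2.5, 3.25]
```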
The desired output is:

          a    b
    0  1.00  4.0
    1  2.00  2.0
    2  1.50  3.0
    3  1.00  2.0
    4  2.50  1.5
    5  3.25  1.0
    6  4.00  5.0
    7  2.00  5.0
    8  3.00  5.0
    9  1.50  8.0
Comments (2)
Inspired by @ye olde noobe's answer (thanks to him!):
I've optimized it to make it ≃ 100x faster (timing comparison below):
Timing comparison: https://i.sstatic.net/ewuea.png
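The optimized code itself is not reproduced here. Purely as an illustration (this is not the answerer's implementation, and no claim is made about the 100x figure), one way to avoid searching for the next valid value again and again is to precompute it in a single backward pass over a NumPy array and then fill in a single forward pass. The name `fill_nan_running_mean` and the `default` parameter are made up for this sketch; `default=0.0` follows the assumption that a missing neighbour counts as 0.

```python
import numpy as np
import pandas as pd


def fill_nan_running_mean(values, default=0.0):
    """Fill each NaN with the mean of the previous (possibly just filled)
    value and the next originally valid value. Hypothetical helper."""
    arr = np.asarray(values, dtype=float).copy()
    n = arr.size

    # Backward pass: next_valid[i] is the nearest non-NaN original value
    # at position >= i, or `default` if there is none.
    next_valid = np.empty(n)
    nxt = default
    for i in range(n - 1, -1, -1):
        if not np.isnan(arr[i]):
            nxt = arr[i]
        next_valid[i] = nxt

    # Forward pass: `prev` always holds the value at i - 1 after filling.
    prev = default
    for i in range(n):
        if np.isnan(arr[i]):
            arr[i] = (prev + next_valid[i]) / 2
        prev = arr[i]
    return arr


df = pd.DataFrame({
    'a': [1, 2, np.nan, 1, np.nan, np.nan, 4, 2, 3, np.nan],
    'b': [4, 2, 3, np.nan, np.nan, 1, 5, np.nan, 5, 8],
})

result = pd.DataFrame(
    {col: fill_nan_running_mean(df[col].to_numpy()) for col in df.columns},
    index=df.index,
)
print(result)
```

Both passes are linear in the column length, so the cost no longer grows with the length of the NaN gaps.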
Maybe not the most optimized, but it works (note: judging from your example, I assume that when there is no valid value before or after a NaN, as in the last row of column a, 0 is used as the replacement):
Use it like this for the full DataFrame:
df after applying it:
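A minimal sketch of the approach described above, not the answerer's original code. The helper name `fill_nan_sequential` and its `default` parameter are hypothetical; `default=0.0` encodes the assumption that a missing neighbour is treated as 0.

```python
import numpy as np
import pandas as pd


def fill_nan_sequential(s: pd.Series, default: float = 0.0) -> pd.Series:
    """Replace NaNs one at a time with the mean of the previous value
    (already filled if it was NaN) and the next non-NaN value."""
    out = s.astype(float)  # astype returns a new Series, so `s` is untouched
    for pos in range(len(out)):
        if pd.isna(out.iloc[pos]):
            prev_val = out.iloc[pos - 1] if pos > 0 else default
            following = out.iloc[pos + 1:].dropna()
            next_val = following.iloc[0] if len(following) else default
            out.iloc[pos] = (prev_val + next_val) / 2
    return out


df = pd.DataFrame({
    'a': [1, 2, np.nan, 1, np.nan, np.nan, 4, 2, 3, np.nan],
    'b': [4, 2, 3, np.nan, np.nan, 1, 5, np.nan, 5, 8],
})

# DataFrame.apply calls the helper once per column.
filled = df.apply(fill_nan_sequential)
print(filled)
```

On the example data this gives 1.5, 2.5, 3.25 and 1.5 for the NaNs in column a and 2.0, 1.5 and 5.0 for those in column b, which matches the desired output in the question.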
If you have other columns and don't want to apply it to the whole DataFrame, you can of course use it on a single column, like this:
In this case, df looks like this:
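A sketch of the single-column case, again with a hypothetical helper `fill_nan_sequential` that implements the rule from the question and treats a missing neighbour as 0:

```python
import numpy as np
import pandas as pd


def fill_nan_sequential(s: pd.Series, default: float = 0.0) -> pd.Series:
    """Fill NaNs one at a time with the mean of the previous value and
    the next non-NaN value (hypothetical helper, assumption: default 0)."""
    out = s.astype(float)
    for pos in range(len(out)):
        if pd.isna(out.iloc[pos]):
            prev_val = out.iloc[pos - 1] if pos > 0 else default
            following = out.iloc[pos + 1:].dropna()
            next_val = following.iloc[0] if len(following) else default
            out.iloc[pos] = (prev_val + next_val) / 2
    return out


df = pd.DataFrame({
    'a': [1, 2, np.nan, 1, np.nan, np.nan, 4, 2, 3, np.nan],
    'b': [4, 2, 3, np.nan, np.nan, 1, 5, np.nan, 5, 8],
})

# Only column 'a' is rewritten; column 'b' is left as it was.
df['a'] = fill_nan_sequential(df['a'])
print(df)
```

Column a is filled, while column b keeps its three NaN values.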