Pandas: dynamically replace NaN values

Posted 2025-02-03 09:45:33

I have a DataFrame that looks like this:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a':[1,2,np.nan,1,np.nan,np.nan,4,2,3,np.nan], 
    'b':[4,2,3,np.nan,np.nan,1,5,np.nan,5,8]
})

     a    b
0  1.0  4.0
1  2.0  2.0
2  NaN  3.0
3  1.0  NaN
4  NaN  NaN
5  NaN  1.0
6  4.0  5.0
7  2.0  NaN
8  3.0  5.0
9  NaN  8.0

I want to replace the NaN values dynamically. I tried (df.ffill()+df.bfill())/2, but that does not yield the desired output: it computes the fill values for the whole column at once rather than updating them as each NaN is filled. I also tried interpolate, but it doesn't work well for nonlinear data.
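To make the difference concrete, here is a minimal sketch on the four-value example 1, NaN, NaN, 4. The variable names (`s`, `static_fill`, `linear_fill`) are only illustrative and are not from the question.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, np.nan, np.nan, 4])

# ffill and bfill are both computed from the original values, so every NaN
# in the same gap sees the same pair of neighbours (1 and 4).
static_fill = (s.ffill() + s.bfill()) / 2
print(static_fill.tolist())  # [1.0, 2.5, 2.5, 4.0]

# interpolate() defaults to linear interpolation between the neighbours.
linear_fill = s.interpolate()
print(linear_fill.tolist())  # [1.0, 2.0, 3.0, 4.0]
```

Neither result is 1, 2.5, 3.25, 4, because neither approach feeds an already filled value into the next computation.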

I have seen this answer, but I did not fully understand it and am not sure whether it would work here.

Update on the computation of the values
I want every NaN value to be the mean of the previous and the next non-NaN value. If there is more than one NaN in a row, I want to replace them one at a time, so that each filled value is used when computing the next one. For example, given 1, np.nan, np.nan, 4, the first NaN becomes the mean of 1 and 4 (2.5), giving 1, 2.5, np.nan, 4; the second NaN then becomes the mean of 2.5 and 4, giving 1, 2.5, 3.25, 4.
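As a worked check of this rule, the sketch below fills a single gap step by step and compares it with a closed form. If p is the value before the gap and n the value after it, each step halves the remaining distance to n, so the j-th filled value should be n - (n - p) / 2**j. The function names `fill_gap` and `fill_gap_closed_form` are hypothetical helpers introduced only for this illustration.

```python
def fill_gap(prev_value: float, next_value: float, gap_length: int) -> list:
    """Apply the rule one NaN at a time to a run of `gap_length` NaNs."""
    filled = []
    current = prev_value
    for _ in range(gap_length):
        # Each new value is the mean of the latest filled value and the next valid one.
        current = (current + next_value) / 2
        filled.append(current)
    return filled


def fill_gap_closed_form(prev_value: float, next_value: float, gap_length: int) -> list:
    """Same values, computed directly: n - (n - p) / 2**j for j = 1..gap_length."""
    return [
        next_value - (next_value - prev_value) / 2 ** j
        for j in range(1, gap_length + 1)
    ]


print(fill_gap(1, 4, 2))              # [2.5, 3.25]
print(fill_gap_closed_form(1, 4, 2))  # [2.5, 3.25]
```

Both give 2.5 and 3.25 for the example above, matching the sequence 1, 2.5, 3.25, 4.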

The desired output is:

      a    b
0  1.00  4.0
1  2.00  2.0
2  1.50  3.0
3  1.00  2.0
4  2.50  1.5
5  3.25  1.0
6  4.00  5.0
7  2.00  5.0
8  3.00  5.0
9  1.50  8.0

终陌 2025-02-10 09:45:33

Inspired by @ye olde noobe's answer (thanks!):

I've optimized it to make it roughly 100x faster (timing comparison below):

import pandas as pd

def custom_fillna(s: pd.Series) -> pd.Series:
    s = s.copy()  # work on a copy; in-place edits via df['a'] may not reach df in newer pandas
    for i in range(len(s)):
        if pd.isna(s.iloc[i]):
            prev_idx = s.iloc[:i].last_valid_index()
            next_idx = s.iloc[i:].first_valid_index()
            # Use 0 when there is no valid value on that side.
            last_valid_number = s.loc[prev_idx] if prev_idx is not None else 0
            next_valid_number = s.loc[next_idx] if next_idx is not None else 0
            s.iloc[i] = (last_valid_number + next_valid_number) / 2
    return s

df['a'] = custom_fillna(df['a'])
df

Timing comparison (screenshot; link as captured on this page: https://i.sstatic.net/ewuea.png)
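For larger columns, the repeated slicing inside the loop can probably be avoided altogether. The following is only a sketch of a single-pass variant, not something from either answer: the name `fill_single_pass` and the `fallback` parameter are assumptions made for illustration. It relies on the observation that the "next valid value" always comes from the original data, so it can be precomputed with `bfill`, while the "previous value" is simply the last value seen, whether original or already filled.

```python
import numpy as np
import pandas as pd


def fill_single_pass(s: pd.Series, fallback: float = 0.0) -> pd.Series:
    """Hypothetical helper: fill NaNs with the mean of the previous (possibly
    filled) value and the next valid original value, in one pass."""
    values = s.to_numpy(dtype=float, copy=True)
    # Next valid ORIGINAL value at each position; `fallback` where none exists.
    next_valid = s.bfill().fillna(fallback).to_numpy(dtype=float)
    prev = fallback  # used when a NaN has no value before it
    for i in range(len(values)):
        if np.isnan(values[i]):
            values[i] = (prev + next_valid[i]) / 2
        prev = values[i]  # original or freshly filled value
    return pd.Series(values, index=s.index, name=s.name)


df = pd.DataFrame({'a': [1, 2, np.nan, 1, np.nan, np.nan, 4, 2, 3, np.nan],
                   'b': [4, 2, 3, np.nan, np.nan, 1, 5, np.nan, 5, 8]})
result = df.apply(fill_single_pass)
print(result)
```

With `fallback=0.0` this reproduces the desired output from the question, including 1.50 in the last row of column a. No timing is claimed here; measuring it against the loop above on your own data would be the way to tell whether it helps.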

留一抹残留的笑 2025-02-10 09:45:33

Maybe not the most optimized, but it works. (Note: from your example, I assume that if there is no valid value before or after a NaN, as in the last row of column a, 0 is used in its place.)

import numpy as np
import pandas as pd

def fill_dynamically(s: pd.Series) -> pd.Series:
    s = s.copy()  # avoid relying on in-place mutation of the caller's column
    for i in range(len(s)):
        # For a valid value both lookups return the value itself, so it stays unchanged.
        next_idx = s.iloc[i:].first_valid_index()
        prev_idx = s.iloc[:i + 1].last_valid_index()
        s.iloc[i] = (
            (0 if next_idx is None else s.loc[next_idx]) +
            (0 if prev_idx is None else s.loc[prev_idx])
        ) / 2
    return s

Use it like this for the full DataFrame:

df = pd.DataFrame({'a':[1,2,np.nan,1,np.nan,np.nan,4,2,3,np.nan], 
    'b':[4,2,3,np.nan,np.nan,1,5,np.nan,5,8]
})

df = df.apply(fill_dynamically)

df after applying:

      a    b
0  1.00  4.0
1  2.00  2.0
2  1.50  3.0
3  1.00  2.0
4  2.50  1.5
5  3.25  1.0
6  4.00  5.0
7  2.00  5.0
8  3.00  5.0
9  1.50  8.0

If you have other columns and don't want to apply it to the whole DataFrame, you can of course use it on a single column:

df = pd.DataFrame({'a':[1,2,np.nan,1,np.nan,np.nan,4,2,3,np.nan], 
    'b':[4,2,3,np.nan,np.nan,1,5,np.nan,5,8]
})

df['a'] = fill_dynamically(df['a'])

In this case, df looks like this:

      a    b
0  1.00  4.0
1  2.00  2.0
2  1.50  3.0
3  1.00  NaN
4  2.50  NaN
5  3.25  1.0
6  4.00  5.0
7  2.00  NaN
8  3.00  5.0
9  1.50  8.0

About the author

坚持沉默
