Why does the pandas fillna function turn non-null values into null values?

Posted on 2025-01-19 21:50:53 · 600 words · 3 views · 0 comments

I'm trying to fill missing values with the most frequent value (the one with the highest count) after grouping the DataFrame. Here is my code.

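import numpy as np
import pandas as pd

def fill_with_maxcount(x):
    # Return the most frequent non-null value, or NaN if there is none.
    try:
        return x.value_counts().index.tolist()[0]
    except Exception as e:
        return np.nan  # np.NaN was removed in NumPy 2.0


df_all["Surname"] = df_all.groupby(['HomePlanet','CryoSleep','Destination']).Surname.apply(lambda x : x.fillna(fill_with_maxcount(x)))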

If an error occurs inside the try block, the function returns np.nan. I also tried logging the error inside fill_with_maxcount, but no exception was raised.

Before these lines run, the column has 294 NaN values. Afterwards the count has increased to 857, which means non-null values have been turned into NaN. I can't figure out why. I experimented with print statements, and the function returns a non-null value (a string). So the problem should lie in the apply call or in fillna. But I have used this same approach elsewhere without any problem.

Can someone give me a suggestion? Thank you.


香草可樂 2025-01-26 21:50:53

I finally found the cause after some testing.

 df_all.groupby(['HomePlanet','CryoSleep','Destination']).Surname.apply(lambda x : x.fillna(fill_with_maxcount(x)))

The expression above returns a Series with the filled values. However, rows in which any of the grouping columns is null are not assigned to any group (groupby drops null keys by default), so the function is never applied to them and they are missing from the result. When that Series is assigned directly to the Surname column, pandas aligns it by index, and those rows become null as well.
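To make the mechanism concrete, here is a minimal sketch on made-up data. The toy frame, the helper `fill_group`, and the use of a single grouping column are assumptions for illustration, not part of the original code. `group_keys=False` is passed so the result keeps the original row labels regardless of pandas version.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: rows 2 and 3 have a missing grouping key.
df = pd.DataFrame({
    "HomePlanet": ["Earth", "Earth", np.nan, np.nan, "Mars"],
    "Surname": ["Smith", np.nan, "Jones", "Lee", "Khan"],
})

def fill_group(s):
    # Fill nulls in one group with that group's most frequent value.
    counts = s.value_counts()
    return s.fillna(counts.index[0]) if len(counts) else s

filled = df.groupby("HomePlanet", group_keys=False)["Surname"].apply(fill_group)

# Labels present in df but absent from the grouped result.
missing_labels = df.index.difference(filled.index).tolist()

# Null counts before and after an index-aligned assignment of the result.
before = int(df["Surname"].isna().sum())
after = int(df.assign(Surname=filled)["Surname"].isna().sum())

print(missing_labels, before, after)
```

In this sketch the rows with a null HomePlanet never reach `fill_group`, so their surnames ("Jones" and "Lee") are lost when the result is assigned back, even though one genuine gap was filled.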

As a solution, I changed the code as follows.

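import numpy as np
import pandas as pd

def fill_with_maxcount(x):
    # Return the most frequent non-null value, or NaN if there is none.
    try:
        return x.value_counts().index.tolist()[0]
    except Exception as e:
        return np.nan

def replace_only_null(x, z):
    # Keep x's value unless it is null; then take z's value at the same label.
    # pd.isna is needed here because `value == np.nan` is always False.
    for label, value in x.items():
        if pd.isna(value):
            yield z.get(label, np.nan)  # label may be absent from z
        else:
            yield value

# group_keys=False keeps the original row labels on the result.
result_1 = df_all.groupby(['HomePlanet','CryoSleep','Destination'], group_keys=False).Surname.apply(lambda x : x.fillna(fill_with_maxcount(x)))
replaced = pd.Series(list(replace_only_null(df_all.Surname, result_1)), index=df_all.index)

df_all.Surname = replaced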

The replace_only_null function compares the result with the current Surname column and replaces only the null values with the values obtained by applying fill_with_maxcount.
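The same "replace only nulls" idea can probably be expressed without a Python loop, because `Series.fillna` accepts another Series and aligns it by index. The sketch below is one possible variant, not the code from this answer. The toy data and the helper name `most_common` are assumptions, and it relies on `transform` returning a result with the original index (the behaviour of recent pandas versions).

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: rows 3 and 6 have a missing grouping key.
df = pd.DataFrame({
    "HomePlanet": ["Earth", "Earth", "Earth", np.nan, "Mars", "Mars", np.nan],
    "Surname": ["Smith", "Smith", np.nan, "Jones", np.nan, "Khan", np.nan],
})

def most_common(s):
    # Most frequent non-null value in the group, or NaN if the group has none.
    counts = s.value_counts()
    return counts.index[0] if len(counts) else np.nan

# One value per row: the most frequent surname of that row's group.
mode_per_row = df.groupby("HomePlanet")["Surname"].transform(most_common)

# fillna touches only the null entries, so existing surnames are never overwritten.
result = df["Surname"].fillna(mode_per_row)

print(result.tolist())
```

Rows whose grouping key is null keep whatever surname they already had. A row that has neither a key nor a surname stays null, since there is no group to borrow a value from.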
