fillna() not imputing values with respect to groupby()

Posted on 2025-01-30 18:42:20

I'm trying to use fillna() and transform() to impute some missing values in a column with respect to the 'release_year' and 'brand_name' of the phone, but after running my code I still have the same missing value counts.

Here are my missing value counts & percentages prior to running the code:

The column I'm imputing on is 'main_camera_mp'.

Here is the code I ran to impute 'main_camera_mp' and the result (just an FYI that I copied the above dataframe into df2):

df2['main_camera_mp'] = df2['main_camera_mp'].fillna(value = df2.groupby(['release_year','brand_name'])['main_camera_mp'].transform('mean'))

Missing value counts & percentages after running the above line

Comments (1)

耳钉梦 2025-02-06 18:42:20

I guess your imputation method is not suited for your data, in that when main_camera_mp is missing, it is missing for all entries in that release_year-brand_name group. Thus the series derived from the groupby object that you pass as the fill value will itself have missing values for those groups.

Here is a simple example of how this can happen:

import numpy as np
import pandas as pd

df2 = pd.DataFrame({'main_camera_mp': [1, 2, 3, np.nan, 5, 6, np.nan, np.nan],
                    'release_year': [2000, 2000, 2001, 2001, 2000, 2000, 2001, 2001],
                    'brand_name': ['a', 'b', 'a', 'b', 'a', 'b', 'a', 'b']})

df2['main_camera_mp'] = df2['main_camera_mp'].fillna(value = 
    df2.groupby(['release_year', 'brand_name'])['main_camera_mp'].transform('mean'))
df2
    main_camera_mp  release_year    brand_name
0   1.0             2000            a
1   2.0             2000            b
2   3.0             2001            a
3   NaN             2001            b
4   5.0             2000            a
5   6.0             2000            b
6   3.0             2001            a
7   NaN             2001            b

Note that the value at index 6 was imputed as intended, but the other two missing values were not, because there is no non-missing value for their group.
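If that is what is happening, one way to confirm it and work around it is sketched below. This is only a sketch on the same toy data: the names keys, non_missing, empty_groups, brand_mean and overall_mean are made up for illustration, and falling back to the brand-level mean and then to the overall mean is an assumption about what would be acceptable for your data, not something the question specifies.

```python
import numpy as np
import pandas as pd

# Same toy data as in the example above.
df2 = pd.DataFrame({'main_camera_mp': [1, 2, 3, np.nan, 5, 6, np.nan, np.nan],
                    'release_year': [2000, 2000, 2001, 2001, 2000, 2000, 2001, 2001],
                    'brand_name': ['a', 'b', 'a', 'b', 'a', 'b', 'a', 'b']})

keys = ['release_year', 'brand_name']

# Diagnostic: count() ignores NaN, so a count of 0 marks a group that has
# no non-missing value and therefore no usable mean.
non_missing = df2.groupby(keys)['main_camera_mp'].count()
empty_groups = non_missing[non_missing == 0]
print(empty_groups)

# Fallback chain (an assumption, adjust to your data):
# 1. mean of the release_year/brand_name group,
# 2. mean of the brand across all years,
# 3. mean of the whole column.
group_mean = df2.groupby(keys)['main_camera_mp'].transform('mean')
brand_mean = df2.groupby('brand_name')['main_camera_mp'].transform('mean')
overall_mean = df2['main_camera_mp'].mean()

filled = (df2['main_camera_mp']
          .fillna(group_mean)
          .fillna(brand_mean)
          .fillna(overall_mean))
print(filled)
```

Each fillna() only touches the values that are still missing, so rows that already got a group mean are left alone by the later steps. If empty_groups comes back empty on your real data, then the cause is something else and this fallback will not help.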
