OrdinalEncoder with a specific category order that includes NaN

Posted 2025-01-18 13:12:02, 925 words, 1 view, 0 comments

Let's say I have this example dataset:

import numpy as np
import pandas as pd
test = pd.DataFrame({'Education': ['High School', 'Uneducated', 'Graduate', 'College', np.nan, 'High School'],
                     'Gender': ['M', 'F', 'M', 'F', 'M', 'F']})

which looks like this:

    Education   Gender
    High School M
    Uneducated  F
    Graduate    M
    College     F
    NaN         M
    High School F

All I want to do is encode the 'Education' column as ordinal with a specific order, using this code:

from sklearn.preprocessing import OrdinalEncoder
edu = ['Uneducated', 'High School', 'College', 'Graduate']
oe_edu = OrdinalEncoder(categories=[edu])
test['Education'] = oe_edu.fit_transform(test[['Education']])

but this has a problem with the NaN values. I still want to keep the NaN values so that I can impute them later.
(My scikit-learn version is 1.0.2, so it can handle NaN when the categories are left at the default.)

So the final output should look like this:

    Education   Gender
    1.0         M
    0.0         F
    3.0         M
    2.0         F
    NaN         M
    1.0         F

Maybe it would work if I included the 'handle_unknown' and 'unknown_value' parameters, but I'm not sure how to use them.

Comments (2)

请叫√我孤独 2025-01-25 13:12:02

Never mind, I figured it out myself:

edu = ['Uneducated','High School', 'College', 'Graduate']
oe_edu = OrdinalEncoder(categories=[edu], handle_unknown='use_encoded_value', unknown_value=np.nan)
test['Education'] = oe_edu.fit_transform(test[['Education']])

一袭白衣梦中忆 2025-01-25 13:12:02

You can use pandas.Categorical:

edu = ['Uneducated','High School', 'College', 'Graduate']
test['cat'] = pd.Categorical(test['Education'], categories=edu, ordered=True)

print(test)
     Education Gender          cat
0  High School      M  High School
1   Uneducated      F   Uneducated
2     Graduate      M     Graduate
3      College      F      College
4          NaN      M          NaN
5  High School      F  High School

print(test['cat'].cat.codes)
0    1
1    0
2    3
3    2
4   -1
5    1
dtype: int8
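
The codes above mark the missing row with -1, whereas the question asks for NaN. A minimal sketch of one way to bridge that gap, assuming the same data as in the question; the column names `cat` and `Education_code` are only illustrative:

```python
import numpy as np
import pandas as pd

test = pd.DataFrame({
    'Education': ['High School', 'Uneducated', 'Graduate', 'College', np.nan, 'High School'],
    'Gender': ['M', 'F', 'M', 'F', 'M', 'F'],
})

edu = ['Uneducated', 'High School', 'College', 'Graduate']
test['cat'] = pd.Categorical(test['Education'], categories=edu, ordered=True)

# .cat.codes is an integer Series in which -1 stands for a missing value.
codes = test['cat'].cat.codes

# Cast to float so NaN can be stored, then keep only the non-negative codes;
# Series.where puts NaN wherever the condition is False.
test['Education_code'] = codes.astype('float64').where(codes >= 0)
print(test[['Education', 'Education_code']])
```

The cast to float comes first because an int8 column cannot hold NaN.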