Ordinal encoder with a specific order, including NaN
Let's say I have this example dataset:
import numpy as np
import pandas as pd
test = pd.DataFrame({'Education': ['High School', 'Uneducated', 'Graduate', 'College', np.nan, 'High School'],
                     'Gender': ['M', 'F', 'M', 'F', 'M', 'F']})
which gives this DataFrame:
Education    Gender
High School  M
Uneducated   F
Graduate     M
College      F
NaN          M
High School  F
All I want to do is encode the 'Education' column as ordinal, in a specific order, with this code:
from sklearn.preprocessing import OrdinalEncoder
edu = ['Uneducated', 'High School', 'College', 'Graduate']
oe_edu = OrdinalEncoder(categories=[edu])
test['Education'] = oe_edu.fit_transform(test[['Education']])
but the NaN value makes it fail, and I still want to keep the NaN values so that I can impute them later.
(My scikit-learn version is 1.0.2, so it can handle NaN when the default categories are used.)
So the final output I want is this:
Education  Gender
1.0        M
0.0        F
3.0        M
2.0        F
NaN        M
1.0        F
Maybe it would work if I included the parameters 'handle_unknown' and 'unknown_value', but I'm not sure how to use them.
Comments (2)
Never mind, I got it by myself
You can use pandas.Categorical:
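The answer's own code is not shown, so the following is only a minimal sketch of one way pandas.Categorical could be used here: build an ordered categorical with the edu order from the question and take its integer codes. pandas gives missing values (and any value not in edu) the code -1, so those are mapped back to NaN to keep them available for imputation.

```python
import numpy as np
import pandas as pd

test = pd.DataFrame({
    "Education": ["High School", "Uneducated", "Graduate", "College", np.nan, "High School"],
    "Gender": ["M", "F", "M", "F", "M", "F"],
})

edu = ["Uneducated", "High School", "College", "Graduate"]

# Ordered Categorical whose categories follow the edu order.
# Missing values (and values not in edu) get the code -1.
edu_cat = pd.Categorical(test["Education"], categories=edu, ordered=True)

# Convert the integer codes to float (a writable copy) and turn -1 back into NaN
codes = edu_cat.codes.astype(float)
codes[codes == -1] = np.nan
test["Education"] = codes
print(test)
```

Unlike a fitted OrdinalEncoder, this keeps no fitted object around, so the same edu list has to be applied to any new data explicitly.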