KNNImputer in Sklearn
I want to use the class sklearn.impute.KNNImputer to impute missing values in my dataset.
I have 2 questions regarding this:
I have seen multiple implementations on Medium and also the example on the official Sklearn website. None of them normalize the data. Shouldn’t one normalize the data before using KNN? Or does the KNNImputer normalize the data behind the scenes?
The KNNImputer only accepts numerical input. So for categorical data, should I one-hot encode it first and then impute?
Thank you
1 Answer
No, there is no implicit normalisation in the KNNImputer. You can see in the source that it just uses KNN logic (with a NaN-aware Euclidean distance) to compute a weighted average of the neighbours' feature values, so if your features are on very different scales you should scale them yourself before imputing.
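A minimal sketch of scaling before imputation, on made-up toy data: standardise first so the large-magnitude column does not dominate the neighbour distances, impute, then invert the scaling to get values back on the original scale.

```python
import numpy as np
from sklearn.impute import KNNImputer
from sklearn.preprocessing import StandardScaler

# Toy data: second column is ~100x the scale of the first
X = np.array([
    [1.0, 100.0],
    [2.0, np.nan],
    [3.0, 300.0],
    [np.nan, 400.0],
])

# StandardScaler ignores NaNs when fitting, so it can run pre-imputation
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Impute in the scaled space, where distances are comparable
imputer = KNNImputer(n_neighbors=2)
X_imputed_scaled = imputer.fit_transform(X_scaled)

# Undo the scaling to recover the original units
X_imputed = scaler.inverse_transform(X_imputed_scaled)
```

In a real workflow you would typically wrap the scaler and imputer in a Pipeline so the same transforms are applied consistently at train and predict time.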
Correct, you need to one-hot encode them, and then you will need to argmax over those columns afterwards, because the imputer will produce representations that are not one-hot (e.g. [0.2, 0.1, 0.4]).
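A sketch of that round-trip on made-up data: a numeric feature plus an already one-hot-encoded 3-category feature whose last row is missing. The imputed block comes back fractional, and argmax recovers a category index.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Columns: [numeric, cat_A, cat_B, cat_C]; last row's category is missing
X = np.array([
    [1.00, 1.0, 0.0, 0.0],
    [1.10, 0.0, 1.0, 0.0],
    [5.00, 0.0, 0.0, 1.0],
    [1.05, np.nan, np.nan, np.nan],
])

imputer = KNNImputer(n_neighbors=2)
X_imp = imputer.fit_transform(X)

# The two nearest neighbours of the last row have different categories,
# so the imputed block is an average like [0.5, 0.5, 0.0] -- not one-hot.
# Recover a hard category by taking argmax over the one-hot columns.
recovered = X_imp[:, 1:4].argmax(axis=1)
```

Note that ties (as in this example) are broken by argmax in favour of the first column, so the recovered category can be arbitrary when neighbours disagree evenly.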