k-nearest-neighbour classifier, but with a distribution?
I am building a classifier for some 2D data.
I have some training data for which I know the classes and have plotted these on a graph to see the clustering.
To the observer, there are obvious, separate clusters, but unfortunately they are spread out along lines rather than in tight clusters. One line-spread goes up at about an 80-degree angle, another at 45 degrees and another at about 10 degrees from horizontal, but all three seem to point back to the origin.
I want to perform a nearest-neighbour classification on some test data, and from the looks of things, if the test data is very similar to the training data a 3-nearest-neighbour classifier would work fine, except when the data is close to the origin of the graph, in which case the three clusters are quite close together and there might be a few errors.
Should I be coming up with some estimated Gaussian distributions for my clusters? If so, I'm not sure how I can combine this with a nearest-neighbour classifier.
I'd be grateful for any input.
Cheers
Transform all your points to [r, angle], and scale r down to the range 0 to 90 too, before running nearest-neighbor.
Why? NN uses the Euclidean distance between points and centres (in most implementations), but you want
distance( point, centre )
to be more like sqrt( (point.r - centre.r)^2 + (point.angle - centre.angle)^2 )
than sqrt( (point.x - centre.x)^2 + (point.y - centre.y)^2 ).
Scaling r down to 30? 10? would weight angle more than r, which seems to be what you want.
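A minimal sketch of this transform with numpy and scikit-learn. The synthetic line-shaped clusters are my assumption, standing in for your real data, and the scale factor of 30 is just one of the suggested values:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

def line_cluster(angle_deg, n=50):
    """Synthetic points spread along a ray from the origin at the given angle."""
    r = rng.uniform(1.0, 10.0, n)
    theta = np.radians(angle_deg) + rng.normal(0.0, 0.03, n)  # small angular jitter
    return np.column_stack([r * np.cos(theta), r * np.sin(theta)])

# Stand-in training data: three line-spreads at roughly 10, 45 and 80 degrees
X = np.vstack([line_cluster(a) for a in (10, 45, 80)])
y = np.repeat([0, 1, 2], 50)

def to_polar(xy, r_scale):
    """[x, y] -> [scaled r, angle in degrees]."""
    r = np.hypot(xy[:, 0], xy[:, 1])
    angle = np.degrees(np.arctan2(xy[:, 1], xy[:, 0]))
    return np.column_stack([r * r_scale, angle])

# Shrink r so it spans roughly 0..30 while angle spans 0..90,
# so angle dominates the Euclidean distance
r_scale = 30.0 / np.hypot(X[:, 0], X[:, 1]).max()

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(to_polar(X, r_scale), y)

# A test point near the origin on the 45-degree line is now classified
# by its angle rather than its closeness to the other clusters
test_point = np.array([[0.8, 0.8]])
print(knn.predict(to_polar(test_point, r_scale)))
```

Note that `r_scale` is fitted from the training data and reused unchanged on test points, so train and test live in the same coordinate system.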
Why use k-NN for that purpose? Any linear classifier would do the trick. Try solving it with an SVM and you'll get much better results.
If you insist on using kNN, you clearly have to scale the features and transform them into polar ones, as mentioned here.
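A sketch of the SVM route, using scikit-learn's `SVC` with a linear kernel; the line-shaped synthetic data is my assumption, standing in for the real clusters:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def line_cluster(angle_deg, n=50):
    """Synthetic points spread along a ray from the origin at the given angle."""
    r = rng.uniform(1.0, 10.0, n)
    theta = np.radians(angle_deg) + rng.normal(0.0, 0.03, n)
    return np.column_stack([r * np.cos(theta), r * np.sin(theta)])

X = np.vstack([line_cluster(a) for a in (10, 45, 80)])
y = np.repeat([0, 1, 2], 50)

# Each pair of wedge-shaped classes is separable by a straight line,
# so a linear-kernel SVM (one-vs-one under the hood) handles all three
svm = SVC(kernel="linear")
svm.fit(X, y)
print(svm.score(X, y))
```

Here no polar transform is needed at all: the boundaries between the line-spreads are themselves (roughly) straight lines, which is exactly what a linear classifier fits.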