Improving k-means clustering
My lecture notes on computer vision mention that the performance of the k-means clustering algorithm can be improved if we know the standard deviation of the clusters. How so?
My thinking is that we can use the standard deviations to come up with a better initial estimate through histogram based segmentation first. What do you think? Thanks for any help!
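The histogram idea could look roughly like this (a minimal, hypothetical sketch of my own, not something from the lecture notes): for 1-D data with a known per-cluster standard deviation `sigma`, take local maxima of the histogram as candidate centers, and merge any peaks closer than about `2 * sigma`, since those are probably the same cluster.

```python
import numpy as np

def histogram_init(data, sigma, bins=50):
    """Hypothetical sketch: seed k-means with centers taken from
    histogram peaks, using a known cluster standard deviation `sigma`
    to merge peaks that are too close to be separate clusters."""
    counts, edges = np.histogram(data, bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    n = len(counts)
    # Local maxima of the histogram are candidate cluster centers.
    peaks = [float(centers[i]) for i in range(1, n - 1)
             if counts[i] >= counts[i - 1] and counts[i] > counts[i + 1]]
    # Peaks within ~2*sigma of each other likely belong to one cluster.
    merged = []
    for p in peaks:
        if merged and p - merged[-1] < 2 * sigma:
            merged[-1] = 0.5 * (merged[-1] + p)  # average nearby peaks
        else:
            merged.append(p)
    return merged
```

The returned centers (and their count) would then be passed to k-means as the initial means, e.g. via the `init` array argument of scikit-learn's `KMeans`.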
1 Answer
Your lecturer might have the 2002 paper by Veenman et al. in mind. The basic idea is that you set the maximum variance you allow in each cluster. You start with as many clusters as there are data points, and then you "evolve" the clusters (this evolution acts as a global optimization procedure and prevents the bad consequences of the initial assignment of cluster means that you get in plain k-means).
To sum up, if you know the variance, you know how spread out each cluster should be, so it becomes easier to, for example, detect outliers (which usually should be put into separate clusters of their own).
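The last point can be illustrated with a toy sketch (my own construction, not the actual algorithm from the Veenman et al. paper): start with one 1-D cluster per point and greedily merge neighbouring clusters only while the merged cluster's variance stays under the known bound. A far-away outlier then never gets absorbed and ends up in its own cluster.

```python
import numpy as np

def merge_under_variance_bound(points, max_var):
    """Toy sketch (not the Veenman et al. algorithm): agglomerative
    merging of 1-D clusters subject to a maximum-variance constraint."""
    # Start with as many clusters as data points.
    clusters = [[x] for x in sorted(points)]
    while True:
        best = None
        # Consider merging each pair of adjacent clusters, but only
        # if the merged cluster would respect the variance bound.
        for i in range(len(clusters) - 1):
            candidate = clusters[i] + clusters[i + 1]
            spread = np.var(candidate)
            if spread <= max_var and (best is None or spread < best[1]):
                best = (i, spread)
        if best is None:
            return clusters  # no admissible merge remains
        i = best[0]
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]
```

With a variance bound of 1.0, two tight groups around 0 and 10 each collapse into one cluster, while an outlier at 100 stays isolated, because any merge involving it would blow past the bound.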