How to optimize K in the K-Means algorithm
Possible Duplicate:
How do I determine k when using k-means clustering?
How can I choose K initially if I do not know anything about the data?
Can someone help me choose K?
Thanks
Navin
Comments (2)
Seriously, what do you want to know? Do you want us to tell you some number, or a strategy for finding the optimal k? You have to read a book or other resources about k-means; I'm pretty sure it is covered there. There is something on Wikipedia about it:
http://en.wikipedia.org/wiki/Determining_the_number_of_clusters_in_a_data_set
Before you use an algorithm, read about it.
The basic idea is to score the clustering on sample data, usually from the distances within clusters and the distances between clusters. The better this score, the better the clustering, so you can use it to select the best clustering parameters, such as k. One such metric can be found here: http://alias-i.com/lingpipe/docs/api/com/aliasi/cluster/ClusterScore.html
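
To make that idea concrete, here is a rough sketch in plain Python: run k-means for several candidate values of k, score each result, and keep the k with the best score. The score used here is the silhouette coefficient, which is my own pick of a within-cluster vs. between-cluster measure and not necessarily what the linked ClusterScore class computes. The function names (`kmeans`, `silhouette`, `choose_k`), the toy data, and the deterministic farthest-point initialization are all made up for illustration.

```python
import math


def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    # Assumption: deterministic farthest-point initialization, to keep the
    # sketch reproducible. Real code often uses random or k-means++ starts.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[c])  # empty cluster: keep old center
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient: near 1 means tight, well-separated clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return 0.0  # undefined for a single cluster; treated as 0 here
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton cluster contributes 0
        # a: mean distance to the other points in the same cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the points of the nearest other cluster
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def choose_k(points, k_values):
    """Score every candidate k and return (best k, all scores)."""
    scores = {k: silhouette(points, kmeans(points, k)) for k in k_values}
    return max(scores, key=scores.get), scores


# Toy data: three small, well-separated 3x3 grids of points.
offsets = (-0.5, 0.0, 0.5)
points = [(cx + dx, cy + dy)
          for cx, cy in ((0, 0), (10, 0), (5, 9))
          for dx in offsets for dy in offsets]

best_k, scores = choose_k(points, range(2, 7))
print("best k:", best_k)
for k, s in sorted(scores.items()):
    print(k, round(s, 3))
```

On real data the scores rarely separate this cleanly, and k-means depends on its starting centers, so it is worth trying several initializations per k and treating the chosen k as a suggestion, not a definitive answer.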