Clustering by assigning weights to attributes
I have a data set in an Excel sheet that I need to cluster by assigning weights to the attributes. How can I do it?
Comments (1)
You can define a distance function between two points that takes the attribute weights into account. One example is the weighted Euclidean distance.
Specifically, if each point in your dataset has k attributes and the corresponding attribute weights are d1, d2, ..., dk, then the distance between two points X and Y is
d(X, Y) = sqrt( sum_{i=1..k} di * (Xi - Yi)^2 ), where Xi is the value of the i-th attribute of point X.
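As a minimal sketch of the formula above, the distance could be written in Python as follows. The function name `weighted_euclidean` and the use of plain lists are illustrative assumptions, not part of the original answer.

```python
import math


def weighted_euclidean(x, y, weights):
    """Weighted Euclidean distance between two points of equal length.

    x, y    -- sequences of numeric attribute values
    weights -- one non-negative weight per attribute (d1..dk in the text)
    """
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("x, y and weights must have the same length")
    # Sum the weighted squared differences, then take the square root.
    total = sum(w * (xi - yi) ** 2 for w, xi, yi in zip(weights, x, y))
    return math.sqrt(total)
```

With all weights equal to 1 this is the ordinary Euclidean distance; a weight of 0 removes an attribute from the comparison entirely.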
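If each weight is the inverse of the variance of its attribute, this reduces to the Mahalanobis distance with a diagonal covariance matrix, that is, with the attributes treated as uncorrelated (also known as the standardized Euclidean distance):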
http://en.wikipedia.org/wiki/Mahalanobis_distance
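If you want inverse-variance weights rather than hand-picked ones, they can be computed per column. The sketch below assumes the data is a list of rows with at least two rows, uses the sample variance, and gives a constant attribute a weight of 0; the helper name `inverse_variance_weights` and that zero-variance choice are assumptions made for illustration.

```python
import statistics


def inverse_variance_weights(rows):
    """Return one weight per attribute: 1 / sample variance of that column."""
    weights = []
    for column in zip(*rows):  # iterate over attributes, not rows
        var = statistics.variance(column)
        # Assumption: a constant attribute cannot separate points, so weight it 0.
        weights.append(1.0 / var if var > 0 else 0.0)
    return weights
```

Attributes measured on a large scale get small weights, so no single attribute dominates the distance only because of its units.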
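Once you have defined the distance function, you can use K-means to cluster your data. Because standard K-means assumes plain Euclidean distance, a simple way to apply the weights is to multiply each attribute i by sqrt(di) and run K-means on the scaled data.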
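A minimal sketch of weighted K-means, assuming non-negative weights: multiplying attribute i by sqrt(di) makes the plain Euclidean distance on the scaled data equal to the weighted distance on the original data, so an ordinary K-means can be used unchanged. The small K-means below (Lloyd's algorithm with random initial centroids) and all function names are illustrative assumptions; a library implementation such as scikit-learn's `KMeans` could be run on the scaled data instead.

```python
import math
import random


def scale_by_weights(rows, weights):
    """Multiply each attribute by sqrt(weight)."""
    roots = [math.sqrt(w) for w in weights]
    return [[r * v for r, v in zip(roots, row)] for row in rows]


def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(rows, k, iterations=100, seed=0):
    """Basic Lloyd's algorithm; returns one cluster label per row."""
    rng = random.Random(seed)
    # Assumption: initial centroids are k distinct rows chosen at random.
    centroids = [list(p) for p in rng.sample(rows, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each row goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(row, centroids[c]))
            for row in rows
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def weighted_kmeans(rows, weights, k, **kwargs):
    """Cluster rows under weighted Euclidean distance."""
    return kmeans(scale_by_weights(rows, weights), k, **kwargs)
```

For example, `weighted_kmeans(rows, [1.0, 0.0], k=2)` clusters two-attribute data on the first attribute only. Data from an Excel sheet would first need to be loaded into a list of numeric rows, for instance by exporting it to CSV.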