How to calculate a measure of the total error in this clustering

Posted 2024-08-31 04:13:07

This is a question about the k-means clustering algorithm. I have the following points and a clustering S1 of the data. Can anyone tell me how to calculate the total error associated with this clustering? I know it's not strictly a programming question, but I need it for my algorithm. I think the answer should be 4/3, but I have no idea how to calculate it. Can anyone help me?

x1= (2.0,1.0)
x2= (2.0,2.0)
x3= (1.0,2.0)

S1={ x1, x2, x3 }
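For reference, here is a worked check under one common assumption. Suppose "total error" means the k-means objective, i.e. the sum of squared Euclidean distances from each point to the cluster centroid C_1. Under that reading, the value for S1 does come out to 4/3:

```latex
C_1 = \tfrac{1}{3}(x_1 + x_2 + x_3) = \left(\tfrac{5}{3}, \tfrac{5}{3}\right)

\lVert x_1 - C_1 \rVert^2 = \left(\tfrac{1}{3}\right)^2 + \left(-\tfrac{2}{3}\right)^2 = \tfrac{5}{9}, \quad
\lVert x_2 - C_1 \rVert^2 = \left(\tfrac{1}{3}\right)^2 + \left(\tfrac{1}{3}\right)^2 = \tfrac{2}{9}, \quad
\lVert x_3 - C_1 \rVert^2 = \left(-\tfrac{2}{3}\right)^2 + \left(\tfrac{1}{3}\right)^2 = \tfrac{5}{9}

E = \sum_{x \in S_1} \lVert x - C_1 \rVert^2 = \tfrac{5}{9} + \tfrac{2}{9} + \tfrac{5}{9} = \tfrac{12}{9} = \tfrac{4}{3}
```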

Comments (2)

驱逐舰岛风号 2024-09-07 04:13:07

There are many ways to calculate the error. Here's one.

First calculate the centroid of the set, C1 = (x1 + x2 + x3)/3. Then calculate the error as the sum of the distances from each point to the centroid: E1 = d(C1, x1) + d(C1, x2) + d(C1, x3), where d is your chosen distance measure (for example, Euclidean distance).
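Below is a minimal Python sketch of this recipe. It assumes d is the Euclidean distance and uses `Fraction` coordinates so the arithmetic stays exact. It also computes the squared-distance variant, which is the within-cluster sum of squared errors that k-means minimizes. For S1, the plain sum of distances is (2√5 + √2)/3 ≈ 1.962, while the squared variant is exactly 4/3, which matches the value guessed in the question. The function names are illustrative and do not come from any library.

```python
import math
from fractions import Fraction
from typing import Sequence, Tuple

Point = Tuple[Fraction, ...]

def centroid(points: Sequence[Point]) -> Point:
    """Component-wise mean, e.g. C1 = (x1 + x2 + x3) / 3 for three points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def squared_distance(p: Point, q: Point) -> Fraction:
    """Squared Euclidean distance, kept exact for Fraction inputs."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def sum_of_distances(points: Sequence[Point]) -> float:
    """E1 as described above: sum of Euclidean distances to the centroid."""
    c = centroid(points)
    return sum(math.sqrt(squared_distance(p, c)) for p in points)

def sse(points: Sequence[Point]) -> Fraction:
    """Sum of squared distances to the centroid (the k-means objective)."""
    c = centroid(points)
    return sum(squared_distance(p, c) for p in points)

F = Fraction
S1 = [(F(2), F(1)), (F(2), F(2)), (F(1), F(2))]  # x1, x2, x3 from the question

print(centroid(S1))                     # (Fraction(5, 3), Fraction(5, 3))
print(sse(S1))                          # 4/3
print(round(sum_of_distances(S1), 4))   # 1.9621
```

Which variant you want depends on your algorithm. If you are evaluating k-means specifically, the squared version is the quantity that k-means itself optimizes.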

巨坚强 2024-09-07 04:13:07

I had to search for something similar over the past couple of weeks. As with most things, finding the correct name helped greatly: you are looking for a cluster validity index.
I found Chapter 17 of "Data Clustering: Theory, Algorithms, and Applications" by Gan, Ma, and Wu a useful source for the algorithms (and the related maths). It is not cheap at $100+ from Amazon, but I expect to find the rest of the book useful.
Although it covers many of these indices, it lacks a good discussion of their strengths and weaknesses, so you will need to do some searching online.

In the end I tried the Davies–Bouldin index and Dunn's index. Dunn worked better but was very slow to compute, so I settled on a simplified version. It uses centroid-to-centroid distances (rather than point-to-point distances between cluster members) and the maximum radius from the centroid (rather than the true diameter). So far this is working well for me.
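Here is a rough Python sketch of both indices, under stated assumptions:

* The Davies–Bouldin function uses the common form. Scatter is the mean distance from members to their centroid, and separation is the Euclidean distance between centroids. Lower is better.
* The "simplified Dunn" function is one reading of the description above: the minimum centroid-to-centroid distance divided by the largest centroid-to-member radius. Higher is better. The exact formula the poster used is not given.
* S2 is a hypothetical second cluster, added only because both indices need at least two clusters.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def centroid(points: Sequence[Point]) -> Point:
    """Component-wise mean of the cluster's points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def davies_bouldin(clusters: List[Sequence[Point]]) -> float:
    """Davies-Bouldin index (lower is better); needs at least two clusters."""
    cents = [centroid(c) for c in clusters]
    # Cluster "size": mean distance from each member to its own centroid.
    scatter = [sum(math.dist(p, ci) for p in c) / len(c)
               for c, ci in zip(clusters, cents)]
    k = len(clusters)
    worst = []
    for i in range(k):
        # For cluster i, keep the most similar other cluster:
        # (size_i + size_j) / separation between their centroids.
        worst.append(max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                         for j in range(k) if j != i))
    return sum(worst) / k

def simplified_dunn(clusters: List[Sequence[Point]]) -> float:
    """Hypothetical simplified Dunn index (higher is better):
    min centroid-to-centroid distance / max centroid-to-member radius."""
    cents = [centroid(c) for c in clusters]
    separation = min(math.dist(a, b) for a, b in combinations(cents, 2))
    max_radius = max(max(math.dist(p, ci) for p in c)
                     for c, ci in zip(clusters, cents))
    return separation / max_radius

S1 = [(2.0, 1.0), (2.0, 2.0), (1.0, 2.0)]  # cluster from the question
S2 = [(8.0, 7.0), (8.0, 8.0), (7.0, 8.0)]  # hypothetical: S1 shifted by (6, 6)

print(f"{davies_bouldin([S1, S2]):.3f}")   # 0.154
print(f"{simplified_dunn([S1, S2]):.2f}")  # 11.38
```

The simplified Dunn replaces all pairwise point distances with one distance per centroid pair, which is where the speed-up comes from. It is undefined (division by zero) if every cluster is a single point.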

Most of the various indices combine a measure of cluster size with a measure of separation between clusters.
