Agglomerative clustering in Matlab
I have a simple 2-dimensional dataset that I wish to cluster in an agglomerative manner (not knowing the optimal number of clusters to use). The only way I've been able to cluster my data successfully is by giving the function a 'maxclust' value.
For simplicity's sake, let's say this is my dataset:
X = [1,1;
     1,2;
     2,2;
     2,1;
     5,4;
     5,5;
     6,5;
     6,4];
Naturally, I would want this data to form 2 clusters. I understand that if I knew this, I could just say:
T = clusterdata(X,'maxclust',2);
and to find which points fall into each cluster I could say:
cluster_1 = X(T==1, :);
and
cluster_2 = X(T==2, :);
but without knowing that 2 clusters would be optimal for this dataset, how do I cluster these data?
Thanks
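The per-cluster lines above (cluster_1, cluster_2) can be generalized so they work for whatever number of clusters T ends up containing. This is a minimal sketch, assuming the Statistics and Machine Learning Toolbox (which provides clusterdata) is installed; the variable names labels and clusters are illustrative only.

```matlab
X = [1,1; 1,2; 2,2; 2,1; 5,4; 5,5; 6,5; 6,4];

% Agglomerative clustering with at most 2 clusters
% (clusterdata defaults: single linkage, Euclidean distance)
T = clusterdata(X, 'maxclust', 2);

% One cell per cluster label, each holding that cluster's rows of X;
% the loop works unchanged for any number of clusters in T
labels = unique(T);
clusters = cell(numel(labels), 1);
for k = 1:numel(labels)
    clusters{k} = X(T == labels(k), :);
end

% Print a short summary of each cluster
for k = 1:numel(clusters)
    fprintf('cluster %d: %d points\n', labels(k), size(clusters{k}, 1));
end
```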
The whole point of this method is that it represents the clusters found in a hierarchy, and it is up to you to determine how much detail you want to get.
Think of this as a horizontal line cutting across the dendrogram, moving up from height 0 (each point is its own cluster) to the maximum height (all points in one cluster). Where you stop the line determines how many clusters you get.
This can be done by using either the 'maxclust' or the 'cutoff' argument of the CLUSTER/CLUSTERDATA functions.
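The "moving horizontal line" idea can be sketched by building the hierarchy once with linkage and cutting it at several heights with cluster. This assumes the Statistics and Machine Learning Toolbox (linkage, dendrogram, cluster); the heights 0.5, 2 and 4 are arbitrary illustrative choices. For this X, single linkage merges points at distance 1 inside each group and joins the two groups at sqrt(13), about 3.61, so the three cuts should give 8, 2 and 1 clusters.

```matlab
X = [1,1; 1,2; 2,2; 2,1; 5,4; 5,5; 6,5; 6,4];

% Build the whole hierarchy once (single linkage, Euclidean distance;
% these match the clusterdata defaults)
Z = linkage(X, 'single', 'euclidean');

% Draw the tree: each merge is plotted at the distance where it happens
dendrogram(Z);

% Slide the horizontal "cut" line upward and count the clusters below it.
% With criterion 'distance', 'cutoff' is a height on the dendrogram.
heights = [0.5, 2, 4];
numClusters = zeros(size(heights));
for i = 1:numel(heights)
    T = cluster(Z, 'cutoff', heights(i), 'criterion', 'distance');
    numClusters(i) = max(T);
    fprintf('cut at height %.1f -> %d clusters\n', heights(i), numClusters(i));
end

% The same hierarchy cut by a fixed number of clusters instead of a height
T2 = cluster(Z, 'maxclust', 2);
```

Building Z once and calling cluster repeatedly avoids re-running the agglomeration for every cut, which is what clusterdata does internally on each call.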
To choose the optimal number of clusters, one common approach is to make a plot similar to a scree plot: plot a clustering criterion against the number of clusters and look for the "elbow", the point after which adding more clusters improves the criterion only slightly. That is the number of clusters you pick. Here the criterion is the within-cluster sum of squares.
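A minimal sketch of this elbow approach, assuming the Statistics and Machine Learning Toolbox (linkage, cluster) is available. It cuts the single-linkage hierarchy into k = 1..6 clusters (the upper limit maxK is an arbitrary assumption) and computes, for each k, the sum over clusters of the squared distances from each point to its cluster centroid. This is one illustrative way to compute the criterion, not necessarily the one the answer had in mind.

```matlab
X = [1,1; 1,2; 2,2; 2,1; 5,4; 5,5; 6,5; 6,4];

% Build the hierarchy once; cutting it at different levels gives the
% clusterings for k = 1..maxK without re-running the agglomeration
Z = linkage(X, 'single', 'euclidean');

maxK = 6;               % largest number of clusters to try (arbitrary choice)
wss = zeros(maxK, 1);   % within-cluster sum of squares for each k

for k = 1:maxK
    T = cluster(Z, 'maxclust', k);
    for c = unique(T)'
        members = X(T == c, :);
        % deviations of the members from their cluster centroid
        d = bsxfun(@minus, members, mean(members, 1));
        wss(k) = wss(k) + sum(d(:) .^ 2);
    end
end

% Scree-like plot: the "elbow" is where the curve stops dropping steeply
plot(1:maxK, wss, 'o-');
xlabel('Number of clusters k');
ylabel('Within-cluster sum of squares');
```

For the example X, wss(1) is 54 and wss(2) is 4, and the curve flattens after that, so the elbow is at k = 2, matching the two groups in the question. Because cuts of one hierarchy are nested, wss can only decrease as k grows, which is why the bend rather than the minimum is what matters.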