Agglomerative clustering in Matlab

Posted on 2024-12-13 22:30:24

I have a simple 2-dimensional dataset that I wish to cluster in an agglomerative manner (not knowing the optimal number of clusters to use). The only way I've been able to cluster my data successfully is by giving the function a 'maxclust' value.

For simplicity's sake, let's say this is my dataset:

X=[ 1,1;
    1,2;
    2,2;
    2,1;
    5,4;
    5,5;
    6,5;
    6,4 ];

Naturally, I would want this data to form 2 clusters. I understand that if I knew this, I could just say:

T = clusterdata(X,'maxclust',2);

and to find which points fall into each cluster I could say:

cluster_1 = X(T==1, :);

and

cluster_2 = X(T==2, :);

but without knowing that 2 clusters would be optimal for this dataset, how do I cluster these data?

Thanks

Comments (2)

開玄 2024-12-20 22:30:24

The whole point of this method is that it represents the clusters found as a hierarchy, and it is up to you to decide how much detail you want.

[Figures: agglomerative clustering illustration; dendrogram]

Think of this as having a horizontal line intersecting the dendrogram, which moves starting from 0 (each point is its own cluster) all the way to the max value (all points in one cluster). You could:

  • stop when you reach a predetermined number of clusters
  • manually position it at a certain height value
  • place it where the clusters are too far apart according to the distance criterion (i.e. where there is a big jump to the next level)

This can be done with either the 'maxclust' or the 'cutoff' argument of the CLUSTER/CLUSTERDATA functions.
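The three ways of placing the cut can be sketched as follows. This is a minimal illustration, not the answerer's own code: it assumes the Statistics and Machine Learning Toolbox, single linkage on Euclidean distance, and an arbitrary cut height of 1.5 for the second option. The variable names (T1, T2, T3, cutHeight) are made up for the example.

```matlab
% Sketch only: requires the Statistics and Machine Learning Toolbox.
X = [1,1; 1,2; 2,2; 2,1; 5,4; 5,5; 6,5; 6,4];

% Build the hierarchy once. Each row of Z is one merge and Z(:,3) is the
% height of that merge. Single linkage is an assumption for this example.
Z = linkage(X, 'single', 'euclidean');

% Option 1: stop at a predetermined number of clusters.
T1 = cluster(Z, 'maxclust', 2);

% Option 2: cut at a fixed height (1.5 is an arbitrary illustrative value).
T2 = cluster(Z, 'cutoff', 1.5, 'criterion', 'distance');

% Option 3: cut in the middle of the largest jump between merge heights.
heights = Z(:, 3);
[~, idx] = max(diff(heights));
cutHeight = (heights(idx) + heights(idx + 1)) / 2;
T3 = cluster(Z, 'cutoff', cutHeight, 'criterion', 'distance');

numClusters = numel(unique(T3));
```

For this dataset every merge inside a square happens at height 1 and the final merge at about 3.61, so all three cuts separate the same two groups of four points. Calling dendrogram(Z) plots the tree if you want to inspect the heights visually.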

逆光飞翔i 2024-12-20 22:30:24

To choose the optimal number of clusters, one common approach is to make a plot similar to a Scree Plot. Then you look for the "elbow" in the plot, and that is the number of clusters you pick. For the criterion here, we will use the within-cluster sum-of-squares:

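function wss = plotScree(X, n)
% plotScree  Plot the within-cluster sum-of-squares for 1 to n clusters.

wss = zeros(1, n);
% One cluster: total sum of squared deviations from the overall mean.
wss(1) = (size(X, 1)-1) * sum(var(X, [], 1));
for i=2:n
    T = clusterdata(X,'maxclust',i);
    % Per cluster: (size - 1) * summed per-column variance, added over clusters.
    wss(i) = sum((grpstats(T, T, 'numel')-1) .* sum(grpstats(X, T, 'var'), 2));
end
hold on
plot(wss)
plot(wss, '.')
xlabel('Number of clusters')
ylabel('Within-cluster sum-of-squares')

Calling it on the dataset from the question:

>> plotScree(X, 5)

ans =

   54.0000    4.0000    3.3333    2.5000    2.0000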

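[Figure: within-cluster sum-of-squares against number of clusters.] The value drops sharply from 54 to 4 at two clusters and barely changes afterwards, so the elbow is at 2 clusters.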
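If you would rather not read the elbow off the plot by eye, one possible heuristic (an assumption of this sketch, not part of the answer) is to pick the point where the curve bends most sharply, measured by the discrete second difference. The values below are copied from the output printed above, and kElbow is a made-up name.

```matlab
% Within-cluster sum-of-squares for 1..5 clusters, copied from the output above.
wss = [54.0000 4.0000 3.3333 2.5000 2.0000];

% Discrete second difference wss(k-1) - 2*wss(k) + wss(k+1), for k = 2..n-1.
curvature = diff(wss, 2);

% The largest value marks the sharpest bend; +1 maps the index back to k.
[~, idx] = max(curvature);
kElbow = idx + 1;
```

This heuristic cannot return 1 or n, because the second difference is undefined at the ends of the curve, and it can be misled by noisy curves, so it is best treated as a cross-check against the plot.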
