Agglomerative clustering in Matlab

Posted on 2024-12-13 22:30:24

I have a simple 2-dimensional dataset that I wish to cluster in an agglomerative manner (not knowing the optimal number of clusters to use). The only way I've been able to cluster my data successfully is by giving the function a 'maxclust' value.

For simplicity's sake, let's say this is my dataset:

X=[ 1,1;
    1,2;
    2,2;
    2,1;
    5,4;
    5,5;
    6,5;
    6,4 ];

Naturally, I would want this data to form 2 clusters. I understand that if I knew this, I could just say:

T = clusterdata(X,'maxclust',2);

and to find which points fall into each cluster I could say:

cluster_1 = X(T==1, :);

and

cluster_2 = X(T==2, :);

but without knowing that 2 clusters would be optimal for this dataset, how do I cluster these data?

Thanks

開玄 2024-12-20 22:30:24

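The whole point of this method is that it represents the clusters it finds as a hierarchy, and it is up to you to decide how much detail you want.

[Figure: dendrogram]

Think of it as a horizontal line intersecting the dendrogram, moving from 0 (each point is its own cluster) all the way up to the maximum height (all points in one cluster). You could:

stop when a predetermined number of clusters is reached
position the line manually at a given height value
place it where the clusters are too far apart according to the distance criterion (i.e., where there is a big jump to the next level)

This can be done with the 'maxclust' or 'cutoff' arguments of the CLUSTER/CLUSTERDATA functions.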
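As a rough sketch of those three options on the dataset from the question: the code below builds the tree once with linkage and then cuts it three ways with cluster. The choice of single linkage, the hand-picked height of 2, and the variable names (T_count, T_height, T_jump) are assumptions made for illustration, not something the answer prescribes.

```matlab
% Sketch only: the linkage method and cutoff values are illustrative assumptions.
X = [1,1; 1,2; 2,2; 2,1; 5,4; 5,5; 6,5; 6,4];

% Build the hierarchy once (Euclidean distance by default).
Z = linkage(X, 'single');
h = Z(:, 3);                       % merge heights, one per merge step

% Option 1: stop at a predetermined number of clusters.
T_count = cluster(Z, 'maxclust', 2);

% Option 2: cut the tree at a hand-picked height.
T_height = cluster(Z, 'cutoff', 2, 'criterion', 'distance');

% Option 3: cut in the middle of the largest jump between consecutive merges.
[~, k] = max(diff(h));
c = (h(k) + h(k+1)) / 2;
T_jump = cluster(Z, 'cutoff', c, 'criterion', 'distance');
```

Only the third option avoids a number chosen by hand, and it is a heuristic: it assumes the tree has one clearly dominant gap, which holds for well-separated data like this but may not for noisier sets.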

逆光飞翔i 2024-12-20 22:30:24

To choose the optimal number of clusters, one common approach is to make a plot similar to a Scree Plot. Then you look for the "elbow" in the plot, and that is the number of clusters you pick. For the criterion here, we will use the within-cluster sum-of-squares:

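function wss = plotScree(X, n)
% Plot the within-cluster sum-of-squares for 1..n clusters.

wss = zeros(1, n);
% One cluster: total sum of squares about the overall mean.
wss(1) = (size(X, 1)-1) * sum(var(X, [], 1));
for i=2:n
    T = clusterdata(X,'maxclust',i);
    % Per cluster: (size - 1) times the summed column variances, totalled over clusters.
    wss(i) = sum((grpstats(T, T, 'numel')-1) .* sum(grpstats(X, T, 'var'), 2));
end
hold on
plot(wss)
plot(wss, '.')
xlabel('Number of clusters')
ylabel('Within-cluster sum-of-squares')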

>> plotScree(X, 5)

ans =

   54.0000    4.0000    3.3333    2.5000    2.0000

[Figure: within-cluster sum-of-squares plotted against the number of clusters]
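If you would rather not read the elbow off the plot by eye, one possible heuristic is to take the point where the curve bends most sharply, that is, where the second difference of wss is largest. This is an assumption added for illustration rather than part of the approach above, and it can mislead on curves without a clear bend. The wss values are copied from the output shown.

```matlab
% Heuristic sketch (an assumption, not a standard rule): pick the elbow
% as the point of largest second difference in the wss curve.
wss = [54.0000 4.0000 3.3333 2.5000 2.0000];

curvature = diff(wss, 2);     % curvature(j) measures the bend at j+1 clusters
[~, idx] = max(curvature);
k = idx + 1;                  % candidate number of clusters
```

For these values the sharpest bend is at k = 2, which matches the two groups expected in the question.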


About the author

江挽川
