Getting a list of clusters from the hclust function

Posted on 2024-11-17 19:18:27

Using plot(hclust(dist(x))), I was able to draw a cluster dendrogram. It works, but I would like to get a list of all clusters rather than a tree diagram, because I have a huge amount of data (around 150K nodes) and the plot gets messy.

In other words, let's say a b c is one cluster and d e f g is another; then I would like to get something like this:

1 a,b,c
2 d,e,f,g

Please note that this is not exactly what I need as "output"; it is just an example. I would simply like to get a list of clusters instead of a tree plot. It could be a vector, a matrix, or just simple numbers that show which group each element belongs to.

How is this possible?

Comments (2)

永不分离 2024-11-24 19:18:27

I will use a dataset that ships with R to demonstrate how to cut a tree into the desired number of pieces. The result is a named integer vector of cluster memberships.

Construct an hclust object.

hc <- hclust(dist(USArrests), "ave")
#plot(hc)

You can now cut the tree into as many branches as you want. For my next trick, I will split the tree into two groups. You set the number of groups with the k parameter. See ?cutree, and in particular the h parameter, which cuts the tree at a given height and may be more useful to you (compare cutree(hc, k = 2) == cutree(hc, h = 110)).

cutree(hc, k = 2)
       Alabama         Alaska        Arizona       Arkansas     California 
             1              1              1              2              1 
      Colorado    Connecticut       Delaware        Florida        Georgia 
             2              2              1              1              2 
        Hawaii          Idaho       Illinois        Indiana           Iowa 
             2              2              1              2              2 
        Kansas       Kentucky      Louisiana          Maine       Maryland 
             2              2              1              2              1 
 Massachusetts       Michigan      Minnesota    Mississippi       Missouri 
             2              1              2              1              2 
       Montana       Nebraska         Nevada  New Hampshire     New Jersey 
             2              2              1              2              2 
    New Mexico       New York North Carolina   North Dakota           Ohio 
             1              1              1              2              2 
      Oklahoma         Oregon   Pennsylvania   Rhode Island South Carolina 
             2              2              2              2              1 
  South Dakota      Tennessee          Texas           Utah        Vermont 
             2              2              2              2              2 
      Virginia     Washington  West Virginia      Wisconsin        Wyoming 
             2              2              2              2              2
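
To turn this membership vector into the "one line per cluster" shape the question asks for, one option is to split the labels by their cluster number. This is a minimal sketch rather than part of the original answer; the names `groups` and `clusters` are chosen here only for illustration.

```r
# Build the same tree as above and cut it into two groups.
hc <- hclust(dist(USArrests), "ave")
groups <- cutree(hc, k = 2)

# split() groups the labels by cluster number, giving a named list
# with one character vector of members per cluster.
clusters <- split(names(groups), groups)

# Cluster sizes, without drawing the dendrogram.
print(table(groups))

# Print one line per cluster: the cluster id followed by its members.
for (id in names(clusters)) {
  cat(id, paste(clusters[[id]], collapse = ","), "\n")
}
```

The list form is convenient when the clusters differ in size, since each element can hold a different number of members.
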
拥抱影子 2024-11-24 19:18:27

Let's say:

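y <- dist(x)                    # pairwise distances between records
clust <- hclust(y)              # hierarchical clustering
groups <- cutree(clust, k = 3)  # cluster number (1 to 3) for each record
x <- cbind(x, groups)           # attach the cluster number as a column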

Now each record carries its cluster group.
You can also subset the dataset by group:

x1<- subset(x, groups==1)
x2<- subset(x, groups==2)
x3<- subset(x, groups==3)
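
When the number of clusters is large or not known in advance, writing one subset() call per group gets unwieldy. A possible alternative, sketched here under the assumption that `x` is a data frame (USArrests stands in for the asker's data, and `parts` is a name made up for this example), is to let split() produce all the subsets at once.

```r
# Assumption: x stands in for the asker's data; USArrests is used here
# only so the example runs on its own.
x <- USArrests
groups <- cutree(hclust(dist(x)), k = 3)

# split() returns a list of data frames, one per cluster,
# so no separate subset() call is needed for each group.
parts <- split(x, groups)

print(names(parts))         # cluster ids as strings
print(sapply(parts, nrow))  # number of records in each cluster
```

If `x` is a matrix rather than a data frame, split() treats it as a plain vector, so converting it with as.data.frame() first is the safer route.
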