Cluster list for the hclust function
Using the plot(hclust(dist(x))) method, I was able to draw a cluster tree map. It works. Yet I would like to get a list of all clusters, not a tree diagram, because I have a huge amount of data (like 150K nodes) and the plot gets messy.
In other words, let's say a b c is a cluster and d e f g is a cluster; then I would like to get something like this:
1 a,b,c
2 d,e,f,g
Please note that this is not exactly what I want to get as an "output". It is just an example. I would just like to get a list of clusters instead of a tree plot. It could be a vector, a matrix, or just simple numbers that show which group each element belongs to.
How is this possible?
I will use a dataset available in R to demonstrate how to cut a tree into the desired number of pieces. The result is a table.
Construct an hclust object.
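The code for this step did not survive on this page, so the following is only a minimal sketch of what constructing the object could look like. It assumes the built-in USArrests data and average linkage; neither the dataset nor the linkage method is named in the answer text.

```r
# Assumption: USArrests (ships with R) and average linkage are stand-ins;
# the answer does not say which dataset or method it used.
hc <- hclust(dist(USArrests), method = "average")

class(hc)   # "hclust"
# plot(hc)  # optional: the dendrogram the question wants to avoid for big data
```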
You can now cut the tree into as many branches as you want. For my next trick, I will split the tree into two groups. You set the number of groups with the k parameter. See ?cutree and the use of the parameter h, which may be more useful to you (see cutree(hc, k = 2) == cutree(hc, h = 110)).
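As a hedged illustration of the cut described above, the sketch below calls cutree with k and with h, then turns the membership vector into the kind of cluster list the question asks for. USArrests and average linkage are again assumptions, and whether the k = 2 and h = 110 cuts agree depends on the data and linkage used, so no result is claimed for that comparison.

```r
# Assumption: same illustrative data and linkage as before, not from the answer.
hc <- hclust(dist(USArrests), method = "average")

# Cut into two groups: a named integer vector, one group id per observation.
memb <- cutree(hc, k = 2)
head(memb)

# The "table" result: how many observations fall in each group.
table(memb)

# Cut by height instead of by number of groups, and compare the two cuts.
memb_h <- cutree(hc, h = 110)
all(memb == memb_h)

# A list of clusters, each holding the names of its members.
clusters <- split(names(memb), memb)
```

Because cutree only returns a vector of group ids, it stays usable when the dendrogram itself is too large to plot.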
Let's say you have cut the tree into groups. Now you have the cluster group for each record.
You can subset the dataset by group as well.
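The code that accompanied this answer is missing here, so what follows is a sketch of the idea rather than the answerer's own code. It assumes USArrests as the data, the default linkage, k = 3, and a hypothetical column name group.

```r
# Assumptions: USArrests, k = 3, and the column name "group" are illustrative.
x <- USArrests
hc <- hclust(dist(x))

# Attach the cluster group to each record.
x$group <- cutree(hc, k = 3)
head(x)

# Subset the dataset to the records in one cluster.
group1 <- subset(x, group == 1)

# Or collect the record names of every cluster in a list.
clusters <- split(rownames(x), x$group)
```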