Prediction in clustering
In R, is there a predict function for clustering, like the one we have for classification?
What can we conclude from the clustering graph that R produces, other than comparing two clusters?
2 Answers
Clustering does not aim at predictive power; it just tries to find objects that seem to be related. That is why there is no generic "predict" function for clustering results.
However, in many situations, learning classifiers based on the clusters offers improved performance. For this, you essentially train a classifier to assign each object to the appropriate cluster, then classify it using a second classifier trained only on examples from that cluster. When a cluster is pure, you can even skip the second step.
The reason is the following: there may be multiple types of objects filed under the same label. Training a classifier on the full data set may be hard, because it would have to learn both clusters at the same time. Splitting the class into two groups, and training a separate classifier for each, can make the task significantly easier.
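A minimal sketch of this cluster-then-classify idea, assuming the built-in iris data and k-means for stage one; the per-cluster majority label is an illustrative stand-in for a real stage-two classifier:

# Cluster-then-classify sketch (iris and k-means are illustrative choices)
set.seed(1)
X <- iris[, 1:4]
y <- iris$Species

# Stage 1: find groups of objects that seem related
km <- kmeans(X, centers = 3, nstart = 10)

# Stage-1 "classifier": route a new point to the nearest cluster center
assign_cluster <- function(x, centers) {
  x <- as.numeric(unlist(x))          # accepts a data-frame row or a vector
  which.min(colSums((t(centers) - x)^2))
}

# Stage 2: one model per cluster, trained only on that cluster's examples.
# A majority label keeps the sketch self-contained; in practice you would
# fit e.g. a tree or logistic model per cluster.
cluster_model <- tapply(y, km$cluster, function(lab) names(which.max(table(lab))))

# Predict a new observation: route it to a cluster, apply that cluster's model
new_x <- iris[1, 1:4]
cl <- assign_cluster(new_x, km$centers)
cluster_model[[cl]]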
Many packages offer predict methods for cluster objects. One such example is clue, with cl_predict. The best practice when doing this is to apply the same rules that were used to cluster the training data. For example, in Kernel K-Means you should compute the kernel distance between your data point and the cluster centers; the minimum determines the cluster assignment (see here for an example). In Spectral Clustering you should project your data point's dissimilarities onto the eigenfunctions of the training data, compare the Euclidean distances to the K-Means centers in that space, and the minimum should determine your cluster assignment (see here for an example).
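A minimal sketch, assuming the clue package is installed and using ordinary k-means, where the "same rules" principle reduces to assigning each new point to the nearest Euclidean center:

# Out-of-sample assignment with clue::cl_predict (assumes clue is installed);
# the nearest-center rule is reproduced by hand below for comparison
library(clue)

set.seed(42)
train <- iris[1:100, 1:4]
new   <- iris[101:110, 1:4]

km <- kmeans(train, centers = 3, nstart = 10)

# clue assigns each new point to an existing cluster
pred_clue <- cl_predict(km, newdata = new)

# The same assignment computed directly: minimum squared Euclidean
# distance to the fitted cluster centers
pred_manual <- apply(as.matrix(new), 1, function(x) {
  which.min(colSums((t(km$centers) - x)^2))
})

all(as.integer(pred_clue) == as.integer(pred_manual))   # should be TRUE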