Clustering adjacency matrices of different sizes
I have created adjacency matrices for directed graphs of different sizes. I have around 30,000 matrices, each in a separate text file. How can I cluster them, and are there any tools available? What is the best way to represent a directed graph for clustering?
Thank you.
Comments (3)
What exactly do you want to achieve? Group similar matrices, right?
With k-means, you will not have much fun here. The adjacency matrices are binary; interpreting them as huge vectors, computing an Lp-norm distance (e.g. Euclidean distance) on them, and then computing average matrices (which is what k-means does) doesn't sound sensible to me. Plus, you will likely be bitten by the curse of dimensionality: the high number of dimensions will make all matrices appear similar.
For pretty much any clustering algorithm, the first question you as the "domain expert" will have to answer is: what makes two adjacency matrices similar? Once you have formalized this, you will be able to run many clustering algorithms, including classic single-link clustering, DBSCAN or OPTICS.
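As a rough illustration of "formalize similarity first, then cluster", here is a minimal sketch. It assumes that two graphs count as similar when their edge density and reciprocity are close; that choice, the `signature` and `single_link` helpers, the toy graphs and the threshold are all hypothetical and only stand in for whatever the domain expert decides. The clustering step is single-link cut at a fixed distance threshold, computed as connected components with union-find.

```python
from itertools import combinations
from math import dist


def signature(adj):
    """Size-independent summary of a directed graph: (edge density, reciprocity).
    This similarity notion is an assumption, not something given in the question."""
    n = len(adj)
    edges = [(i, j) for i in range(n) for j in range(n) if i != j and adj[i][j]]
    density = len(edges) / (n * (n - 1)) if n > 1 else 0.0
    mutual = sum(1 for i, j in edges if adj[j][i])
    reciprocity = mutual / len(edges) if edges else 0.0
    return (density, reciprocity)


def single_link(items, distance, threshold):
    """Single-link clusters cut at `threshold`: connected components of the
    graph joining every pair whose distance is <= threshold (union-find)."""
    parent = list(range(len(items)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(range(len(items)), 2):
        if distance(items[a], items[b]) <= threshold:
            parent[find(a)] = find(b)
    return [find(i) for i in range(len(items))]


# Toy graphs of different sizes (hypothetical data).
graphs = [
    [[0, 1], [1, 0]],                                           # 2-node mutual pair
    [[0, 1, 1], [1, 0, 1], [1, 1, 0]],                          # 3-node complete digraph
    [[0, 1, 0], [0, 0, 1], [0, 0, 0]],                          # 3-node directed path
    [[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 0, 0]],   # 4-node directed path
]
signatures = [signature(g) for g in graphs]
labels = single_link(signatures, dist, threshold=0.3)
```

Graphs that share a label ended up in the same cluster. Note that the all-pairs loop is quadratic in the number of graphs, so for 30,000 matrices an indexed implementation of DBSCAN or OPTICS is likely a better fit than this sketch.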
I would try k-means and Voronoi diagrams. The clusters can be computed with a minimum spanning tree (MST) by looking for its longest edges. Then you can compute the different clusters with traditional k-means, using the MST edges as centers. Another possibility would be a hierarchical clustering, for example with a space-filling curve. See for example: https://stats.stackexchange.com/questions/1475/visualization-software-for-clustering.
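The MST step mentioned here can be sketched as follows, assuming a full pairwise distance matrix between the items has already been computed. The `mst_clusters` helper and the 1-D example points are hypothetical. The sketch covers only the "cut the longest MST edges" part, which yields the same partition as single-link clustering into `k` groups; the follow-up k-means refinement is not shown.

```python
def mst_clusters(dist_matrix, k):
    """Split items into k clusters (1 <= k <= n) by building a minimum
    spanning tree with Prim's algorithm and dropping its k - 1 longest edges."""
    n = len(dist_matrix)
    in_tree = [False] * n
    best = [float("inf")] * n   # cheapest known edge connecting each node to the tree
    link = [-1] * n             # tree endpoint of that cheapest edge
    best[0] = 0.0
    edges = []                  # (weight, u, v) edges of the MST
    for _ in range(n):
        # Pick the node outside the tree that is cheapest to attach.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if link[u] != -1:
            edges.append((best[u], link[u], u))
        for v in range(n):
            if not in_tree[v] and dist_matrix[u][v] < best[v]:
                best[v] = dist_matrix[u][v]
                link[v] = u
    # Keep the n - k shortest MST edges, i.e. drop the k - 1 longest ones.
    kept = sorted(edges)[: n - k]
    labels = list(range(n))
    for _, u, v in kept:
        old, new = labels[v], labels[u]
        labels = [new if lab == old else lab for lab in labels]
    return labels


# Hypothetical 1-D points standing in for graph feature vectors.
points = [0, 1, 2, 10, 11]
dist_matrix = [[abs(a - b) for b in points] for a in points]
labels = mst_clusters(dist_matrix, k=2)
```

Here the single long MST edge between 2 and 10 is removed, so the first three points and the last two receive different labels.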
You can find some ideas for graph features/statistics here:
http://networkx.lanl.gov/reference/algorithms.html
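To make the idea of graph features concrete, here is a small standard-library sketch that turns one adjacency matrix into a fixed set of statistics, so that graphs of different sizes become comparable. The chosen features, the `graph_features` and `read_adjacency` helpers, and the assumed file layout (one row per line, whitespace-separated 0/1 entries) are illustrative assumptions; a library such as NetworkX offers many more statistics.

```python
from statistics import mean


def read_adjacency(text):
    """Parse whitespace-separated 0/1 rows (assumed file layout)."""
    return [[int(x) for x in line.split()] for line in text.splitlines() if line.strip()]


def graph_features(adj):
    """Simple statistics of a directed graph; self-loops are ignored."""
    n = len(adj)
    out_deg = [sum(1 for j in range(n) if j != i and adj[i][j]) for i in range(n)]
    in_deg = [sum(1 for i in range(n) if i != j and adj[i][j]) for j in range(n)]
    m = sum(out_deg)  # total number of directed edges
    return {
        "nodes": n,
        "edges": m,
        "density": m / (n * (n - 1)) if n > 1 else 0.0,
        "mean_degree": mean(out_deg) if n else 0.0,   # mean out-degree equals mean in-degree
        "max_out_degree": max(out_deg, default=0),
        "max_in_degree": max(in_deg, default=0),
        "sources": sum(1 for d in in_deg if d == 0),   # nodes with no incoming edge
        "sinks": sum(1 for d in out_deg if d == 0),    # nodes with no outgoing edge
    }


# Hypothetical file content: edges 0->1, 0->2, 1->2.
features = graph_features(read_adjacency("0 1 1\n0 0 1\n0 0 0\n"))
```

Each of the 30,000 files could be mapped to such a dictionary, and the resulting fixed-length vectors (after scaling) passed to a clustering algorithm.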