Katz相似度-矩阵/图运算(JAVA+JUNG)
我正在图表中测试一些相似性指标。我正在使用 JUNG API 来处理图表。我已经成功计算了基本指标,例如共同邻居和优先依恋。现在,我想计算 katz 指标,如下所示: 卡茨(v1,v2) = B.paths_1(v1,v2) + B^2.paths_2(v1,v2) + ...+ B^n.paths_n(v1,v2) 其中 paths_n(v1,v2) 是 v1 和 v2 之间长度为“n”的路径数; B 是标量。我将 n 限制为 4,因此最终的 katz 矩阵可以通过以下方式轻松计算: BA + (BA)^2 + ... + (BA)^4 其中 A 是图的邻接矩阵。问题是我正在处理的图表非常巨大,我无法将整个卡茨矩阵存储在内存中。此外,我不需要所有的分数,因为我只测试几对节点。 不过,我无法找到一种有效的方法来计算个人分数,而不必在图表上执行步行。 有什么想法吗?
I'm testing some similarity metrics in a graph. I'm using JUNG API to handle the graphs. I've managed to compute basic metrics like common neighbors and prefferential attachment. Now, I want to calculate katz metric that is as follow:
katz(v1,v2) = B.paths_1(v1,v2) + B^2.paths_2(v1,v2) + ...+ B^n.paths_n(v1,v2)
where paths_n(v1,v2) is the number of paths of length "n" between v1 and v2; and B is a scalar. I'm restricting n to 4, so the final katz matrix could be calculated easily through:
B.A + (B.A)^2 + ... + (B.A)^4 where A is the adjacency matrix of the graph. The thing is that the graphs I'm working with are very huge and I'm not able to store the whole katz matrix in memory. Besides, I won't need all the scores because I only test few pairs of nodes.
I can't find an efficient way to calculate individual scores without having to performe a walk on the graph though.
Any ideas?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
要计算个人得分
ketz(v1,v2)
,您只需要考虑邻接子矩阵,该矩阵仅包含与 v1 或 v2 距离小于 4 的顶点。您可以使用 v1 和 v2 中的呼吸优先搜索来定位这些顶点。
但如果从 v1 开始执行 BFS 时直接计算 #paths,实际上可以做得更好。你只需要记住距 v1 的距离,并在每个顶点检查是否到达 v2。如果是这样,则增加适当的计数器。
类似的东西(伪代码):
因此,运行此命令后,您将得到
counts[n] = paths_n(v1,v2)
for n =1,2,3,4To calculate individual score
ketz(v1,v2)
, you just need to consider the adjacency sub-matrix, which contains only the vertices which are within a distance less than 4 from either v1 or v2.You can locate those vertices using breath-first-search from v1 and from v2.
But you can actually do much better if you directly count the #paths while doing a BFS starting at v1. You only need to remember the distance from v1 and at each vertex check whether you have arrived at v2. If so increment the appropriate counter.
Something like that (pseudo code):
So after you run this, you'll have
counts[n] = paths_n(v1,v2)
for n =1,2,3,4