查找图中 3 个节点(或三角形)的循环
我正在处理复杂的网络。我想找到给定图中形成 3 个节点(或三角形)循环的节点组。由于我的图包含大约一百万条边,因此使用简单的迭代解决方案(多个“for”循环)效率不是很高。
我正在使用 python 进行编程,如果这些是一些用于处理这些问题的内置模块,请告诉我。
如果有人知道任何可用于在图中查找三角形的算法,请回复。
I am working with complex networks. I want to find group of nodes which forms a cycle of 3 nodes (or triangles) in a given graph. As my graph contains about million edges, using a simple iterative solution (multiple "for" loop) is not very efficient.
I am using python for my programming, if these is some inbuilt modules for handling these problems, please let me know.
If someone knows any algorithm which can be used for finding triangles in graphs, kindly reply back.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(11)
假设它是一个无向图,答案就在python的networkx库中。
如果你只需要计算三角形,请使用:
但如果你需要知道具有三角形(三元)关系的边列表,请使用
这将为你提供所有派系(k=1,2,3...最大度 - 1)
所以,仅过滤三角形,即 k=3,
triad_cliques 将给出仅包含三角形的边列表。
Assuming its an undirected graph, the answer lies in networkx library of python.
if you just need to count triangles, use:
But if you need to know the edge list with triangle (triadic) relationship, use
This will give you all cliques (k=1,2,3...max degree - 1)
So, to filter just triangles i.e k=3,
The triad_cliques will give a edge list with only triangles.
一百万条边相当小。除非您执行了数千次,否则只需使用简单的实现即可。
我假设您有一个 node_ids 字典,它指向其邻居的序列,并且该图是有向的。
例如:
我的解决方案:
检查性能:
当我尝试时,构建随机图比计算周期花费的时间更长。
不过,您可能想测试一下;)我不能保证它是正确的。
您还可以查看networkx,它是大型Python 图形库。
A million edges is quite small. Unless you are doing it thousands of times, just use a naive implementation.
I'll assume that you have a dictionary of node_ids, which point to a sequence of their neighbors, and that the graph is directed.
For example:
My solution:
Checking performance:
When I tried it, it took longer to build the random graph than to count the cycles.
You might want to test it though ;) I won't guarantee that it's correct.
You could also look into networkx, which is the big python graph library.
非常简单明了的方法是使用 Networkx:
使用 Networkx,您可以通过 nx.cycle_basis(G) 然后选择有3个节点的
,或者你可以通过find_cliques(G) ,然后选择您想要的(有 3 个节点)。派系是图表的一部分,其中所有节点都相互连接,这发生在具有 3 个节点的循环/循环中。
Pretty easy and clear way to do is to use Networkx:
With Networkx you can get the loops of an undirected graph by nx.cycle_basis(G) and then select the ones with 3 nodes
or you can find all the cliques by find_cliques(G) and then select the ones you want (with 3 nodes). cliques are sections of the graph where all the nodes are connected to each other which happens in cycles/loops with 3 nodes.
尽管效率不高,但您可能想要实现一个解决方案,因此请使用循环。编写一个测试,以便您了解需要多长时间。
然后,当您尝试新方法时,您可以做两件事:
1)确保答案保持不变。
2)看看有什么改进。
拥有更快的算法而错过一些东西可能会比拥有更慢的算法更糟糕。
一旦进行了慢速测试,您就可以看看是否可以并行执行此操作,并查看性能提升情况。
然后,您可以看看是否可以标记所有少于 3 个顶点的节点。
理想情况下,您可能希望首先将其缩小到 100 左右,这样您就可以绘制它,并以图形方式查看发生了什么。
有时,您的大脑会看到一种在查看算法时并不那么明显的模式。
Even though it isn't efficient, you may want to implement a solution, so use the loops. Write a test so you can get an idea as to how long it takes.
Then, as you try new approaches you can do two things:
1) Make certain that the answer remains the same.
2) See what the improvement is.
Having a faster algorithm that misses something is probably going to be worse than having a slower one.
Once you have the slow test, you can see if you can do this in parallel and see what the performance increase is.
Then, you can see if you can mark all nodes that have less than 3 vertices.
Ideally, you may want to shrink it down to just 100 or so first, so you can draw it, and see what is happening graphically.
Sometimes your brain will see a pattern that isn't as obvious when looking at algorithms.
我不想听起来很刺耳,但是你试过用谷歌搜索一下吗?第一个链接是一个非常快速的算法:
http://www.mail-archive.com/[email protected]/msg05642.html
然后是关于 ACM 的这篇文章(您可以访问):
http://portal.acm.org/itation.cfm?id=244866
(如果你没有访问权限,我相信如果你问一下写它的女士,你会得到一份副本。)
另外,我可以想象一种基于派系分解的三角形枚举方法,但我不这样做知道是否在某处描述过。
I don't want to sound harsh, but have you tried to Google it? The first link is a pretty quick algorithm to do that:
http://www.mail-archive.com/[email protected]/msg05642.html
And then there is this article on ACM (which you may have access to):
http://portal.acm.org/citation.cfm?id=244866
(and if you don't have access, I am sure if you kindly ask the lady who wrote it, you will get a copy.)
Also, I can imagine a triangle enumeration method based on clique-decomposition, but I don't know if it was described somewhere.
我正在研究计算无向图上三角形数量的相同问题,wisty 的解决方案在我的情况下效果非常好。我对其进行了一些修改,因此仅计算无向三角形。
当然,你需要使用字典,例如
使用Wisty的代码,找到的三角形将是
[(0, 1, 2), (0, 2, 1), (0, 3, 1), (1, 2, 3)]
计算了三角形 (0, 1, 2) 和 (0, 2, 1)作为两个不同的三角形。根据我修改的代码,这些仅算作一个三角形。
我将其与一个相对较小的字典(少于 100 个键)一起使用,每个键平均有 50 个值。
I am working on the same problem of counting number of triangles on undirected graph and wisty's solution works really well in my case. I have modified it a bit so only undirected triangles are counted.
Of course, you need to use a dictionary for example
Using the code of Wisty, the triangles found will be
[(0, 1, 2), (0, 2, 1), (0, 3, 1), (1, 2, 3)]
which counted the triangle (0, 1, 2) and (0, 2, 1) as two different triangles. With the code I modified, these are counted as only one triangle.
I used this with a relatively small dictionary of under 100 keys and each key has on average 50 values.
惊讶地发现没有提及 Networkx 三角形函数。我知道它不一定返回形成三角形的节点组,但应该与许多发现自己在此页面上的人非常相关。
返回节点群的另一种方法是......
Surprised to see no mention of the Networkx triangles function. I know it doesn't necessarily return the groups of nodes that form a triangle, but should be pretty relevant to many who find themselves on this page.
An alternative way to return clumps of nodes would be something like...
如果您不关心同一三角形以不同顺序的多个副本,则可以使用三元组列表:
这里的逻辑是检查每个节点的每对邻居以查看它们是否连接。
G[n]
是一种迭代或查找邻居的快速方法。如果你想摆脱重新排序,请将每个三元组变成一个冻结集并制作一组冻结集:
如果你不喜欢冻结集并想要一个集合列表,那么:
If you don't care about multiple copies of the same triangle in different order then a list of 3-tuples works:
The logic here is to check each pair of neighbors of every node to see if they are connected.
G[n]
is a fast way to iterate over or look up neighbors.If you want to get rid of reorderings, turn each triple into a frozenset and make a set of the frozensets:
If you don't like frozenset and want a list of sets then:
您需要找到“所有”“三角形”,还是只是“一些”/“任何”?
或者也许您只需要测试特定节点是否是三角形的一部分?
测试很简单 - 给定一个节点 A,是否有任何两个连接的节点 B 和 A。 C 也直接连接。
如果您需要找到所有三角形 - 具体来说,所有 3 个节点组,其中每个节点都与其他两个节点相连 - 那么您需要在一个非常长的运行“foreach”循环中检查每个可能的组。
唯一的优化是确保您不会两次检查同一个“组”,例如,如果您已经测试了 B & 组。 C 与 A 不在一组,则不检查 A 和 A 是否在一个组中。 C与B分在一组。
Do you need to find 'all' of the 'triangles', or just 'some'/'any'?
Or perhaps you just need to test whether a particular node is part of a triangle?
The test is simple - given a node A, are there any two connected nodes B & C that are also directly connected.
If you need to find all of the triangles - specifically, all groups of 3 nodes in which each node is joined to the other two - then you need to check every possible group in a very long running 'for each' loop.
The only optimisation is ensuring that you don't check the same 'group' twice, e.g. if you have already tested that B & C aren't in a group with A, then don't check whether A & C are in a group with B.
这是 Ajay M 答案 的更有效版本(我会评论它,但我没有足够的声誉)。
事实上,
networkx
的enumerate_all_cliques
方法将返回图中的所有 cliques,无论其长度如何;因此循环可能会花费很多时间(尤其是对于非常密集的图形)。此外,一旦为三角形定义了,只需参数化即可概括每个团长度的方法,因此这里有一个函数:
要获取三角形,只需使用 get_cliques_by_length(G, 3) 。
警告:此方法仅适用于无向图。
networkx
中未提供有向图中的派系算法This is a more efficient version of Ajay M answer (I would have commented it, but I've not enough reputation).
Indeed the
enumerate_all_cliques
method ofnetworkx
will return all cliques in the graph, irrespectively of their length; hence looping over it may take a lot of time (especially with very dense graphs).Moreover, once defined for triangles, it's just a matter of parametrization to generalize the method for every clique length so here's a function:
To get triangles just use
get_cliques_by_length(G, 3)
.Caveat: this method works only for undirected graphs. Algorithm for cliques in directed graphs are not provided in
networkx
我刚刚发现 nx.edge_disjoint_paths 可以计算包含某些边的三角形。比 nx.enumerate_all_cliques 和 nx.cycle_basis 更快。
它返回源和目标之间的边不相交路径。边不相交路径是不共享任何边的路径。
result-1 是包含某些边或源节点和目标节点之间的三角形的数量。
i just found that
nx.edge_disjoint_paths
works to count the triangle contains certain edges. faster thannx.enumerate_all_cliques
andnx.cycle_basis
.It returns the edges disjoint paths between source and target.Edge disjoint paths are paths that do not share any edge.
And result-1 is the number of triangles that contain certain edges or between source node and target node.