查找图中 3 个节点(或三角形)的循环

发布于 2024-08-10 20:41:32 字数 182 浏览 9 评论 0原文

我正在处理复杂的网络。我想找到给定图中形成 3 个节点(或三角形)循环的节点组。由于我的图包含大约一百万条边,因此使用简单的迭代解决方案(多个“for”循环)效率不是很高。

我正在使用 python 进行编程,如果这些是一些用于处理这些问题的内置模块,请告诉我。

如果有人知道任何可用于在图中查找三角形的算法,请回复。

I am working with complex networks. I want to find group of nodes which forms a cycle of 3 nodes (or triangles) in a given graph. As my graph contains about million edges, using a simple iterative solution (multiple "for" loop) is not very efficient.

I am using python for my programming, if these is some inbuilt modules for handling these problems, please let me know.

If someone knows any algorithm which can be used for finding triangles in graphs, kindly reply back.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(11

ι不睡觉的鱼゛ 2024-08-17 20:41:32

假设它是一个无向图,答案就在python的networkx库中。
如果你只需要计算三角形,请使用:

import networkx as nx
tri=nx.triangles(g)

但如果你需要知道具有三角形(三元)关系的边列表,请使用

all_cliques= nx.enumerate_all_cliques(g)

这将为你提供所有派系(k=1,2,3...最大度 - 1)

所以,仅过滤三角形,即 k=3,

triad_cliques=[x for x in all_cliques if len(x)==3 ]

triad_cliques 将给出仅包含三角形的边列表。

Assuming its an undirected graph, the answer lies in networkx library of python.
if you just need to count triangles, use:

import networkx as nx
tri=nx.triangles(g)

But if you need to know the edge list with triangle (triadic) relationship, use

all_cliques= nx.enumerate_all_cliques(g)

This will give you all cliques (k=1,2,3...max degree - 1)

So, to filter just triangles i.e k=3,

triad_cliques=[x for x in all_cliques if len(x)==3 ]

The triad_cliques will give a edge list with only triangles.

热风软妹 2024-08-17 20:41:32

一百万条边相当小。除非您执行了数千次,否则只需使用简单的实现即可。

我假设您有一个 node_ids 字典,它指向其邻居的序列,并且该图是有向的。

例如:

nodes = {}
nodes[0] = 1,2
nodes[1] = tuple() # empty tuple
nodes[2] = 1

我的解决方案:

def generate_triangles(nodes):
    """Generate triangles. Weed out duplicates."""
    visited_ids = set() # remember the nodes that we have tested already
    for node_a_id in nodes:
        for node_b_id in nodes[node_a_id]:
            if nod_b_id == node_a_id:
                raise ValueError # nodes shouldn't point to themselves
            if node_b_id in visited_ids:
                continue # we should have already found b->a->??->b
            for node_c_id in nodes[node_b_id]:
                if node_c_id in visited_ids:
                    continue # we should have already found c->a->b->c
                if node_a_id in nodes[node_c_id]:
                    yield(node_a_id, node_b_id, node_c_id)
        visited_ids.add(node_a_id) # don't search a - we already have all those cycles

检查性能:

from random import randint
n = 1000000
node_list = range(n)
nodes = {}
for node_id in node_list:
    node = tuple()
    for i in range(randint(0,10)): # add up to 10 neighbors
        try:
            neighbor_id = node_list[node_id+randint(-5,5)] # pick a nearby node
        except:
            continue 
        if not neighbor_id in node:
            node = node + (neighbor_id,)
    nodes[node_id] = node

cycles = list(generate_triangles(nodes))
print len(cycles)

当我尝试时,构建随机图比计算周期花费的时间更长。

不过,您可能想测试一下;)我不能保证它是正确的。

您还可以查看networkx,它是大型Python 图形库。

A million edges is quite small. Unless you are doing it thousands of times, just use a naive implementation.

I'll assume that you have a dictionary of node_ids, which point to a sequence of their neighbors, and that the graph is directed.

For example:

nodes = {}
nodes[0] = 1,2
nodes[1] = tuple() # empty tuple
nodes[2] = 1

My solution:

def generate_triangles(nodes):
    """Generate triangles. Weed out duplicates."""
    visited_ids = set() # remember the nodes that we have tested already
    for node_a_id in nodes:
        for node_b_id in nodes[node_a_id]:
            if nod_b_id == node_a_id:
                raise ValueError # nodes shouldn't point to themselves
            if node_b_id in visited_ids:
                continue # we should have already found b->a->??->b
            for node_c_id in nodes[node_b_id]:
                if node_c_id in visited_ids:
                    continue # we should have already found c->a->b->c
                if node_a_id in nodes[node_c_id]:
                    yield(node_a_id, node_b_id, node_c_id)
        visited_ids.add(node_a_id) # don't search a - we already have all those cycles

Checking performance:

from random import randint
n = 1000000
node_list = range(n)
nodes = {}
for node_id in node_list:
    node = tuple()
    for i in range(randint(0,10)): # add up to 10 neighbors
        try:
            neighbor_id = node_list[node_id+randint(-5,5)] # pick a nearby node
        except:
            continue 
        if not neighbor_id in node:
            node = node + (neighbor_id,)
    nodes[node_id] = node

cycles = list(generate_triangles(nodes))
print len(cycles)

When I tried it, it took longer to build the random graph than to count the cycles.

You might want to test it though ;) I won't guarantee that it's correct.

You could also look into networkx, which is the big python graph library.

奶茶白久 2024-08-17 20:41:32

非常简单明了的方法是使用 Networkx:

使用 Networkx,您可以通过 nx.cycle_basis(G) 然后选择有3个节点的

cycls_3 = [c for c in nx.cycle_basis(G) if len(c)==3]

,或者你可以通过find_cliques(G) ,然后选择您想要的(有 3 个节点)。派系是图表的一部分,其中所有节点都相互连接,这发生在具有 3 个节点的循环/循环中。

Pretty easy and clear way to do is to use Networkx:

With Networkx you can get the loops of an undirected graph by nx.cycle_basis(G) and then select the ones with 3 nodes

cycls_3 = [c for c in nx.cycle_basis(G) if len(c)==3]

or you can find all the cliques by find_cliques(G) and then select the ones you want (with 3 nodes). cliques are sections of the graph where all the nodes are connected to each other which happens in cycles/loops with 3 nodes.

九歌凝 2024-08-17 20:41:32

尽管效率不高,但您可能想要实现一个解决方案,因此请使用循环。编写一个测试,以便您了解需要多长时间。

然后,当您尝试新方法时,您可以做两件事:
1)确保答案保持不变。
2)看看有什么改进。

拥有更快的算法而错过一些东西可能会比拥有更慢的算法更糟糕。

一旦进行了慢速测试,您就可以看看是否可以并行执行此操作,并查看性能提升情况。

然后,您可以看看是否可以标记所有少于 3 个顶点的节点。

理想情况下,您可能希望首先将其缩小到 100 左右,这样您就可以绘制它,并以图形方式查看发生了什么。

有时,您的大脑会看到一种在查看算法时并不那么明显的模式。

Even though it isn't efficient, you may want to implement a solution, so use the loops. Write a test so you can get an idea as to how long it takes.

Then, as you try new approaches you can do two things:
1) Make certain that the answer remains the same.
2) See what the improvement is.

Having a faster algorithm that misses something is probably going to be worse than having a slower one.

Once you have the slow test, you can see if you can do this in parallel and see what the performance increase is.

Then, you can see if you can mark all nodes that have less than 3 vertices.

Ideally, you may want to shrink it down to just 100 or so first, so you can draw it, and see what is happening graphically.

Sometimes your brain will see a pattern that isn't as obvious when looking at algorithms.

天涯离梦残月幽梦 2024-08-17 20:41:32

我不想听起来很刺耳,但是你试过用谷歌搜索一下吗?第一个链接是一个非常快速的算法:
http://www.mail-archive.com/[email protected]/msg05642.html

然后是关于 ACM 的这篇文章(您可以访问):
http://portal.acm.org/itation.cfm?id=244866
(如果你没有访问权限,我相信如果你问一下写它的女士,你会得到一份副本。)

另外,我可以想象一种基于派系分解的三角形枚举方法,但我不这样做知道是否在某处描述过。

I don't want to sound harsh, but have you tried to Google it? The first link is a pretty quick algorithm to do that:
http://www.mail-archive.com/[email protected]/msg05642.html

And then there is this article on ACM (which you may have access to):
http://portal.acm.org/citation.cfm?id=244866
(and if you don't have access, I am sure if you kindly ask the lady who wrote it, you will get a copy.)

Also, I can imagine a triangle enumeration method based on clique-decomposition, but I don't know if it was described somewhere.

不忘初心 2024-08-17 20:41:32

我正在研究计算无向图上三角形数量的相同问题,wisty 的解决方案在我的情况下效果非常好。我对其进行了一些修改,因此仅计算无向三角形。

    #### function for counting undirected cycles
    def generate_triangles(nodes):
        visited_ids = set() # mark visited node
        for node_a_id in nodes:
            temp_visited = set() # to get undirected triangles
            for node_b_id in nodes[node_a_id]:
                if node_b_id == node_a_id:
                    raise ValueError # to prevent self-loops, if your graph allows self-loops then you don't need this condition
                if node_b_id in visited_ids:
                    continue
                for node_c_id in nodes[node_b_id]:
                    if node_c_id in visited_ids:
                        continue    
                    if node_c_id in temp_visited:
                        continue
                    if node_a_id in nodes[node_c_id]:
                        yield(node_a_id, node_b_id, node_c_id)
                    else:
                        continue
                temp_visited.add(node_b_id)
            visited_ids.add(node_a_id)

当然,你需要使用字典,例如

    #### Test cycles ####

    nodes = {}

    nodes[0] = [1, 2, 3]
    nodes[1] = [0, 2]
    nodes[2] = [0, 1, 3]
    nodes[3] = [1]

    cycles = list(generate_triangles(nodes))
    print cycles

使用Wisty的代码,找到的三角形将是
[(0, 1, 2), (0, 2, 1), (0, 3, 1), (1, 2, 3)]

计算了三角形 (0, 1, 2) 和 (0, 2, 1)作为两个不同的三角形。根据我修改的代码,这些仅算作一个三角形。

我将其与一个相对较小的字典(少于 100 个键)一起使用,每个键平均有 50 个值。

I am working on the same problem of counting number of triangles on undirected graph and wisty's solution works really well in my case. I have modified it a bit so only undirected triangles are counted.

    #### function for counting undirected cycles
    def generate_triangles(nodes):
        visited_ids = set() # mark visited node
        for node_a_id in nodes:
            temp_visited = set() # to get undirected triangles
            for node_b_id in nodes[node_a_id]:
                if node_b_id == node_a_id:
                    raise ValueError # to prevent self-loops, if your graph allows self-loops then you don't need this condition
                if node_b_id in visited_ids:
                    continue
                for node_c_id in nodes[node_b_id]:
                    if node_c_id in visited_ids:
                        continue    
                    if node_c_id in temp_visited:
                        continue
                    if node_a_id in nodes[node_c_id]:
                        yield(node_a_id, node_b_id, node_c_id)
                    else:
                        continue
                temp_visited.add(node_b_id)
            visited_ids.add(node_a_id)

Of course, you need to use a dictionary for example

    #### Test cycles ####

    nodes = {}

    nodes[0] = [1, 2, 3]
    nodes[1] = [0, 2]
    nodes[2] = [0, 1, 3]
    nodes[3] = [1]

    cycles = list(generate_triangles(nodes))
    print cycles

Using the code of Wisty, the triangles found will be
[(0, 1, 2), (0, 2, 1), (0, 3, 1), (1, 2, 3)]

which counted the triangle (0, 1, 2) and (0, 2, 1) as two different triangles. With the code I modified, these are counted as only one triangle.

I used this with a relatively small dictionary of under 100 keys and each key has on average 50 values.

旧城烟雨 2024-08-17 20:41:32

惊讶地发现没有提及 Networkx 三角形函数。我知道它不一定返回形成三角形的节点组,但应该与许多发现自己在此页面上的人非常相关。

nx.triangles(G) # list of how many triangles each node is part of
sum(nx.triangles(G).values())/3 # total number of triangles

返回节点群的另一种方法是......

for u,v,d in G.edges(data=True):
    u_array = adj_m.getrow(u).nonzero()[1] # get lists of all adjacent nodes
    v_array = adj_m.getrow(v).nonzero()[1]
    # find the intersection of the two sets - these are the third node of the triangle
    np.intersect1d(v_array,u_array)

Surprised to see no mention of the Networkx triangles function. I know it doesn't necessarily return the groups of nodes that form a triangle, but should be pretty relevant to many who find themselves on this page.

nx.triangles(G) # list of how many triangles each node is part of
sum(nx.triangles(G).values())/3 # total number of triangles

An alternative way to return clumps of nodes would be something like...

for u,v,d in G.edges(data=True):
    u_array = adj_m.getrow(u).nonzero()[1] # get lists of all adjacent nodes
    v_array = adj_m.getrow(v).nonzero()[1]
    # find the intersection of the two sets - these are the third node of the triangle
    np.intersect1d(v_array,u_array)
墨落画卷 2024-08-17 20:41:32

如果您不关心同一三角形以不同顺序的多个副本,则可以使用三元组列表:

from itertools import combinations as combos
[(n,nbr,nbr2) for n in G for nbr, nbr2 in combos(G[n],2) if nbr in G[nbr2]]

这里的逻辑是检查每个节点的每对邻居以查看它们是否连接。 G[n] 是一种迭代或查找邻居的快速方法。

如果你想摆脱重新排序,请将每个三元组变成一个冻结集并制作一组冻结集:

set(frozenset([n,nbr,nbr2]) for n in G for nbr, nbr2 in combos(G[n]) if nbr in G[nbr2])

如果你不喜欢冻结集并想要一个集合列表,那么:

triple_iter = ((n, nbr, nbr2) for n in G for nbr, nbr2 in combos(G[n],2) if nbr in G[nbr2])
triangles = set(frozenset(tri) for tri in triple_iter)
nice_triangles = [set(tri) for tri in triangles]

If you don't care about multiple copies of the same triangle in different order then a list of 3-tuples works:

from itertools import combinations as combos
[(n,nbr,nbr2) for n in G for nbr, nbr2 in combos(G[n],2) if nbr in G[nbr2]]

The logic here is to check each pair of neighbors of every node to see if they are connected. G[n] is a fast way to iterate over or look up neighbors.

If you want to get rid of reorderings, turn each triple into a frozenset and make a set of the frozensets:

set(frozenset([n,nbr,nbr2]) for n in G for nbr, nbr2 in combos(G[n]) if nbr in G[nbr2])

If you don't like frozenset and want a list of sets then:

triple_iter = ((n, nbr, nbr2) for n in G for nbr, nbr2 in combos(G[n],2) if nbr in G[nbr2])
triangles = set(frozenset(tri) for tri in triple_iter)
nice_triangles = [set(tri) for tri in triangles]
鲜血染红嫁衣 2024-08-17 20:41:32

您需要找到“所有”“三角形”,还是只是“一些”/“任何”?
或者也许您只需要测试特定节点是否是三角形的一部分?

测试很简单 - 给定一个节点 A,是否有任何两个连接的节点 B 和 A。 C 也直接连接。

如果您需要找到所有三角形 - 具体来说,所有 3 个节点组,其中每个节点都与其他两个节点相连 - 那么您需要在一个非常长的运行“foreach”循环中检查每个可能的组。

唯一的优化是确保您不会两次检查同一个“组”,例如,如果您已经测试了 B & 组。 C 与 A 不在一组,则不检查 A 和 A 是否在一个组中。 C与B分在一组。

Do you need to find 'all' of the 'triangles', or just 'some'/'any'?
Or perhaps you just need to test whether a particular node is part of a triangle?

The test is simple - given a node A, are there any two connected nodes B & C that are also directly connected.

If you need to find all of the triangles - specifically, all groups of 3 nodes in which each node is joined to the other two - then you need to check every possible group in a very long running 'for each' loop.

The only optimisation is ensuring that you don't check the same 'group' twice, e.g. if you have already tested that B & C aren't in a group with A, then don't check whether A & C are in a group with B.

牵强ㄟ 2024-08-17 20:41:32

这是 Ajay M 答案 的更有效版本(我会评论它,但我没有足够的声誉)。

事实上,networkxenumerate_all_cliques 方法将返回图中的所有 cliques,无论其长度如何;因此循环可能会花费很多时间(尤其是对于非常密集的图形)。

此外,一旦为三角形定义了,只需参数化即可概括每个团长度的方法,因此这里有一个函数:

import networkx as nx

def get_cliques_by_length(G, length_clique):
    """ Return the list of all cliques in an undirected graph G with length 
    equal to length_clique. """
    cliques = []
    for c in nx.enumerate_all_cliques(G) :
        if len(c) <= length_clique:
            if len(c) == length_clique:
                cliques.append(c)            
        else:
            return cliques
    # return empty list if nothing is found
    return cliques

要获取三角形,只需使用 get_cliques_by_length(G, 3) 。

警告:此方法仅适用于无向图。 networkx 中未提供有向图中的派系算法

This is a more efficient version of Ajay M answer (I would have commented it, but I've not enough reputation).

Indeed the enumerate_all_cliques method of networkx will return all cliques in the graph, irrespectively of their length; hence looping over it may take a lot of time (especially with very dense graphs).

Moreover, once defined for triangles, it's just a matter of parametrization to generalize the method for every clique length so here's a function:

import networkx as nx

def get_cliques_by_length(G, length_clique):
    """ Return the list of all cliques in an undirected graph G with length 
    equal to length_clique. """
    cliques = []
    for c in nx.enumerate_all_cliques(G) :
        if len(c) <= length_clique:
            if len(c) == length_clique:
                cliques.append(c)            
        else:
            return cliques
    # return empty list if nothing is found
    return cliques

To get triangles just use get_cliques_by_length(G, 3).

Caveat: this method works only for undirected graphs. Algorithm for cliques in directed graphs are not provided in networkx

你与清晨阳光 2024-08-17 20:41:32

我刚刚发现 nx.edge_disjoint_paths 可以计算包含某些边的三角形。比 nx.enumerate_all_cliques 和 nx.cycle_basis 更快。
它返回源和目标之间的边不相交路径。边不相交路径是不共享任何边的路径。
result-1 是包含某些边或源节点和目标节点之间的三角形的数量。

edge_triangle_dict = {}
for i in g.edges:
    edge_triangle_dict[i] = len(list(nx.edge_disjoint_paths(g, i[0], i[1]))-1)

i just found that nx.edge_disjoint_paths works to count the triangle contains certain edges. faster than nx.enumerate_all_cliques and nx.cycle_basis.
It returns the edges disjoint paths between source and target.Edge disjoint paths are paths that do not share any edge.
And result-1 is the number of triangles that contain certain edges or between source node and target node.

edge_triangle_dict = {}
for i in g.edges:
    edge_triangle_dict[i] = len(list(nx.edge_disjoint_paths(g, i[0], i[1]))-1)
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文