python中二分图的高效投影(使用networkx)

发布于 2024-12-05 12:47:15 字数 1480 浏览 1 评论 0原文

使用 networkx 模块,我在 Python 3.2 下进行了一些网络分析,其中我需要将二分图(链接到其牢房的囚犯:在下面的代码中输入图 B)投影到一个子图(如果牢房中的囚犯都有同一单元中的重叠咒语:使用定义图 B 的囚犯节点的集合节点的输入,生成输出图 G)。我不需要特殊的算法来提出任何匹配或最佳匹配,我只需要收集满足某些条件的所有链接。因此我发现的其他 SO 帖子并不真正适用。但是:

当我给它提供越来越多的数据时,我当前的代码正在崩溃(RAM、交换和 CPU 方面)。如果您发现使用 5 层循环简化下面代码的方法,请告诉我。我不确定是否需要任何有关 networkx 的知识,或者我的边缘属性标签的详细信息是否相关。谢谢!

def time_overlap_projected_graph_parallel(B, nodes):
    G=nx.MultiGraph()
    G.add_nodes_from((n,B.node[n]) for n in nodes)
    for u in nodes:
        unbrs = set(B[u])
        nbrs2 = set((n for nbr in unbrs for n in B[nbr])) - set([u])
        for v in nbrs2:
            for mutual_cell in set(B[u]) & set(B[v]):
                for uspell in B.get_edge_data(u,mutual_cell).values():
                    ustart = uspell[1]
                    uend = uspell[2]
                    for vspell in B.get_edge_data(v,mutual_cell).values():
                        vstart = vspell[1]
                        vend = vspell[2]
                        if uend > vstart and vend > ustart:
                            ostart = max(ustart,vstart)
                            oend = min(uend,vend)
                            olen = (oend-ostart+1)/86400
                            ocell = mutual_cell
                            if (v not in G[u] or ostart not in [ edict[1] for edict in G[u][v].values() ]):
                                G.add_edges_from([(u,v,{0: olen,1: ostart,2: oend,3: ocell})])
    return G

Using the networkx module, I do some network analysis under Python 3.2, where I need to project a bipartite graph (of inmates linked to their cell: input graph B in the code below) to a subgraph (linking cellmates to each other if both had an overlapping spell in the same cell: using the input of set nodes defining the inmate-nodes of graph B, generating output graph G). I don't need a special algorithm to come up with any or an optimal matching, I just need to collect all links that satisfy some conditions. Thus other SO posts I found do not really apply. But:

My current code is blowing up (RAM-, swap-, and CPU-wise) as I give it more and more data. Please let me know if you see ways to streamline the code below with 5 layers of loops. I am not sure any knowledge of networkx is necessary or details of my labeling of edge attributes is relevant. Thanks!

def time_overlap_projected_graph_parallel(B, nodes):
    G=nx.MultiGraph()
    G.add_nodes_from((n,B.node[n]) for n in nodes)
    for u in nodes:
        unbrs = set(B[u])
        nbrs2 = set((n for nbr in unbrs for n in B[nbr])) - set([u])
        for v in nbrs2:
            for mutual_cell in set(B[u]) & set(B[v]):
                for uspell in B.get_edge_data(u,mutual_cell).values():
                    ustart = uspell[1]
                    uend = uspell[2]
                    for vspell in B.get_edge_data(v,mutual_cell).values():
                        vstart = vspell[1]
                        vend = vspell[2]
                        if uend > vstart and vend > ustart:
                            ostart = max(ustart,vstart)
                            oend = min(uend,vend)
                            olen = (oend-ostart+1)/86400
                            ocell = mutual_cell
                            if (v not in G[u] or ostart not in [ edict[1] for edict in G[u][v].values() ]):
                                G.add_edges_from([(u,v,{0: olen,1: ostart,2: oend,3: ocell})])
    return G

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

醉梦枕江山 2024-12-12 12:47:15

我猜现在可以使用二分图了。就像

import networkx as nx
from networkx.algorithms import bipartite

B.add_nodes_from(inmates_list, bipartite=0)
B.add_nodes_from(cells_list, bipartite=1)

inmates = set(n for n,d in B.nodes(data=True) if d['bipartite']==0)
cells = = set(B) - inmates
G = bipartite.projected_graph(B, inmates)

http://networkx.lanl.gov/reference/algorithms.bipartite.html)

One could now use bipartite graphs I guess. Like

import networkx as nx
from networkx.algorithms import bipartite

B.add_nodes_from(inmates_list, bipartite=0)
B.add_nodes_from(cells_list, bipartite=1)

inmates = set(n for n,d in B.nodes(data=True) if d['bipartite']==0)
cells = = set(B) - inmates
G = bipartite.projected_graph(B, inmates)

(http://networkx.lanl.gov/reference/algorithms.bipartite.html)

回忆凄美了谁 2024-12-12 12:47:15

这是我的看法。根据每个牢房的平均囚犯数量,它可能会提高性能。如果您有更好的方法来获取单元格(例如节点属性?),请替换

cells = [n for n in B.nodes() if n[0] not in nodes]

为(这里我假设节点是所有囚犯的列表)。

from itertools import combinations

def time_overlap_projected_graph_parallel(B, nodes):
    G=nx.MultiGraph()
    G.add_nodes_from((n,B.node[n]) for n in nodes)
    cells = [n for n in B.nodes() if n[0] not in nodes]
    for cell in cells:
        for u,v in combinations(B[cell],2):
            for uspell in B.get_edge_data(u,cell).values():
                ustart = uspell[1]
                uend = uspell[2]
                for vspell in B.get_edge_data(v,cell).values():
                    vstart = vspell[1]
                    vend = vspell[2]
                    if uend > vstart and vend > ustart:
                        ostart = max(ustart,vstart)
                        oend = min(uend,vend)
                        olen = (oend-ostart+1)/86400
                        ocell = cell
                        if (v not in G[u] or ostart not in [ edict[1] for edict in G[u][v].values() ]):
                            G.add_edge(u,v,{0: olen,1: ostart,2: oend,3: ocell})
    return G

Here is my take. Depending on average inmates per cell, it might improve performance. If you have a better way to get cells (e.g. node properties?), replace

cells = [n for n in B.nodes() if n[0] not in nodes]

with that (here I'm assuming nodes is a list of all inmates).

from itertools import combinations

def time_overlap_projected_graph_parallel(B, nodes):
    G=nx.MultiGraph()
    G.add_nodes_from((n,B.node[n]) for n in nodes)
    cells = [n for n in B.nodes() if n[0] not in nodes]
    for cell in cells:
        for u,v in combinations(B[cell],2):
            for uspell in B.get_edge_data(u,cell).values():
                ustart = uspell[1]
                uend = uspell[2]
                for vspell in B.get_edge_data(v,cell).values():
                    vstart = vspell[1]
                    vend = vspell[2]
                    if uend > vstart and vend > ustart:
                        ostart = max(ustart,vstart)
                        oend = min(uend,vend)
                        olen = (oend-ostart+1)/86400
                        ocell = cell
                        if (v not in G[u] or ostart not in [ edict[1] for edict in G[u][v].values() ]):
                            G.add_edge(u,v,{0: olen,1: ostart,2: oend,3: ocell})
    return G
偏爱你一生 2024-12-12 12:47:15

我发布这个答案是为了提出一些改进建议。我假设您的二分图不是多重图,而是普通的nx.Graph。我将 B 更改为 bi,将 G 更改为 uni,因为按照惯例,大写名称是为类保留的。顺便说一句,如果咒语在同一时间开始和结束怎么办?

def time_overlap_projected_graph_parallel(bi, nodes):
    uni = nx.MultiGraph()
    for u in nodes: #inmate
        uni.add_node(u) # do this to prevent iterating nodes twice
        u_adj = bi.adj[u] # bi.adj is a dict of dicts
        for (w, uw_attr) in u_adj.iteritems(): # cell
            w_adj = bi.adj[w]
            for (v, wv_attr) in w_adj.iteritems():#cellmate
                if v == u:
                    continue
                elif uni.has_edge(u, v): # avoid computing twice
                    continue
                for uspell in uw_attr.itervalues():
                    ustart = uspell[1]
                    uend = uspell[2]
                    for vspell in wv_attr.itervalues():
                        vstart = vspell[1]
                        vend = vspell[2]
                        if uend > vstart and vend > ustart:
                            ostart = max(ustart, vstart)
                            oend = min(uend, vend)
                            olen = (oend - ostart + 1) / 86400 # this assumes floats or Python 3 otherwise will be 0
                            ocell = w
                            # I would change this to uni.add_edge(u, v, length=olen, start=ostart, end=oend, cell=ocell)
                            # or to uni.add_edge(u, v, spell=[olen, ostart, oend, ocell])
                            uni.add_edge(u, v, **{0: olen, 1: ostart, 2: oend, 3: ocell})
    return uni

I'm posting this answer to suggest a few improvements. I will assume that your bipartite graph is not a multi graph but a normal nx.Graph. I changed B to bi and G to uni since upper case names are reserved for classes by convention. Btw, what if spells started and ended at the exact same time?

def time_overlap_projected_graph_parallel(bi, nodes):
    uni = nx.MultiGraph()
    for u in nodes: #inmate
        uni.add_node(u) # do this to prevent iterating nodes twice
        u_adj = bi.adj[u] # bi.adj is a dict of dicts
        for (w, uw_attr) in u_adj.iteritems(): # cell
            w_adj = bi.adj[w]
            for (v, wv_attr) in w_adj.iteritems():#cellmate
                if v == u:
                    continue
                elif uni.has_edge(u, v): # avoid computing twice
                    continue
                for uspell in uw_attr.itervalues():
                    ustart = uspell[1]
                    uend = uspell[2]
                    for vspell in wv_attr.itervalues():
                        vstart = vspell[1]
                        vend = vspell[2]
                        if uend > vstart and vend > ustart:
                            ostart = max(ustart, vstart)
                            oend = min(uend, vend)
                            olen = (oend - ostart + 1) / 86400 # this assumes floats or Python 3 otherwise will be 0
                            ocell = w
                            # I would change this to uni.add_edge(u, v, length=olen, start=ostart, end=oend, cell=ocell)
                            # or to uni.add_edge(u, v, spell=[olen, ostart, oend, ocell])
                            uni.add_edge(u, v, **{0: olen, 1: ostart, 2: oend, 3: ocell})
    return uni
山色无中 2024-12-12 12:47:15

考虑修改后的代码,它可能会执行相同的操作,但是使用迭代器(尽管我还修改了一些与 networkx 相关的函数/方法调用 --- 但我仍在检查是否破坏了某些内容):

def time_overlap_projected_graph_parallel(B, nodes):
    G=nx.MultiGraph()
    G.add_nodes_from(nodes)
    for u in G.nodes_iter():#inmate
        for w in B.neighbors_iter(u):#cell
            for v in B.neighbors_iter(w):#cellmate
                if v == u:
                    continue
                for uspell in B[u][w].values():
                    ustart = uspell[1]
                    uend = uspell[2]
                    for vspell in B[v][w].values():
                        vstart = vspell[1]
                        vend = vspell[2]
                        if uend > vstart and vend > ustart:
                            ostart = max(ustart,vstart)
                            oend = min(uend,vend)
                            olen = (oend-ostart+1)/86400
                            ocell = w
                            if (v not in G[u] or ostart not in [ edict[1] for edict in G[u][v].values() ]):
                                G.add_edges_from([(u,v,{0: olen,1: ostart,2: oend,3: ocell})])
    return G

Consider the revised code, which might do the same, but with iterators (though I also revised a few networkx-related function/method-calls --- but I'm still checking if I broke something):

def time_overlap_projected_graph_parallel(B, nodes):
    G=nx.MultiGraph()
    G.add_nodes_from(nodes)
    for u in G.nodes_iter():#inmate
        for w in B.neighbors_iter(u):#cell
            for v in B.neighbors_iter(w):#cellmate
                if v == u:
                    continue
                for uspell in B[u][w].values():
                    ustart = uspell[1]
                    uend = uspell[2]
                    for vspell in B[v][w].values():
                        vstart = vspell[1]
                        vend = vspell[2]
                        if uend > vstart and vend > ustart:
                            ostart = max(ustart,vstart)
                            oend = min(uend,vend)
                            olen = (oend-ostart+1)/86400
                            ocell = w
                            if (v not in G[u] or ostart not in [ edict[1] for edict in G[u][v].values() ]):
                                G.add_edges_from([(u,v,{0: olen,1: ostart,2: oend,3: ocell})])
    return G
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文