How can I simplify (or parallelize via multiprocessing) some Python 3.2 code with many loops?


I tried to educate myself about optimizing Python and potentially parallelizing my problem. The links I used are listed in my question posted at http://www.python-forum.org/pythonforum/viewtopic.php?f=18&t=28855

However, my application based on the networkx module still scales very badly with the size of the network, which I think is mainly the fault of my loops rather than anything networkx-specific. With my limited experience, and my focus on getting this single job done, I would greatly appreciate it if you could skim my code and point out how the loops should be reformulated, or what other tricks might help. Any cursory comment will do; I'll try to take it from there.

Here are the key (and messy) pieces that I expect to be the bottlenecks:

This piece projects the bipartite prisoner-cell graph onto a cellmate-cellmate graph.

# Dictionary for edge attributes in projected graph:
# 0: overlap_length
# 1: overlap_start
# 2: overlap_end
# 3: cell
# 4: level

def time_overlap_projected_graph_parallel(B, nodes):
    G = nx.MultiGraph()
    G.add_nodes_from((n, B.node[n]) for n in nodes)
    for u in nodes:
        unbrs = set(B[u])                                           # cells u has stayed in
        nbrs2 = set(n for nbr in unbrs for n in B[nbr]) - set([u])  # prisoners sharing any of those cells
        for v in nbrs2:
            for mutual_cell in set(B[u]) & set(B[v]):
                for uspell in B.get_edge_data(u, mutual_cell).values():
                    ustart = uspell[1]
                    uend = uspell[2]
                    for vspell in B.get_edge_data(v, mutual_cell).values():
                        vstart = vspell[1]
                        vend = vspell[2]
                        if uend > vstart and vend > ustart:         # the two spells overlap in time
                            ostart = max(ustart, vstart)
                            oend = min(uend, vend)
                            olen = (oend - ostart + 1) / 86400      # overlap length in days
                            ocell = mutual_cell
                            # Avoid adding the same overlap twice (it is found again with u and v swapped).
                            if (v not in G[u] or
                                    ostart not in [edict[1] for edict in G[u][v].values()]):
                                G.add_edges_from([(u, v, {0: olen, 1: ostart, 2: oend, 3: ocell})])
    return G
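
For concreteness, here is a minimal sketch of how the outer loop over nodes could be split across worker processes with multiprocessing.Pool. It is an untested placeholder, not part of the working code: it assumes B can be pickled (the whole graph is shipped to every worker), the helper names are made up, and overlaps found from both endpoints of a pair may still need a deduplication pass after the merge.

from multiprocessing import Pool

def _project_chunk(args):
    # Must be a module-level function so Pool can pickle it.
    B, chunk = args
    return time_overlap_projected_graph_parallel(B, chunk)

def project_in_parallel(B, nodes, nprocs=4):
    nodes = list(nodes)
    chunks = [nodes[i::nprocs] for i in range(nprocs)]
    pool = Pool(nprocs)                      # Pool is not a context manager in Python 3.2
    try:
        parts = pool.map(_project_chunk, [(B, c) for c in chunks if c])
    finally:
        pool.close()
        pool.join()
    # Merge the partial multigraphs; the same overlap can show up in two parts
    # (once from each endpoint) and would need deduplicating here.
    G = nx.MultiGraph()
    for part in parts:
        G.add_nodes_from(part.nodes(data=True))
        G.add_edges_from(part.edges(data=True))
    return G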

This piece cleans up the previous result by consolidating multiple links (spells in the same cell) between the same pair of people.

# Dictionary for edge attributes in consolidated graph:
# 5: total_overlap
# 6: first_overlap
# 7: last_overlap
# 8: cell_list
# 9: spell_list

def consolidated_g_parallel(B):
    G = nx.Graph()
    G.add_nodes_from(B)
    for n in B.nodes_iter():
        for m in B.neighbors_iter(n):
            omin = 2000000000.0
            omax = 0.0
            otot = 0.0
            cells = []
            spells = []
            for ed in B.get_edge_data(m, n).values():
                omin = min(omin, ed[1])
                omax = max(omax, ed[2])
                otot += ed[0]
                cells.append(ed[3])
                spells.append((ed[1], ed[2]))
            G.add_edges_from([(n, m, {5: otot, 6: omin, 7: omax, 8: cells, 9: spells})])
    return G
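
As a point of comparison, a one-pass variant that iterates B.edges_iter(data=True) touches every parallel edge exactly once, instead of rebuilding the same consolidated edge from both endpoints of each pair. This is only a sketch: the function name is a placeholder and it assumes node labels can be ordered so the (n, m) key can be normalized.

from collections import defaultdict

def consolidate_one_pass(B):
    # Per-pair accumulator: [total_overlap, first_overlap, last_overlap, cells, spells]
    acc = defaultdict(lambda: [0.0, float('inf'), 0.0, [], []])
    for n, m, ed in B.edges_iter(data=True):
        key = (n, m) if n <= m else (m, n)   # same slot for (n, m) and (m, n)
        a = acc[key]
        a[0] += ed[0]
        a[1] = min(a[1], ed[1])
        a[2] = max(a[2], ed[2])
        a[3].append(ed[3])
        a[4].append((ed[1], ed[2]))
    G = nx.Graph()
    G.add_nodes_from(B)
    G.add_edges_from((n, m, {5: a[0], 6: a[1], 7: a[2], 8: a[3], 9: a[4]})
                     for (n, m), a in acc.items())
    return G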

This generates a new graph in which all higher-order links of the previous result (say, cellmates' cellmates, up to an order set by a parameter) become direct links. Because the chronology of the links matters, this takes some brute-force work.

def consolidated_mdg_parallel(B):
    print('number of self-loops', B.number_of_selfloops())
    G = B.to_directed()
    for u, v, d in G.edges(data=True):
        d[4] = 1                                 # direct links have order (level) 1
    for n in G.nodes_iter():
        sinks = nx.algorithms.boundary.node_boundary(B, [n])
        nsink = list(sinks)
        nsink.append(n)
        sources = nx.algorithms.boundary.node_boundary(B, nsink)
        while len(sinks) > 0 and len(sources) > 0:
            for u in sources:
                for v in nx.algorithms.boundary.node_boundary(B, [u], sinks):
                    for edata in G[u][v].values():
                        for ed in G[v][n].values():
                            if edata[1] <= ed[2]:
                                oend = min(edata[2], ed[2])
                                ostart = max(edata[1], ed[1])
                                otot = min(ed[0], edata[0])
                                lvl = edata[4] + 1
                                G.add_edges_from([(u, n, {0: otot, 1: ostart, 2: oend, 3: {}, 4: lvl})])
            nsink.extend(sources)
            sinks = list(set(sources) & set(G.predecessors(n)))
            sources = nx.algorithms.boundary.node_boundary(B, nsink)
    H = nx.MultiDiGraph()
    for n in G.nodes_iter():
        for m in G.neighbors_iter(n):
            omin = 2000000000.0
            omax = 0.0
            otot = 0.0
            for lvl in range(1, maxorder):
                for ed in G.get_edge_data(n, m).values():
                    if ed[4] == lvl:
                        otot += ed[0]
                if otot > 0:
                    H.add_edges_from([(n, m, {0: otot, 4: lvl})])
                    # print('Higher-order link added!')
                otot = 0.0
    return H
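
The final H-building loop rescans all parallel edges between n and m once for every level in range(1, maxorder). A single-pass sketch that groups the totals by (source, target, level) is shown below; it assumes the same attribute keys (0: overlap, 4: level) and the networkx 1.x iterator API used above, and the function name is a placeholder.

from collections import defaultdict

def collapse_levels(G, maxorder):
    # G is the directed multigraph built above; sum the overlap per (n, m, level).
    totals = defaultdict(float)
    for n, m, ed in G.edges_iter(data=True):
        if 1 <= ed[4] < maxorder:
            totals[(n, m, ed[4])] += ed[0]
    H = nx.MultiDiGraph()
    H.add_edges_from((n, m, {0: otot, 4: lvl})
                     for (n, m, lvl), otot in totals.items() if otot > 0)
    return H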

This piece (not a function, of course) writes a separate text file for each possible link order, with the edges listed.

empty = ['nothing']
for i in range(1,maxorder):
    empty.append(set(J.nodes()))

# Write simple adjacency representation for spmat --- note losing track of length of spells and chronology!
for i in range(1,maxorder):
    targets[i].write('0 '+str(J.number_of_nodes())+' SHAPE KEY')

for e in J.edges_iter(data=True):
    # e is (u, n, {0: otot, 1: ostart, 2: oend, 3: {}, 4: lvl})
    print(e)
    if e[2][4] < maxorder:
        targets[e[2][4]].write('\n{} {} {}'.format(str(e[1]), str(e[0]), str(1 / e[2][0])))
        try:
            empty[e[2][4]].remove(e[1])
        except KeyError:  # node already removed for an earlier edge of this order
            pass
for i in range(1,maxorder):
    for n in list(empty[i]):
        targets[i].write('\n'+str(n))
    targets[i].close()
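
For large graphs the per-edge print(e) and the one write() call per edge are likely to dominate this step. A buffered sketch that collects the output lines per order and writes each file in two calls is shown below; it assumes the same J, targets, maxorder and empty objects as above, and would replace the two loops that follow the header write.

lines = {i: [] for i in range(1, maxorder)}
for u, v, d in J.edges_iter(data=True):
    lvl = d[4]
    if lvl < maxorder:
        lines[lvl].append('\n{} {} {}'.format(v, u, 1 / d[0]))
        empty[lvl].discard(v)                # set.discard() never raises, unlike remove()
for i in range(1, maxorder):
    targets[i].write(''.join(lines[i]))      # all edges of order i in one call
    targets[i].writelines('\n' + str(n) for n in empty[i])   # nodes with no edge of order i
    targets[i].close()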
