这种广度优先搜索可以变得更快吗？

发布于 2024-08-12 00:46:05 字数 875 浏览 11 评论 0原文

我有一个数据集，它是一个大型未加权循环图，循环发生在大约 5-6 条路径的循环中。它由大约 8000 个节点组成，每个节点有 1-6 个（通常大约 4-5 个）连接。我正在进行单对最短路径计算，并实现了以下代码来进行广度优先搜索。

from Queue import Queue

q = Queue()
parent = {}
fromNode = 'E1123'
toNode = 'A3455'

# path finding
q.put(fromNode)
parent[fromNode] = 'Root'

while not q.empty():
  # get the next node and add its neighbours to queue
  current = q.get()
  for i in getNeighbours(current):
    # note parent and only continue if not already visited
    if i[0] not in parent:
      parent[i[0]] = current
      q.put(i[0])

  # check if destination
  if current == toNode:
    print 'arrived at', toNode
    break

上面的代码使用Python 2.6 Queue 模块，getNeighbours() 只是一个子例程，它进行单个MySQL 调用并将邻居作为元组列表返回，例如(('foo',),('bar',))。 SQL 调用速度很快。

该代码工作正常，但是测试到大约 7 层的深度需要大约 20 秒才能运行（2.5GHz Intel 4GB RAM OS X 10.6），

我欢迎任何有关如何改进此代码的运行时间的评论。

原文

I have a data set which is a large unweighted cyclic graph The cycles occur in loops of about 5-6 paths. It consists of about 8000 nodes and each node has from 1-6 (usually about 4-5) connections. I'm doing single pair shortest path calculations and have implemented the following code to do a breadth-first search.

from Queue import Queue

q = Queue()
parent = {}
fromNode = 'E1123'
toNode = 'A3455'

# path finding
q.put(fromNode)
parent[fromNode] = 'Root'

while not q.empty():
  # get the next node and add its neighbours to queue
  current = q.get()
  for i in getNeighbours(current):
    # note parent and only continue if not already visited
    if i[0] not in parent:
      parent[i[0]] = current
      q.put(i[0])

  # check if destination
  if current == toNode:
    print 'arrived at', toNode
    break

The above code uses the Python 2.6 Queue module and getNeighbours() is simply a subroutine that makes a single MySQL call and returns the neighbours as a list of tuples e.g. (('foo',),('bar',)). The SQL call is quick.

The code works ok however testing to down to depths of about 7 layers takes about 20 seconds to run (2.5GHz Intel 4GB RAM OS X 10.6)

I'd welcome any comments about how to improve the run time of this code.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

您的好友蓝忘机已上羡 2024-08-19 00:46:05

好吧，考虑到评论的赞成票，我现在就回答一下。

紧密循环中的 SQL肯定会减慢您的速度。我不在乎通话速度有多快。想想看——您要求解析一个查询，运行一个查找——尽管速度很快，但它仍然处于一个紧密的循环中。您的数据集是什么样的？您能否将整个数据集 SELECT 放入内存中，或者至少在 MySQL 之外使用它？

如果您在内存中处理这些数据，您将看到显着的性能提升。

回复收藏 0 原文

赠佳期 2024-08-19 00:46:05

像这样的事情：

#!/usr/bin/env python

from Queue import Queue

def traverse_path(fromNode, toNode, nodes):
    def getNeighbours(current, nodes):
        return nodes[current] if current in nodes else []

    def make_path(toNode, graph):
        result = []
        while 'Root' != toNode:
            result.append(toNode)
            toNode = graph[toNode]
        result.reverse()
        return result

    q = Queue()
    q.put(fromNode)
    graph = {fromNode: 'Root'}

    while not q.empty():
        # get the next node and add its neighbours to queue
        current = q.get()
        for neighbor in getNeighbours(current, nodes):
            # use neighbor only continue if not already visited
            if neighbor not in graph:
                graph[neighbor] = current
                q.put(neighbor)

        # check if destination
        if current == toNode:
            return make_path(toNode, graph)
    return []

if __name__ == '__main__':
    nodes = {
        'E1123': ['D111', 'D222', 'D333', 'D444'],
        'D111': ['C01', 'C02', 'C04'],
        'D222': ['C11', 'C03', 'C05'],
        'D333': ['C01'],
        'C02': ['B1'],
        'B1': ['A3455']
    }
    result = traverse_path('E1123', 'A3455', nodes)
    print result

['E1123', 'D111', 'C02', 'B1', 'A3455']

如果你用列表字典替换你的 SQL 查询（这将是棘手的部分），你将获得这样的性能。

Something like this:

#!/usr/bin/env python

from Queue import Queue

def traverse_path(fromNode, toNode, nodes):
    def getNeighbours(current, nodes):
        return nodes[current] if current in nodes else []

    def make_path(toNode, graph):
        result = []
        while 'Root' != toNode:
            result.append(toNode)
            toNode = graph[toNode]
        result.reverse()
        return result

    q = Queue()
    q.put(fromNode)
    graph = {fromNode: 'Root'}

    while not q.empty():
        # get the next node and add its neighbours to queue
        current = q.get()
        for neighbor in getNeighbours(current, nodes):
            # use neighbor only continue if not already visited
            if neighbor not in graph:
                graph[neighbor] = current
                q.put(neighbor)

        # check if destination
        if current == toNode:
            return make_path(toNode, graph)
    return []

if __name__ == '__main__':
    nodes = {
        'E1123': ['D111', 'D222', 'D333', 'D444'],
        'D111': ['C01', 'C02', 'C04'],
        'D222': ['C11', 'C03', 'C05'],
        'D333': ['C01'],
        'C02': ['B1'],
        'B1': ['A3455']
    }
    result = traverse_path('E1123', 'A3455', nodes)
    print result

['E1123', 'D111', 'C02', 'B1', 'A3455']

If you replace your SQL queries with a dictionary of lists (and that would be the tricky part), you will get this performance.

回复收藏 0 原文