Can this breadth-first search be made faster?

Posted 2024-08-12 00:46:05

I have a data set which is a large unweighted cyclic graph. The cycles occur in loops of about 5-6 paths. It consists of about 8,000 nodes, and each node has 1-6 (usually about 4-5) connections. I'm doing single-pair shortest-path calculations and have implemented the following code to do a breadth-first search.

from Queue import Queue

q = Queue()
parent = {}
fromNode = 'E1123'
toNode = 'A3455'

# path finding
q.put(fromNode)
parent[fromNode] = 'Root'

while not q.empty():
  # get the next node and add its neighbours to queue
  current = q.get()
  for i in getNeighbours(current):
    # note parent and only continue if not already visited
    if i[0] not in parent:
      parent[i[0]] = current
      q.put(i[0])

  # check if destination
  if current == toNode:
    print 'arrived at', toNode
    break

The above code uses the Python 2.6 Queue module, and getNeighbours() is simply a subroutine that makes a single MySQL call and returns the neighbours as a list of tuples, e.g. (('foo',), ('bar',)). The SQL call is quick.
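For illustration, such a subroutine might look like the sketch below. This is an assumption, not the actual code: the edges table, its column names, and the MySQLdb driver are stand-ins for whatever the real subroutine uses.

import MySQLdb

conn = MySQLdb.connect(host='localhost', user='user',
                       passwd='secret', db='graph')

def getNeighbours(node):
    # One query round trip per call; this per-node cost is what the
    # answers below focus on. Table and column names are hypothetical.
    cur = conn.cursor()
    cur.execute("SELECT to_node FROM edges WHERE from_node = %s", (node,))
    return cur.fetchall()  # e.g. (('foo',), ('bar',))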

The code works OK; however, testing down to depths of about 7 layers takes about 20 seconds to run (2.5 GHz Intel, 4 GB RAM, OS X 10.6).

I'd welcome any comments about how to improve the run time of this code.

Comments (4)

您的好友蓝忘机已上羡 2024-08-19 00:46:05

Well, given the upvotes on the comment, I'll make it an answer now.

The SQL in the tight loop is definitely slowing you down. I don't care how fast the call is. Think about it -- you're asking for a query to be parsed, a lookup to be run -- as fast as that is, it's still in a tight loop. What does your data set look like? Can you just SELECT the entire data set into memory, or at least work with it outside of MySQL?

If you work with that data in memory, you will see a significant performance gain.
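A minimal sketch of that approach, assuming a two-column edge table (table and column names are hypothetical): pull the whole edge list in one query, build an adjacency dict, and run the BFS against the dict instead of MySQL.

import MySQLdb

def load_graph(conn):
    # One SELECT for the entire edge list instead of one per node.
    # ~8000 nodes at ~4-5 edges each is only a few tens of thousands
    # of rows, which fits comfortably in memory.
    cur = conn.cursor()
    cur.execute("SELECT from_node, to_node FROM edges")
    graph = {}
    for from_node, to_node in cur.fetchall():
        graph.setdefault(from_node, []).append(to_node)
    return graph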

赠佳期 2024-08-19 00:46:05

Something like this:

#!/usr/bin/env python

from Queue import Queue

def traverse_path(fromNode, toNode, nodes):
    def getNeighbours(current, nodes):
        return nodes[current] if current in nodes else []

    def make_path(toNode, graph):
        result = []
        while 'Root' != toNode:
            result.append(toNode)
            toNode = graph[toNode]
        result.reverse()
        return result

    q = Queue()
    q.put(fromNode)
    graph = {fromNode: 'Root'}

    while not q.empty():
        # get the next node and add its neighbours to queue
        current = q.get()
        for neighbor in getNeighbours(current, nodes):
            # note the parent and only continue if not already visited
            if neighbor not in graph:
                graph[neighbor] = current
                q.put(neighbor)

        # check if destination
        if current == toNode:
            return make_path(toNode, graph)
    return []

if __name__ == '__main__':
    nodes = {
        'E1123': ['D111', 'D222', 'D333', 'D444'],
        'D111': ['C01', 'C02', 'C04'],
        'D222': ['C11', 'C03', 'C05'],
        'D333': ['C01'],
        'C02': ['B1'],
        'B1': ['A3455']
    }
    result = traverse_path('E1123', 'A3455', nodes)
    print result

['E1123', 'D111', 'C02', 'B1', 'A3455']

If you replace your SQL queries with a dictionary of lists (and that would be the tricky part), you will get this performance.
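One further point, independent of the SQL question: Queue.Queue is a synchronized queue meant for multi-threaded producers and consumers, so every put()/get() pays locking overhead. For a single-threaded BFS, collections.deque is the usual choice. A drop-in sketch of the same search:

from collections import deque

def bfs_parents(fromNode, toNode, graph):
    # Same algorithm as above, but popleft() on a plain deque gives
    # FIFO order without Queue.Queue's per-operation locking.
    parent = {fromNode: 'Root'}
    q = deque([fromNode])
    while q:
        current = q.popleft()
        if current == toNode:
            return parent  # reconstruct the path from parent as above
        for neighbour in graph.get(current, []):
            if neighbour not in parent:
                parent[neighbour] = current
                q.append(neighbour)
    return None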

月光色 2024-08-19 00:46:05

I'll bet that machine has more than one core, doesn't it? Run it in parallel.

Python Threading
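If the time really is dominated by waiting on MySQL, one way to apply this idea is a level-synchronous BFS that fetches each frontier's neighbour lists in parallel threads. This is a sketch only: it assumes getNeighbours() is thread-safe (e.g. one connection per thread), and it uses the stdlib threading module since Python 2.6 predates concurrent.futures.

import threading

def fetch_level(frontier):
    # Fetch neighbour lists for every node in the current BFS level
    # concurrently; the calls are I/O-bound, so threads overlap the
    # round trips to MySQL despite the GIL.
    results = {}
    lock = threading.Lock()

    def worker(node):
        neighbours = getNeighbours(node)  # assumed thread-safe
        with lock:
            results[node] = neighbours

    threads = [threading.Thread(target=worker, args=(n,)) for n in frontier]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results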

傲性难收 2024-08-19 00:46:05

Hmm, doesn't BFS involve marking nodes you've already seen so you don't visit them again?
