这种广度优先搜索可以变得更快吗?
我有一个数据集,它是一个大型未加权循环图,循环发生在大约 5-6 条路径的循环中。它由大约 8000 个节点组成,每个节点有 1-6 个(通常大约 4-5 个)连接。我正在进行单对最短路径计算,并实现了以下代码来进行广度优先搜索。
from Queue import Queue
q = Queue()
parent = {}
fromNode = 'E1123'
toNode = 'A3455'
# path finding
q.put(fromNode)
parent[fromNode] = 'Root'
while not q.empty():
# get the next node and add its neighbours to queue
current = q.get()
for i in getNeighbours(current):
# note parent and only continue if not already visited
if i[0] not in parent:
parent[i[0]] = current
q.put(i[0])
# check if destination
if current == toNode:
print 'arrived at', toNode
break
上面的代码使用Python 2.6 Queue 模块,getNeighbours() 只是一个子例程,它进行单个MySQL 调用并将邻居作为元组列表返回,例如(('foo',),('bar',))。 SQL 调用速度很快。
该代码工作正常,但是测试到大约 7 层的深度需要大约 20 秒才能运行(2.5GHz Intel 4GB RAM OS X 10.6),
我欢迎任何有关如何改进此代码的运行时间的评论。
I have a data set which is a large unweighted cyclic graph The cycles occur in loops of about 5-6 paths. It consists of about 8000 nodes and each node has from 1-6 (usually about 4-5) connections. I'm doing single pair shortest path calculations and have implemented the following code to do a breadth-first search.
from Queue import Queue
q = Queue()
parent = {}
fromNode = 'E1123'
toNode = 'A3455'
# path finding
q.put(fromNode)
parent[fromNode] = 'Root'
while not q.empty():
# get the next node and add its neighbours to queue
current = q.get()
for i in getNeighbours(current):
# note parent and only continue if not already visited
if i[0] not in parent:
parent[i[0]] = current
q.put(i[0])
# check if destination
if current == toNode:
print 'arrived at', toNode
break
The above code uses the Python 2.6 Queue module and getNeighbours() is simply a subroutine that makes a single MySQL call and returns the neighbours as a list of tuples e.g. (('foo',),('bar',)). The SQL call is quick.
The code works ok however testing to down to depths of about 7 layers takes about 20 seconds to run (2.5GHz Intel 4GB RAM OS X 10.6)
I'd welcome any comments about how to improve the run time of this code.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
好吧,考虑到评论的赞成票,我现在就回答一下。
紧密循环中的 SQL肯定会减慢您的速度。我不在乎通话速度有多快。想想看——您要求解析一个查询,运行一个查找——尽管速度很快,但它仍然处于一个紧密的循环中。您的数据集是什么样的?您能否将整个数据集
SELECT
放入内存中,或者至少在 MySQL 之外使用它?如果您在内存中处理这些数据,您将看到显着的性能提升。
Well, given the upvotes on the comment, I'll make it an answer now.
The SQL in the tight loop is definitely slowing you down. I don't care how fast the call is. Think about it -- you're asking for a query to be parsed, a lookup to be run -- as fast as that is, it's still in a tight loop. What does your data set look like? Can you just
SELECT
the entire data set into memory, or at least work with it outside of MySQL?If you work with that data in memory, you will see a significant performance gain.
像这样的事情:
如果你用列表字典替换你的 SQL 查询(这将是棘手的部分),你将获得这样的性能。
Something like this:
If you replace your SQL queries with a dictionary of lists (and that would be the tricky part), you will get this performance.
我敢打赌那台机器有不止一个核心,不是吗?并行运行它。
Python 线程
I'll bet that machine has more than one core, doesn't it? Run it in parallel.
Python Threading
嗯,BFS 不是会标记你已经见过的节点,这样你就不会再访问它们了吗?
Hmm, doesn't BFS involve marking nodes you've already seen so you don't visit them again?