Shortest-path algorithms that use a space-time tradeoff?
Problem: finding shortest paths in an unweighted, undirected graph.
Breadth-first search can find the shortest path between two nodes, but this can take up to O(|V| + |E|) time. A precomputed lookup table would allow requests to be answered in O(1) time, but at the cost of O(|V|^2) space.
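For reference, the no-precomputation end of the tradeoff can be sketched as a plain breadth-first search. This is a minimal sketch, assuming the graph is stored as an adjacency-list dictionary; the names `bfs_shortest_path` and `example_graph` are illustrative and not from the question.

```python
from collections import deque

def bfs_shortest_path(adj, source, target):
    """Return one shortest path from source to target as a list of nodes,
    or None if target is unreachable. adj maps node -> list of neighbors."""
    parent = {source: None}          # doubles as the visited set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Follow parent links back to the source, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbor in adj[node]:
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return None

# Hypothetical undirected graph; node 5 is isolated.
example_graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3], 5: []}
print(bfs_shortest_path(example_graph, 0, 4))
```

Each query touches every node and edge at most once, which is where the O(|V| + |E|) bound comes from, and it needs no stored data beyond the graph itself.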
What I'm wondering: Is there an algorithm which offers a space-time tradeoff that's more fine-grained? In other words, is there an algorithm which:
- Finds shortest paths in more time than O(1), but is faster than a bidirectional breadth-first search
- Uses precomputed data which takes up less space than O(|V|^2)?
On the practical side: the graph has 800,000 nodes and is believed to be a small-world network. The all-pairs shortest paths table would be on the order of gigabytes. That's not outrageous these days, but it doesn't suit our requirements.
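A rough size estimate for that table can be worked out as follows. The one-byte-per-entry figure is an assumption (distances in a small-world network are short enough to fit in a byte), not something stated in the question.

```python
NUM_NODES = 800_000        # from the question
BYTES_PER_ENTRY = 1        # assumption: each distance fits in one byte

# Full |V| x |V| table of distances.
full_table_bytes = NUM_NODES * NUM_NODES * BYTES_PER_ENTRY

# The graph is undirected, so dist(u, v) == dist(v, u) and only the
# entries above the diagonal need to be stored.
half_table_bytes = NUM_NODES * (NUM_NODES - 1) // 2 * BYTES_PER_ENTRY

print(f"full table:     {full_table_bytes / 1e9:.0f} GB")
print(f"upper triangle: {half_table_bytes / 1e9:.0f} GB")
```

Under that assumption the full table comes to about 640 GB and the symmetric half to about 320 GB. Storing whole paths or next-hop node IDs instead of one-byte distances would multiply these figures.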
However, I am asking my question out of curiosity. What's keeping me up at night is not "how can I reduce cache misses for an all-pairs lookup table?", but "Is there a completely different algorithm out there that I've never heard of?"
The answer may be no, and that's okay.
Comments (2)
You should start by looking at Dijkstra's algorithm for finding the shortest path. The A* algorithm is a variant that uses a heuristic (such as the Euclidean distance) to reduce the time taken to calculate the optimal route between the start and goal nodes. You can modify this heuristic to trade accuracy for performance.
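A minimal sketch of A* with a Euclidean heuristic might look like the following. It assumes every node has known coordinates and that no edge spans more than one unit of distance, so the heuristic never overestimates the remaining hop count. Neither assumption comes from the question, and the graph described there may have no coordinates at all. The names `a_star`, `grid_adj` and `grid_coords` are illustrative.

```python
import heapq
import math

def a_star(adj, coords, source, target):
    """Shortest path by hop count. coords maps node -> (x, y).
    Assumes each edge joins nodes at most 1 unit apart, which keeps
    the Euclidean heuristic admissible for unit edge costs."""
    def heuristic(node):
        (x1, y1), (x2, y2) = coords[node], coords[target]
        return math.hypot(x1 - x2, y1 - y2)

    best = {source: 0}               # fewest hops found so far
    parent = {source: None}
    frontier = [(heuristic(source), source)]
    while frontier:
        _, node = heapq.heappop(frontier)
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbor in adj[node]:
            cost = best[node] + 1    # every edge costs one hop
            if cost < best.get(neighbor, math.inf):
                best[neighbor] = cost
                parent[neighbor] = node
                heapq.heappush(frontier, (cost + heuristic(neighbor), neighbor))
    return None

# Hypothetical 3x3 unit grid; each node is its own (x, y) coordinate.
grid_nodes = [(x, y) for x in range(3) for y in range(3)]
grid_adj = {
    (x, y): [(x + dx, y + dy)
             for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
             if 0 <= x + dx < 3 and 0 <= y + dy < 3]
    for (x, y) in grid_nodes
}
grid_coords = {node: node for node in grid_nodes}
print(a_star(grid_adj, grid_coords, (0, 0), (2, 2)))
```

Scaling the heuristic up makes the search greedier and usually faster, but the returned path is then no longer guaranteed to be shortest.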
It seems as if your input set must be very large if a lookup table would be too large to store on disk. I assume that the data will not fit in RAM either, which means that whatever algorithm you use should be tuned to minimize the number of reads and writes. Whenever disks are involved, space == time, because writing to disk is so slow.
The exact algorithm you should use depends on what kind of graph you have. This research paper might be of interest to you. Full disclosure: I have not read it myself, but it seems like it might be what you are looking for.
Edit:
If the graph is (almost) connected, which a small-world network is, a lookup table can't be smaller than |V|^2 entries. This means that all lookups will require disk access. If the edges fit in main memory, it might be faster to just compute the path every time. Otherwise, you might store only a table containing the lengths of all shortest paths and reconstruct each path from that table.
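Reconstructing a path from a table of lengths can be sketched like this: starting at the source, repeatedly step to any neighbor that is exactly one hop closer to the target. The sketch assumes an in-memory adjacency list and a nested dictionary for the table; `all_pairs_distances` only stands in for whatever precomputation actually produces the table, and all names are illustrative.

```python
from collections import deque

def all_pairs_distances(adj):
    """Build dist[s][t] = hop count, one BFS per source (stand-in precomputation)."""
    dist = {}
    for source in adj:
        d = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in adj[node]:
                if neighbor not in d:
                    d[neighbor] = d[node] + 1
                    queue.append(neighbor)
        dist[source] = d
    return dist

def reconstruct_path(adj, dist, source, target):
    """Recover one shortest path using only the graph and the distance table."""
    if target not in dist[source]:
        return None                  # target is unreachable
    path = [source]
    node = source
    while node != target:
        remaining = dist[node][target]
        # On a shortest path, some neighbor is exactly one hop closer.
        node = next(n for n in adj[node] if dist[n].get(target) == remaining - 1)
        path.append(node)
    return path

# Hypothetical undirected graph; node 5 is isolated.
example_graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3], 5: []}
example_dist = all_pairs_distances(example_graph)
print(reconstruct_path(example_graph, example_dist, 0, 4))
```

Each step costs one table lookup per neighbor examined, so a query takes time proportional to the path length times the node degree rather than O(|V| + |E|).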
The key is to make sure that the entries in the table which are close to each other in either direction are also close to each other on the disk. This storage pattern accomplishes that:
It will also work well with the cache hierarchy.
In order to compute the table you might use a modified Floyd-Warshall, where you process the data in blocks. This would let you perform the computation in a reasonable amount of time, especially if you parallelize it.
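One common way to process Floyd-Warshall in blocks is the three-phase tiled variant sketched below. It may or may not be the exact modification meant here. It works in memory on a list-of-lists matrix and assumes nodes are numbered 0..n-1; the names `blocked_floyd_warshall`, `adjacency_to_matrix` and `block_size` are illustrative.

```python
INF = float("inf")

def adjacency_to_matrix(adj):
    """Initial distance matrix for an unweighted graph with nodes 0..n-1."""
    n = len(adj)
    dist = [[INF] * n for _ in range(n)]
    for u in adj:
        dist[u][u] = 0
        for v in adj[u]:
            dist[u][v] = 1
    return dist

def _relax(dist, i_range, j_range, k_range):
    """Relax every entry of the (i_range x j_range) tile through each k in k_range."""
    for k in k_range:
        row_k = dist[k]
        for i in i_range:
            d_ik = dist[i][k]
            if d_ik == INF:
                continue
            row_i = dist[i]
            for j in j_range:
                if d_ik + row_k[j] < row_i[j]:
                    row_i[j] = d_ik + row_k[j]

def blocked_floyd_warshall(dist, block_size):
    """All-pairs shortest path lengths, computed tile by tile, in place."""
    n = len(dist)
    blocks = [range(s, min(s + block_size, n)) for s in range(0, n, block_size)]
    for b, kb in enumerate(blocks):
        # Phase 1: the diagonal tile depends only on itself.
        _relax(dist, kb, kb, kb)
        # Phase 2: tiles sharing a row or column with the diagonal tile.
        for o, ob in enumerate(blocks):
            if o != b:
                _relax(dist, kb, ob, kb)
                _relax(dist, ob, kb, kb)
        # Phase 3: all remaining tiles, using the phase 2 results.
        for i, ib in enumerate(blocks):
            if i == b:
                continue
            for j, jb in enumerate(blocks):
                if j != b:
                    _relax(dist, ib, jb, kb)
    return dist

# Hypothetical undirected graph; node 5 is isolated.
example_graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3], 5: []}
table = blocked_floyd_warshall(adjacency_to_matrix(example_graph), block_size=2)
print(table[0][4])
```

Within phase 2 and within phase 3 the tiles do not depend on one another, which is what makes the computation parallelizable, and each `_relax` call touches at most three tiles at a time, so the working set can be kept small.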