全有或全无 - 快速启发式最短路径算法(并行?)
我正在寻找一种好方法来找到数十亿个节点的网络(有向、循环、加权)中两点之间的最短路径。基本上,我想要一种算法,即使最坏的情况很糟糕,通常也会非常非常快地得到解决方案。
我对并行或分布式算法持开放态度,尽管它必须对数据集的大小有意义(在显卡上与 CUDA 配合使用的算法必须能够以块的形式进行处理)。我不打算使用大量计算机来执行此操作,但可能最多使用几台计算机。
I'm looking for a good way to find a shortest path between two points in a network (directed, cyclic, weighted) of billions of nodes. Basically I want an algorithm that will typically get a solution very very quickly, even if its worst case is horrible.
I'm open to parallel or distributed algorithms, although it would have to make sense with the size of the data set (an algorithm that would work with CUDA on a graphics card would have to be able to be processed in chunks). I don't plan on using a farm of computers to do this, but potentially a few max.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
Google 搜索为您提供了很多好的链接。第一个 链接 本身讨论了两个最短路径的并行实现算法。
谈到 CUDA 上的实现,您必须记住数十亿个节点 = 千兆字节的内存。这将对每张卡一次可以使用的节点(以获得最佳性能)进行限制。目前市场上的显卡最大容量约为6GB。这可以让您估计可能需要使用的卡数量(不一定是机器数量)。
A google search gives you a lot of good links. The first link itself talks about parallel implementations of two shortest path algorithms.
And talking about implementation on CUDA, you will have to remember that billions of nodes = Gigabytes of memory. That would provide a limitation on the nodes you can use per card (for optimum performance) at a time. The maximum capacity of a graphics card currently in the market is about 6GB. This can give you an estimate on the number of cards you may need to use (not necessarily the number of machines).
看看 Dikstra 的算法。一般来说,它会进行优化的多深度广度优先搜索,直到保证找到最短路径。找到的第一条路径可能是最短的,但您无法确定,直到搜索的其他分支不会以更短的距离终止。
Look at Dikstra's algorithm. Generally it does an optimized multi-depth breadth first search until you're guaranteed to have found the shortest path. The first path found might be the shortest, but you can't be sure until the other branches of the search don't terminate with a shorter distance.
您可以使用统一成本搜索。该搜索算法将在加权图中找到最优解。如果我没记错的话,搜索复杂度(空间和时间)是 b^(C*/e+1),其中 b 表示分支,C* 是到达目标的最佳路径成本,e 是平均路径成本。
还有一种称为双向搜索的东西,您可以从初始状态和目标状态开始搜索,并希望两个起点在图中间的某个位置相互交叉:)
You could use an uniform cost search. This search algorithm will find a optimal solution in a weighted graph. If I remember correctly, the search complexity (space and time) is b^(C*/e+1), where b denotes the branching, C* the optimal path cost to your goal, and e is the average path cost.
And there is also something called bidirectional search, where you start from the initial state and goal state with the search and hopefully both starting points crosses each other somewhere in the middle of the graph :)
我担心,除非你的图形在内存中很好地布局,否则与 CPU 上经过良好调整的并行算法相比,你不会从使用 CUDA 中获得太多好处。问题是,在“完全无序”的图上行走会导致大量随机内存访问。
当您有 32 个 CUDA 线程并行工作,但它们的内存访问是随机的时,必须串行化获取指令。由于搜索算法不会执行许多困难的数学计算,因此获取内存可能会浪费大部分时间。
I am worried that unless your graph is somehow nicely layed out in the memory, you won't get much benefit from using CUDA, when compared to a well-tuned parallel algorithm on CPU. The problem is, that walking on a "totally-unordered" graphs lead to a lot of random memory accesses.
When you have 32 CUDA-threads working together in parallel, but their memory access is random, the fetch instruction has to be serialised. Since the search algorithm does not perform many hard mathematical computations, fetching memory is where you are likely to loose most of your time.