在非常大的树上执行 DFS 的最佳方法是什么?
情况是这样的:
- 应用程序世界由数十万个状态组成。
- 给定一个状态,我可以计算出一组 3 或 4 个其他可到达的状态。一个简单的递归可以构建一棵状态树,它会变得非常大、非常快。
- 我需要从根状态到该树中的特定深度执行 DFS,以搜索包含“最小”状态的子树(计算节点的值与问题无关)。
使用单线程执行 DFS 可以工作,但速度非常慢。向下覆盖 15 个关卡可能需要几分钟的时间,我需要改进这种糟糕的表现。尝试为每个子树分配一个线程会创建太多线程并导致 OutOfMemoryError
。使用 ThreadPoolExecutor 也好不了多少。
我的问题:遍历这棵大树最有效的方法是什么?
Here's the situation:
- The application world consists of hundreds of thousands of states.
- Given a state, I can work out a set of 3 or 4 other reachable states. A simple recursion can build a tree of states that gets very large very fast.
- I need to perform a DFS to a specific depth in this tree from the root state, to search for the subtree which contains the 'minimal' state (calculating the value of the node is irrelevant to the question).
Using a single thread to perform the DFS works, but is very slow. Covering 15 levels down can take a few good minutes, and I need to improve this atrocious performance. Trying to assign a thread to each subtree created too many threads and caused an OutOfMemoryError
. Using a ThreadPoolExecutor
wasn't much better.
My question: What's the most efficient way to traverse this large tree?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
我不认为导航树是您的问题,因为您的树有大约 3600 万个节点。相反,您对每个节点所做的事情更有可能是昂贵的。
我建议
您限制线程数量,并且仅对树的最高级别使用单独的线程。你有多少个核心?一旦您有足够的线程来保持每个核心忙碌,您就不需要创建更多线程,因为这只会增加开销。
Java 有一个内置的 Stack,它是线程安全的,但是我会使用 ArrayList,它更有效。
I don't believe navigating the tree is your problem as your tree has about 36 million nodes. Instead is it more likely what you are doing with each node is expensive.
prints
I suggest you limit the number of threads and only use separate threads for the highest levels of the tree. How many cores do you have? Once you have enough threads to keep every core busy, you don't need to create more threads as this just adds overhead.
Java has a built in Stack which thread safe, however I would just use ArrayList which is more efficient.
您肯定必须使用迭代方法。最简单的方法是基于堆栈的 DFS,其伪代码与此类似:
其时间复杂度为 O(n+m),其中 n (m) 表示图中的节点(边)数量。由于你有一棵树,这是 O(n) 并且应该可以轻松快速地实现 n>1.000.000...
You will definitely have to use an iterative method. Simplest way is a stack based DFS with a pseudo code similar to this:
The time complexity of this is O(n+m) where n (m) denotes the number of nodes (edges) in your graph. Since you have a tree this is O(n) and should work quickly for n>1.000.000 easily...