在最小配对堆中查找 N 个最小值的高效算法
我正在使用此处找到的配对堆实现: https ://github.com/jemalloc/jemalloc/commits/dev/include/jemalloc/internal/ph.h
虽然我偶尔需要迭代堆中的 N 个最小值(其中 N 受堆中元素数量的限制,但通常更少)。有什么有效的方法可以做到这一点吗?
我当前的方法是调用 remove_first()
N 次,然后通过 insert()
将所有弹出的元素推回。
I'm using the pairing heap implementation found here: https://github.com/jemalloc/jemalloc/commits/dev/include/jemalloc/internal/ph.h
Once in a while though I need to iterate over the N min values in the heap (where N is bound by the number of elements in the heap but usually less). Is there any efficient way of doing this?
My current approach is calling remove_first()
N times and then pushing all the popped elements back via insert()
.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
您的方法是
O(k log n)
,其中k
是要打印的项目数,n
是堆。看起来这些操作在您正在使用的实现中已经进行了相当广泛的优化。有一种方法可以用另一个堆来遍历堆,从而在O(k log k)
中解决您的目标,使用k
上的对数因子而不是O(k log k)
会更快。代码>n。该方法相当简单:维护一个初始化为根(堆中的最小值)的值的最小堆(带有指向树中节点的指针)。然后,您可以弹出辅助堆并将当前节点的子节点插入到主堆中,这会更快,因为只有可能是下一个最小值的节点才会在队列中(它们是您已经创建的节点的邻居)到目前为止)。虽然这种方法的大 O 复杂性在技术上更好,但在实践中几乎肯定会慢得多。这个最小配对堆的实现看起来非常高效,并且几乎肯定会超过创建辅助堆和执行此搜索的开销。更不用说额外的代码复杂性以及可能引入错误的可能性,这可能不值得。
我非常确定您无法做得比
O(k log k)
更好。我对此的直觉是,如果你能做得更好,排序时间可能会不断减少,但基于比较的排序 (iirc) 已被证明不可能以比O(n log n)
O(n log n) 更快的速度解决问题代码>.这种直觉可能是错误的,但我认为它可能非常接近事实。祝你好运!Your approach is
O(k log n)
, wherek
is the number of items you want to print andn
is the number of elements in the heap. It looks like these operations have been optimized quite extensively in the implementation you're using. There is a way to traverse the heap with another heap to solve your goal inO(k log k)
instead, which is faster with the log factor onk
instead ofn
. The approach is fairly simple: Maintain a min-heap of values (with pointers to the nodes in the tree) initialized to the root (which is the minimum value in the heap). Then, you can pop off the auxiliary heap and insert the current node's children in the main heap, which is faster since only the nodes which could possibly be the next smallest value are in the queue (which are the neighbors of the nodes you've taken so far).While the big-O complexity of this approach is technically better, it will almost certainly be much slower in practice. The implementation of this min-pairing heap seems very efficient, and it would almost certainly outweigh the overhead of creating an auxiliary heap and performing this search. Not to mention the extra code complexity with the possibility of introducing bugs, it's probably not worth it.
I'm pretty sure that you can't do better than
O(k log k)
. My intuition for this is that there's probably a constant time reduction to sorting if you could do better, but comparison-based sorting (iirc) has been proven to be impossible to solve in faster thanO(n log n)
. This intuition could be wrong, but I think it's probably pretty close to the truth. Best of luck!