空间有限的优先级队列:寻找好的算法
这不是作业。
我正在使用一个小的“优先级队列”(目前作为数组实现)来存储具有最小值的最后N个项目。这有点慢 - O(N) 项目插入时间。当前的实现跟踪数组中最大的项目并丢弃任何不适合数组的项目,但我仍然想进一步减少操作数量。
寻找满足以下要求的优先级队列算法:
- 队列可以实现为数组,其具有固定大小并且_不能_增长。严格禁止在任何队列操作期间进行动态内存分配。
- 任何不适合数组的元素都会被丢弃,但队列会保留遇到的所有最小元素。
- O(log(N)) 插入时间(即将元素添加到队列中应花费 O(log(N)))。
- (可选)对队列中*最大*项目的O(1)访问(队列存储*最小*项目,因此最大项目将首先被丢弃,我需要它们来减少操作数量)
- 易于实施/理解。理想情况下——类似于二分搜索——一旦你理解了它,你就会永远记住它。
- 元素不需要以任何方式排序。我只需要保留遇到过的 N 个最小值。当我需要它们时,我会立即访问所有它们。所以从技术上讲,它不一定是一个队列,我只需要存储 N 个最后的最小值。
我最初考虑使用二进制堆(它们可以通过数组轻松实现),但显然当数组不再增长时它们表现不佳。链接列表和数组需要额外的时间来移动事物。 stl 优先级队列不断增长并使用动态分配(不过,我可能是错的)。
那么,还有其他想法吗?
--编辑--
我对STL的实现不感兴趣。由于函数调用次数较多,STL 实现(由一些人建议)的工作速度比当前使用的线性数组慢一些。
我对优先级队列算法感兴趣,而不是实现。
This is not a homework.
I'm using a small "priority queue" (implemented as array at the moment) for storing last N items with smallest value. This is a bit slow - O(N) item insertion time. Current implementation keeps track of largest item in array and discards any items that wouldn't fit into array, but I still would like to reduce number of operations further.
looking for a priority queue algorithm that matches following requirements:
- queue can be implemented as array, which has fixed size and _cannot_ grow. Dynamic memory allocation during any queue operation is strictly forbidden.
- Anything that doesn't fit into array is discarded, but queue keeps all smallest elements ever encountered.
- O(log(N)) insertion time (i.e. adding element into queue should take up to O(log(N))).
- (optional) O(1) access for *largest* item in queue (queue stores *smallest* items, so the largest item will be discarded first and I'll need them to reduce number of operations)
- Easy to implement/understand. Ideally - something similar to binary search - once you understand it, you remember it forever.
- Elements need not to be sorted in any way. I just need to keep N smallest value ever encountered. When I'll need them, I'll access all of them at once. So technically it doesn't have to be a queue, I just need N last smallest values to be stored.
I initially thought about using binary heaps (they can be easily implemented via arrays), but apparently they don't behave well when array can't grow anymore. Linked lists and arrays will require extra time for moving things around. stl priority queue grows and uses dynamic allocation (I may be wrong about it, though).
So, any other ideas?
--EDIT--
I'm not interested in STL implementation. STL implementation (suggested by a few people) works a bit slower than currently used linear array due to high number of function calls.
I'm interested in priority queue algorithms, not implemnetations.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(8)
基于数组的堆似乎非常适合您的目的。我不知道你为什么拒绝他们。
您使用最大堆。
假设您有一个 N 元素堆(以数组形式实现),其中包含迄今为止看到的 N 个最小元素。
当一个元素进入时,您会检查最大值(O(1) 时间),如果大于则拒绝。
如果传入的值较低,则将根修改为新值并筛选此更改的值 - 最坏情况为 O(log N) 时间。
筛选过程很简单:从根开始,在每一步中都将其值与其较大的子项交换,直到恢复 max-heap 属性。
因此,如果您使用 std::priority_queue,您将不必执行任何可能必须执行的删除操作。根据 std::priority_queue 的实现,这可能会导致内存分配/释放。
因此,您可以使用如下代码:
不过,平均而言,您可能不必一路筛选新值,并且可能会比 O(logn) 平均插入时间更好(尽管我还没有尝试证明这一点)。
您只需分配一次大小为 N 的数组,任何插入都是通过交换数组元素来完成的,因此此后不再有动态内存分配。
查看 wiki 页面,其中包含 heapify 和 sift-down 的伪代码: http://en.wikipedia.org /wiki/堆排序
Array based heaps seem ideal for your purpose. I am not sure why you rejected them.
You use a max-heap.
Say you have an N element heap (implemented as an array) which contains the N smallest elements seen so far.
When an element comes in you check against the max (O(1) time), and reject if it is greater.
If the value coming in is lower, you modify the root to be the new value and sift-down this changed value - worst case O(log N) time.
The sift-down process is simple: Starting at root, at each step you exchange this value with it's larger child until the max-heap property is restored.
So, you will not have to do any deletes which you probably will have to, if you use std::priority_queue. Depending on the implementation of std::priority_queue, this could cause memory allocation/deallocation.
So you can have the code as follows:
On an average, though, you probably will not have to sift-down the new value all the way down and might get better than O(logn) average insert time (though I haven't tried proving it).
You only allocate size N array once and any insertion is done by exchanging elements of the array, so there is no dynamic memory allocation after that.
Check out the wiki page which has pseudo code for heapify and sift-down: http://en.wikipedia.org/wiki/Heapsort
使用
std::priority_queue
并将最大项放在头部。对于每个新项,如果它是>=
头项,则丢弃它,否则弹出头项并插入新项。旁注:标准容器只有在你让它们增长的情况下才会增长。只要您在插入新项目之前删除一项(当然是在达到最大大小之后),这种情况就不会发生。
Use
std::priority_queue
with the largest item at the head. For each new item, discard it if it is>=
the head item, otherwise pop the head item and insert the new item.Side note: Standard containers will only grow if you make them grow. As long as you remove one item before inserting a new item (after it reaches its maximum size, of course), this won't happen.
我工作的大多数优先级队列都是基于链表的。如果您有预先确定的优先级数量,则可以通过使用一系列链表(每个优先级一个链表)轻松创建一个 O(1) 插入的优先级队列。相同优先级的项目当然会退化为 FIFO,但这可以认为是可以接受的。
然后添加和删除就变成了类似的事情(您的 API 可能会有所不同)...
获取队列中的第一个项目就变成了寻找具有最高优先级的非空链表的问题。这可能是 O(N),但是您可以使用一些技巧来加快速度。
希望这有帮助。
Most priority queues I work are based on linked lists. If you have a pre-determined number of priority levels, you can easily create a priority queue with O(1) insertion by having an array of linked lists--one linked list per priority level. Items of the same priority will of course degenerate into either a FIFO, but that can be considered acceptable.
Adding and removal then becomes something like (your API may vary) ...
Getting the first item in the queue then becomes a problem of finding the non-empty linked-list with the highest priority. That may be O(N), but there are several tricks you can use to speed it up.
Hope this helps.
如果优先级数量较小且固定,则可以为每个优先级使用环形缓冲区。如果对象很大,这将导致空间浪费,但如果它们的大小与指针/索引相当,则在对象中存储附加指针的变体可能会以相同的方式增加数组的大小。
或者您可以在数组中使用简单的单链表并存储 2*M+1 个指针/索引,其中一个将指向第一个空闲节点,其他对将指向每个优先级的头和尾。在这种情况下,您必须以平均值进行比较。 O(M),然后用 O(1) 取出下一个节点。并且插入将花费 O(1)。
If amount of priorities is small and fixed than you can use ring-buffer for each priority. That will lead to waste of the space if objects is big, but if their size is comparable with pointer/index than variants with storing additional pointers in objects may increase size of array in the same way.
Or you can use simple single-linked list inside array and store 2*M+1 pointers/indexes, one will point to first free node and other pairs will point to head and tail of each priority. In that cases you'll have to compare in avg. O(M) before taking out next node with O(1). And insertion will take O(1).
如果您以最大大小(可能来自使用占位符初始化的向量)构造一个 STL 优先级队列,然后在插入之前检查大小(如果需要的话,事先删除一个项目),那么您将永远不会在插入操作期间进行动态分配。 STL 的实现非常高效。
If you construct an STL priority queue at the maximum size (perhaps from a vector initialized with placeholders), and then check the size before inserting (removing an item if necessary beforehand) you'll never have dynamic allocation during insert operations. The STL implementation is quite efficient.
Matters Computational 请参见第 158 页。实现本身非常好,您甚至可以进行调整它有点不降低可读性。例如,当您计算左孩子时:
您可以像这样计算右孩子:
Matters Computational see page 158. The implementation itself is quite well, and you can even tweak it a little without making it less readable. For example, when you compute the left child like:
You can compute the rightchild like so:
找到了解决方案(“差异”在代码中表示“优先级”,maxRememberedResults为255(可以是任何(2^n - 1)):
最坏情况下我们得到 O(log(N)) 插入。
该解决方案与昵称“Moron”的用户提供的解决方案相同。
感谢大家的回复。
PS 显然,在睡眠不足的情况下编程并不是一个好主意。
Found a solution ("difference" means "priority" in the code, and maxRememberedResults is 255 (could be any (2^n - 1)):
And we get O(log(N)) insert as a worst case scenario.
It is the same solution as the one provided by user with nickname "Moron".
Thanks to everyone for replies.
P.S. Apparently programming without sleeping enough wasn't a good idea.
最好使用 std::array 和堆算法来实现自己的类。
It's better to implement your own class using std::array and heap algorithms.