当前位置：文江博客话题详情

空间有限的优先级队列：寻找好的算法

发布于 2024-09-03 18:37:54 字数 777 浏览 4 评论 0原文

这不是作业。

我正在使用一个小的“优先级队列”（目前作为数组实现）来存储具有最小值的最后N个项目。这有点慢 - O(N) 项目插入时间。当前的实现跟踪数组中最大的项目并丢弃任何不适合数组的项目，但我仍然想进一步减少操作数量。

寻找满足以下要求的优先级队列算法：

队列可以实现为数组，其具有固定大小并且_不能_增长。严格禁止在任何队列操作期间进行动态内存分配。
任何不适合数组的元素都会被丢弃，但队列会保留遇到的所有最小元素。
O(log(N)) 插入时间（即将元素添加到队列中应花费 O(log(N))）。
（可选）对队列中*最大*项目的O(1)访问（队列存储*最小*项目，因此最大项目将首先被丢弃，我需要它们来减少操作数量）
易于实施/理解。理想情况下——类似于二分搜索——一旦你理解了它，你就会永远记住它。
元素不需要以任何方式排序。我只需要保留遇到过的 N 个最小值。当我需要它们时，我会立即访问所有它们。所以从技术上讲，它不一定是一个队列，我只需要存储 N 个最后的最小值。

我最初考虑使用二进制堆（它们可以通过数组轻松实现），但显然当数组不再增长时它们表现不佳。链接列表和数组需要额外的时间来移动事物。 stl 优先级队列不断增长并使用动态分配（不过，我可能是错的）。

那么，还有其他想法吗？

--编辑--
我对STL的实现不感兴趣。由于函数调用次数较多，STL 实现（由一些人建议）的工作速度比当前使用的线性数组慢一些。

我对优先级队列算法感兴趣，而不是实现。

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

小巷里的女流氓 2024-09-10 18:37:54

基于数组的堆似乎非常适合您的目的。我不知道你为什么拒绝他们。

您使用最大堆。

假设您有一个 N 元素堆（以数组形式实现），其中包含迄今为止看到的 N 个最小元素。

当一个元素进入时，您会检查最大值（O(1) 时间），如果大于则拒绝。

如果传入的值较低，则将根修改为新值并筛选此更改的值 - 最坏情况为 O(log N) 时间。

筛选过程很简单：从根开始，在每一步中都将其值与其较大的子项交换，直到恢复 max-heap 属性。

因此，如果您使用 std::priority_queue，您将不必执行任何可能必须执行的删除操作。根据 std::priority_queue 的实现，这可能会导致内存分配/释放。

因此，您可以使用如下代码：

分配大小为 N 的数组。
用您看到的前 N 个元素填充它。
heapify（你应该在标准教科书中找到这个，它使用筛选）。这是 O(N)。
现在，您获得的任何新元素，要么在 O(1) 时间内拒绝它，要么在最坏情况下通过筛选插入它 O(logN) 时间。

不过，平均而言，您可能不必一路筛选新值，并且可能会比 O(logn) 平均插入时间更好（尽管我还没有尝试证明这一点）。

您只需分配一次大小为 N 的数组，任何插入都是通过交换数组元素来完成的，因此此后不再有动态内存分配。

查看 wiki 页面，其中包含 heapify 和 sift-down 的伪代码： http://en.wikipedia.org /wiki/堆排序

回复收藏 0 原文

银河中√捞星星 2024-09-10 18:37:54

使用 std::priority_queue 并将最大项放在头部。对于每个新项，如果它是 >= 头项，则丢弃它，否则弹出头项并插入新项。

旁注：标准容器只有在你让它们增长的情况下才会增长。只要您在插入新项目之前删除一项（当然是在达到最大大小之后），这种情况就不会发生。

回复收藏 0 原文

浅听莫相离 2024-09-10 18:37:54

我工作的大多数优先级队列都是基于链表的。如果您有预先确定的优先级数量，则可以通过使用一系列链表（每个优先级一个链表）轻松创建一个 O(1) 插入的优先级队列。相同优先级的项目当然会退化为 FIFO，但这可以认为是可以接受的。

然后添加和删除就变成了类似的事情（您的 API 可能会有所不同）...

listItemAdd (&list[priLevel], &item);      /* Add to tail */
pItem = listItemRemove (&list[priLevel]);  /* Remove from head */

获取队列中的第一个项目就变成了寻找具有最高优先级的非空链表的问题。这可能是 O(N)，但是您可以使用一些技巧来加快速度。

在你的优先级队列结构中，保留一个指向当前最高优先级的链表的指针或索引或其他东西。每次在优先级队列中添加或删除项目时都需要更新此值。
使用位图来指示哪些链表不为空。结合查找最高有效位或查找最低有效位算法，您通常可以一次测试最多 32 个列表。同样，每次添加/删除时都需要更新。

希望这有帮助。

Most priority queues I work are based on linked lists. If you have a pre-determined number of priority levels, you can easily create a priority queue with O(1) insertion by having an array of linked lists--one linked list per priority level. Items of the same priority will of course degenerate into either a FIFO, but that can be considered acceptable.

Adding and removal then becomes something like (your API may vary) ...

listItemAdd (&list[priLevel], &item);      /* Add to tail */
pItem = listItemRemove (&list[priLevel]);  /* Remove from head */

Getting the first item in the queue then becomes a problem of finding the non-empty linked-list with the highest priority. That may be O(N), but there are several tricks you can use to speed it up.

In your priority queue structure, keep a pointer or index or something to the linked list with the current highest priority. This would need to be updated each time an item is added or removed from the priority queue.
Use a bitmap to indicate which linked lists are not empty. Combined with a find most significant bit, or find least significant bit algorithm you can usually test up to 32 lists at once. Again, this would need to be updated on each add / remove.

Hope this helps.

回复收藏 0 原文

泼猴你往哪里跑 2024-09-10 18:37:54

如果优先级数量较小且固定，则可以为每个优先级使用环形缓冲区。如果对象很大，这将导致空间浪费，但如果它们的大小与指针/索引相当，则在对象中存储附加指针的变体可能会以相同的方式增加数组的大小。
或者您可以在数组中使用简单的单链表并存储 2*M+1 个指针/索引，其中一个将指向第一个空闲节点，其他对将指向每个优先级的头和尾。在这种情况下，您必须以平均值进行比较。 O(M)，然后用 O(1) 取出下一个节点。并且插入将花费 O(1)。

回复收藏 0 原文

一向肩并 2024-09-10 18:37:54

如果您以最大大小（可能来自使用占位符初始化的向量）构造一个 STL 优先级队列，然后在插入之前检查大小（如果需要的话，事先删除一个项目），那么您将永远不会在插入操作期间进行动态分配。 STL 的实现非常高效。

回复收藏 0 原文

旧竹 2024-09-10 18:37:54

Matters Computational 请参见第 158 页。实现本身非常好，您甚至可以进行调整它有点不降低可读性。例如，当您计算左孩子时：

int left = i / 2;

您可以像这样计算右孩子：

int right = left + 1;

Matters Computational see page 158. The implementation itself is quite well, and you can even tweak it a little without making it less readable. For example, when you compute the left child like:

int left = i / 2;

You can compute the rightchild like so:

int right = left + 1;

回复收藏 0 原文

平安喜乐 2024-09-10 18:37:54

找到了解决方案（“差异”在代码中表示“优先级”，maxRememberedResults为255（可以是任何（2^n - 1））：

template <typename T> inline void swap(T& a, T& b){
    T c = a;
    a = b;
    b = c;
}


struct MinDifferenceArray{
    enum{maxSize = maxRememberedResults};
    int size;
    DifferenceData data[maxSize];
    void add(const DifferenceData& val){
        if (size >= maxSize){
            if(data[0].difference <= val.difference)
                return;

            data[0] = val;

            for (int i = 0; (2*i+1) < maxSize; ){
                int next = 2*i + 1;
                if (data[next].difference < data[next+1].difference)
                    next++;
                if (data[i].difference < data[next].difference)
                    swap(data[i], data[next]);
                else
                    break;
                i = next;
            }
        }
        else{
            data[size++] = val;
            for (int i = size - 1; i > 0;){
                int parent = (i-1)/2;
                if (data[parent].difference < data[i].difference){
                    swap(data[parent], data[i]);
                    i = parent;
                }
                else
                    break;
            }
        }
    }

    void clear(){
        size = 0;
    }

    MinDifferenceArray()
        :size(0){
    }
};

构建基于max的队列（根是最大的）
直到充满为止，正常填充
当它已满时，对于每个新元素
1. 检查新元素是否小于根元素。
2. 如果大于或等于root，则拒绝。
3. 否则，用新元素替换 root 并执行正常的堆“向下筛选”。

最坏情况下我们得到 O(log(N)) 插入。

该解决方案与昵称“Moron”的用户提供的解决方案相同。
感谢大家的回复。

PS 显然，在睡眠不足的情况下编程并不是一个好主意。

Found a solution ("difference" means "priority" in the code, and maxRememberedResults is 255 (could be any (2^n - 1)):

template <typename T> inline void swap(T& a, T& b){
    T c = a;
    a = b;
    b = c;
}


struct MinDifferenceArray{
    enum{maxSize = maxRememberedResults};
    int size;
    DifferenceData data[maxSize];
    void add(const DifferenceData& val){
        if (size >= maxSize){
            if(data[0].difference <= val.difference)
                return;

            data[0] = val;

            for (int i = 0; (2*i+1) < maxSize; ){
                int next = 2*i + 1;
                if (data[next].difference < data[next+1].difference)
                    next++;
                if (data[i].difference < data[next].difference)
                    swap(data[i], data[next]);
                else
                    break;
                i = next;
            }
        }
        else{
            data[size++] = val;
            for (int i = size - 1; i > 0;){
                int parent = (i-1)/2;
                if (data[parent].difference < data[i].difference){
                    swap(data[parent], data[i]);
                    i = parent;
                }
                else
                    break;
            }
        }
    }

    void clear(){
        size = 0;
    }

    MinDifferenceArray()
        :size(0){
    }
};

build max-based queue (root is largest)
until it is full, fill up normally
when it is full, for every new element
1. Check if new element is smaller than root.
2. if it is larger or equal than root, reject.
3. otherwise, replace root with new element and perform normal heap "sift-down".

And we get O(log(N)) insert as a worst case scenario.

It is the same solution as the one provided by user with nickname "Moron".
Thanks to everyone for replies.

P.S. Apparently programming without sleeping enough wasn't a good idea.

回复收藏 0 原文

把时间冻结 2024-09-10 18:37:54

最好使用 std::array 和堆算法来实现自己的类。

  `template<class T, int fixed_size = 5>
   class fixed_size_arr_pqueue_v2
   {
     std::array<T, fixed_size> _data;
     int _size = 0;

     int parent(int i)
     {
       return (i - 1)/2;
     }

     void heapify(int i, bool downward = false)
     {
       int l = 2*i + 1;
       int r = 2*i + 2;
       int largest = 0;
       if (l < size() && _data[l] > _data[i])
        largest = l;
       else
        largest = i;

       if (r < size() && _data[r] > _data[largest])
        largest = r;   

       if (largest != i)
       {
         std::swap(_data[largest], _data[i]);
         if (!downward)
           heapify(parent(i));
         else
           heapify(largest, true);
       }
    }

    public:

    void push(T &d)
    {
       if (_size == fixed_size)
       {
          //min elements in a max heap lies at leaves only. 
          auto minItr = std::min_element(begin(_data) + _size/2, end(_data));
          auto minPos {minItr - _data.begin()};
          auto min { *minItr};

          if (d > min)
          {
             _data.at(minPos) = d;
             if (_data[parent(minPos)] > d)
             { 
                //this is unlikely to happen in our case? as this position is  a leaf?
                heapify(minPos, true);
             }  
             else
                heapify(parent(minPos));
           }           

          return ;
       }

       _data.at(_size++) = d;
       std::push_heap(_data.begin(), _data.begin() + _size);
    }

    T pop()
    {
       T d = _data.front();
       std::pop_heap(_data.begin(), _data.begin() + _size);
       _size--;
       return d;
    }

    T top()
    {
       return _data.front();
    }

    int size() const
    {
       return _size;
    }
 };`

It's better to implement your own class using std::array and heap algorithms.

  `template<class T, int fixed_size = 5>
   class fixed_size_arr_pqueue_v2
   {
     std::array<T, fixed_size> _data;
     int _size = 0;

     int parent(int i)
     {
       return (i - 1)/2;
     }

     void heapify(int i, bool downward = false)
     {
       int l = 2*i + 1;
       int r = 2*i + 2;
       int largest = 0;
       if (l < size() && _data[l] > _data[i])
        largest = l;
       else
        largest = i;

       if (r < size() && _data[r] > _data[largest])
        largest = r;   

       if (largest != i)
       {
         std::swap(_data[largest], _data[i]);
         if (!downward)
           heapify(parent(i));
         else
           heapify(largest, true);
       }
    }

    public:

    void push(T &d)
    {
       if (_size == fixed_size)
       {
          //min elements in a max heap lies at leaves only. 
          auto minItr = std::min_element(begin(_data) + _size/2, end(_data));
          auto minPos {minItr - _data.begin()};
          auto min { *minItr};

          if (d > min)
          {
             _data.at(minPos) = d;
             if (_data[parent(minPos)] > d)
             { 
                //this is unlikely to happen in our case? as this position is  a leaf?
                heapify(minPos, true);
             }  
             else
                heapify(parent(minPos));
           }           

          return ;
       }

       _data.at(_size++) = d;
       std::push_heap(_data.begin(), _data.begin() + _size);
    }

    T pop()
    {
       T d = _data.front();
       std::pop_heap(_data.begin(), _data.begin() + _size);
       _size--;
       return d;
    }

    T top()
    {
       return _data.front();
    }

    int size() const
    {
       return _size;
    }
 };`

回复收藏 0 原文

~没有更多了~