Efficient priority list

Posted on 2024-09-09 22:01:33

I am looking for an efficient data structure to represent a priority list. Specifically, I need to assign a priority to each item in a set and return only the top-scoring items. I have looked into heap-based priority queues, but they don't really seem to suit my needs: they reorganize the heap structure as soon as I poll the top-rated item from the queue.

The simplest solution would of course be a (sorted) linked list, but inserting into it would take quite long (linear time) in the worst case.

Does anyone have a better solution?

Comments (5)

南风起 2024-09-16 22:01:33

Heaps seem very suitable; it looks like you are going about it the wrong way.

Say you wanted the top x elements (how does this x compare to n, btw?)

What you are doing is putting everything into a max-heap and then extracting the top x.

Instead, I suggest you use a min-heap of exactly x elements.

Insert the first x elements into the heap.

Compare each subsequent incoming element against the heap's minimum, which takes O(1) time. If the incoming element is smaller, just ignore it.

If the incoming element is larger than the minimum, replace the minimum with the incoming element and sift it down in the heap. This takes O(log x) time at worst.

Once done (in O(n log x) time overall), you can retrieve the elements from the heap in sorted order in O(x log x) time.

Depending on your data (and on how small x is), this min-heap solution can be really fast.
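
Here is a minimal sketch of the bounded min-heap idea, assuming plain integer scores. The class and method names (TopX, topX) are made up for illustration. java.util.PriorityQueue has no "replace the top" operation, so the sketch uses poll() followed by add(), which is still O(log x) per element.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper, not from the original answer: keeps the x largest values seen.
class TopX {
    static List<Integer> topX(Iterable<Integer> items, int x) {
        List<Integer> result = new ArrayList<>();
        if (x <= 0) return result;
        // PriorityQueue is a min-heap by default, so peek() is the smallest kept value.
        PriorityQueue<Integer> heap = new PriorityQueue<>(x);
        for (int item : items) {
            if (heap.size() < x) {
                heap.add(item);               // fill the heap with the first x elements
            } else if (item > heap.peek()) {  // O(1) comparison against the current minimum
                heap.poll();                  // drop the minimum: O(log x)
                heap.add(item);               // insert the newcomer: O(log x)
            }
        }
        // Drain in ascending order, then reverse to get largest first: O(x log x).
        while (!heap.isEmpty()) result.add(heap.poll());
        Collections.reverse(result);
        return result;
    }
}
```

For example, TopX.topX(List.of(5, 1, 9, 3, 7, 8), 3) keeps only three values in memory at any time and returns them largest first.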


If you really really want the inserts to be super-fast and don't care much about the retrieval then you can also do the following.

Insert the elements into a vector (array with amortized O(1) insert time) in the order they come.

Then use a selection algorithm to find the xth largest element (in O(n) time, but the constants might be big). Say that number is S.

Now walk the array, comparing each element with S, and select the ones at least as large as S.

If x is reasonably sized and comparable to n (like n/2 or something), this might work out fine, but if x is small compared to n, I would suggest going with the min-heap.
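
A rough sketch of the selection approach, assuming integer scores and 1 <= x <= n. It uses randomized quickselect, which is O(n) on average rather than in the worst case, so it is a stand-in for whichever selection algorithm you prefer. The names SelectTopX, xthLargest and topX are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper illustrating "select S, then filter".
class SelectTopX {
    // Returns the x-th largest value (1-based) via quickselect; reorders a in place.
    static int xthLargest(int[] a, int x) {
        int lo = 0, hi = a.length - 1;
        int target = a.length - x; // index of the answer in ascending order
        while (lo < hi) {
            int pivot = a[ThreadLocalRandom.current().nextInt(lo, hi + 1)];
            int i = lo, j = hi;
            while (i <= j) { // Hoare-style partition around the pivot value
                while (a[i] < pivot) i++;
                while (a[j] > pivot) j--;
                if (i <= j) { int t = a[i]; a[i] = a[j]; a[j] = t; i++; j--; }
            }
            if (target <= j) hi = j;        // answer lies in the left part
            else if (target >= i) lo = i;   // answer lies in the right part
            else return a[target];          // answer equals the pivot
        }
        return a[target];
    }

    // Walks the data once and keeps everything at least as large as S.
    static List<Integer> topX(int[] data, int x) {
        int s = xthLargest(data.clone(), x); // clone so the caller's order is untouched
        List<Integer> out = new ArrayList<>();
        for (int v : data) if (v >= s) out.add(v);
        return out; // unsorted; may hold more than x entries if values tie with S
    }
}
```

Note that the result comes back in arrival order, not sorted, and ties with S can push it past x entries.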

森林散布 2024-09-16 22:01:33

If you need only the top k items and you never need to look at the others, you can use a simple linked list or array storing only the current top k items, plus a number (the worst score of the elements in the list).

In the Add() operation you simply compare the item with the worst value in the list and, if it is better, you replace the current worst with the added item. This takes O(k) time in the worst case for insertion because you need to find the element that currently has the worst score. The average case, however, is O(1), since, as you add better elements to the list, the probability of having to do a swap tends to 0 (that is, you're not actually adding any items).

So if you generate elements at random, your performance is likely to be very good. Even if you generate ordered items (worst case), it might be fast enough for your value of k.
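
Something along these lines, as a sketch only: it assumes integer scores and k >= 1, and the class name SimpleTopK and its methods are invented for the example.

```java
import java.util.Arrays;

// Hypothetical class illustrating the array-plus-worst-score idea.
class SimpleTopK {
    private final int[] items;
    private int size = 0;
    private int worstIndex = -1; // position of the current worst (smallest) score

    SimpleTopK(int k) { items = new int[k]; } // assumes k >= 1

    void add(int score) {
        if (size < items.length) {               // still filling the first k slots
            items[size++] = score;
            if (size == items.length) findWorst();
            return;
        }
        if (score <= items[worstIndex]) return;  // common case: O(1) rejection
        items[worstIndex] = score;               // replace the current worst
        findWorst();                             // O(k) rescan for the new worst
    }

    private void findWorst() {
        worstIndex = 0;
        for (int i = 1; i < size; i++) {
            if (items[i] < items[worstIndex]) worstIndex = i;
        }
    }

    // Copy of the kept items, in no particular order.
    int[] snapshot() { return Arrays.copyOf(items, size); }
}
```

The O(k) rescan only happens when an item actually gets in, which is what makes the average case cheap for random input.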

鹿! 2024-09-16 22:01:33

Hmm. Skip lists? They should have O(log n) insertion (like a heap-based queue), but getting the top element should be O(1) [including removing it]. They can even be implemented using lock-free algorithms.
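
If you are on Java, the JDK already ships a concurrent skip list, java.util.concurrent.ConcurrentSkipListSet. A small sketch, assuming integer scores: because it is a set, equal scores collapse into one entry, so real items would need a tie-breaking comparator. The names SkipListDemo and drainTop are hypothetical, and the JDK documents expected O(log n) for add, not a specific bound for pollFirst().

```java
import java.util.Comparator;
import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical demo of a skip-list-backed priority list.
class SkipListDemo {
    static int[] drainTop(int[] scores, int count) {
        // Reverse order so that pollFirst() yields the highest score.
        ConcurrentSkipListSet<Integer> set =
                new ConcurrentSkipListSet<>(Comparator.<Integer>reverseOrder());
        for (int s : scores) set.add(s); // expected O(log n) per insertion
        int[] top = new int[Math.min(count, set.size())];
        for (int i = 0; i < top.length; i++) {
            top[i] = set.pollFirst();    // removes and returns the current highest
        }
        return top;
    }
}
```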

你的他你的她 2024-09-16 22:01:33

The JDK has a built-in pqueue class (java.util.PriorityQueue) which is based on a heap algorithm.

Sorry, I only just saw the bit about heaps not fitting your needs. Can you explain why? You can write a custom comparator (or make your items comparable) and the PriorityQueue will order your items appropriately.
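
For what it's worth, a custom comparator could look roughly like this. ScoredItem and its fields are placeholders I made up; substitute your own item type.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical item type; the field names are assumptions for illustration.
class ScoredItem {
    final String name;
    final int score;
    ScoredItem(String name, int score) { this.name = name; this.score = score; }
}

class ComparatorDemo {
    // Returns item names ordered from highest to lowest score.
    static List<String> namesByScore(List<ScoredItem> items) {
        // reversed() turns the default min-heap ordering into highest-score-first.
        PriorityQueue<ScoredItem> queue = new PriorityQueue<>(
                Comparator.comparingInt((ScoredItem it) -> it.score).reversed());
        queue.addAll(items);
        List<String> names = new ArrayList<>();
        while (!queue.isEmpty()) names.add(queue.poll().name);
        return names;
    }
}
```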

终难愈 2024-09-16 22:01:33

A balanced tree would always guarantee a logarithmic worst case.
Although linear time is usually regarded as feasible, there is still a
tremendous difference between logarithmic and linear:

For a billion elements, the difference is between 1 billion operations
and a few dozen. If each operation takes 1 millisecond, that means
going from 11 days to less than a second.
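
The arithmetic behind those figures can be checked with a few lines. This is only a back-of-the-envelope sketch with made-up names, assuming a base-2 logarithm for the balanced tree.

```java
// Hypothetical helper for the linear-versus-logarithmic comparison.
class LogVsLinear {
    // Smallest h such that 2^h >= n, for n >= 2.
    static int log2Ceil(long n) {
        return 64 - Long.numberOfLeadingZeros(n - 1);
    }

    // Converts a duration in milliseconds to days (86,400,000 ms per day).
    static double millisToDays(long ms) {
        return ms / 86_400_000.0;
    }
}
```

With n = 1,000,000,000, log2Ceil gives 30 steps (about 30 ms at 1 ms per operation), while a linear scan costs 10^9 ms, which millisToDays turns into roughly 11.6 days.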

A binary heap has three defining properties:

  • Every node has at most two children.

  • The heap tree is complete and left-adjusted. Complete means that if the heap has height H, every leaf node is either at level H or H-1. All the levels are left-adjusted, which means that no right sub-tree has a height greater than its left sibling. So, if a leaf is at the same height as an internal node, the leaf can’t be on the left of that node.

  • Every node holds the highest priority in the subtree rooted at that node.

Binary heaps are the most common kind, but we can also use
d-ary heaps: the branching factor d can be any value greater than 2, with the same
array representation for the heap.
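
To make the array representation concrete, here is a minimal sketch of a d-ary max-heap over integers, assuming d >= 2 and a non-empty heap when poll() is called. In this layout the parent of index i is (i - 1) / d and its children are d * i + 1 through d * i + d. The class name DaryHeap is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal d-ary max-heap stored in a flat list; a sketch, not a tuned implementation.
class DaryHeap {
    private final int d;
    private final List<Integer> a = new ArrayList<>();

    DaryHeap(int d) { this.d = d; } // assumes d >= 2

    void add(int v) {
        a.add(v);
        int i = a.size() - 1;
        while (i > 0) {                      // bubble up toward the root
            int p = (i - 1) / d;             // parent index
            if (a.get(p) >= a.get(i)) break;
            swap(i, p);
            i = p;
        }
    }

    int poll() {                             // assumes the heap is non-empty
        int top = a.get(0);
        int last = a.remove(a.size() - 1);
        if (!a.isEmpty()) {
            a.set(0, last);
            int i = 0;
            while (true) {                   // push down, following the largest child
                int best = i;
                for (int k = 1; k <= d; k++) {
                    int c = d * i + k;       // k-th child index
                    if (c < a.size() && a.get(c) > a.get(best)) best = c;
                }
                if (best == i) break;
                swap(i, best);
                i = best;
            }
        }
        return top;
    }

    private void swap(int i, int j) {
        int t = a.get(i);
        a.set(i, a.get(j));
        a.set(j, t);
    }
}
```

A larger d makes the tree shallower, so add() does fewer swaps, at the cost of comparing up to d children per level in poll().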

But the improvement we get with trees comes with a price. First, as
with any data structure that uses pointers (lists, graphs, trees, and
so on) we have a memory overhead in comparison to arrays. While with
the latter we just need to reserve space for the data (plus maybe,
depending on the implementation details, some constant space for
pointers and the node structure itself), every tree node requires
extra space for the pointers to its children and possibly to its
parent.
