Is there a PriorityQueue implementation with fixed capacity and a custom comparator?
Related questions:
- Java PriorityQueue with fixed size
- How do I use a PriorityQueue?
- get indexes of n smallest elements in an array
- Scala: Is there a way to use PriorityQueue like I would in Java?
I have a very large data set (more than 5 million items) and I need to get the N largest items from it. The most natural way to do this is to use a heap/priority queue that stores only the top N items. There are several good implementations of a priority queue for the JVM (Scala/Java), namely:
The first two are nice, but they store all the items, which in my case causes critical memory overhead. The third (the Lucene implementation) doesn't have this drawback, but as far as I can see from the documentation it doesn't support a custom comparator either, which makes it useless for me.
So, my question is: Is there a PriorityQueue implementation with fixed capacity and a custom comparator?
UPD. Finally I've created my own implementation, based on Peter's answer:
    import java.util.Comparator;
    import java.util.TreeSet;

    public class FixedSizePriorityQueue<E> extends TreeSet<E> {
        // number of free slots remaining before the queue is full
        private int elementsLeft;

        public FixedSizePriorityQueue(int maxSize) {
            super(new NaturalComparator());
            this.elementsLeft = maxSize;
        }

        public FixedSizePriorityQueue(int maxSize, Comparator<E> comparator) {
            super(comparator);
            this.elementsLeft = maxSize;
        }

        /**
         * @return true if element was added, false otherwise
         */
        @Override
        public boolean add(E e) {
            if (elementsLeft == 0 && size() == 0) {
                // max size was initialized to zero => just return false
                return false;
            } else if (elementsLeft > 0) {
                // queue isn't full => add element and decrement elementsLeft
                boolean added = super.add(e);
                if (added) {
                    elementsLeft--;
                }
                return added;
            } else {
                // queue is full => compare the new element to the least one
                int compared = super.comparator().compare(e, this.first());
                // compare() may return any positive value, not just 1; also skip
                // elements already present so that pollFirst() never shrinks the queue
                if (compared > 0 && !contains(e)) {
                    // new element is larger than the least in queue => pull the least and add new one to queue
                    pollFirst();
                    super.add(e);
                    return true;
                } else {
                    // new element is not larger than the least in queue (or is a duplicate) => return false
                    return false;
                }
            }
        }
    }
(where NaturalComparator is taken from this question)
How can you say Lucene's implementation doesn't support a custom comparator?
It's abstract, and you must implement the abstract method lessThan(T a, T b).
You could use a SortedSet, e.g. a TreeSet with a custom comparator, and remove the smallest element whenever the size exceeds N.
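A minimal sketch of this suggestion, assuming Java 8 or later. The class name `TopNWithTreeSet` and the method `largest` are hypothetical names chosen for illustration. Note that a TreeSet treats elements that compare as equal as duplicates and silently drops them, so the comparator in the usage example adds a tie-breaker.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper, not part of any library.
final class TopNWithTreeSet {

    /** Returns the n largest distinct items according to cmp, in ascending order. */
    static <E> List<E> largest(Iterable<? extends E> items, int n, Comparator<? super E> cmp) {
        TreeSet<E> top = new TreeSet<>(cmp);
        for (E item : items) {
            top.add(item);           // items comparing equal to an existing one are ignored
            if (top.size() > n) {
                top.pollFirst();     // drop the current smallest, keeping at most n items
            }
        }
        return new ArrayList<>(top);
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("pear", "fig", "banana", "kiwi", "cherry");
        // Custom comparator: order by length, then alphabetically to break ties.
        Comparator<String> byLength = Comparator.comparingInt(String::length);
        Comparator<String> cmp = byLength.thenComparing(Comparator.<String>naturalOrder());
        System.out.println(largest(words, 2, cmp)); // prints [banana, cherry]
    }
}
```

At most n + 1 elements are held at any time, so memory stays bounded regardless of the input size.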
This is an old question, but it may be helpful to somebody else.
You can use MinMaxPriorityQueue from Guava, Google's Java library.
I can't think of a ready-to-use one, but you can check my implementation of this collection with similar requirements.
The difference is the comparator, but if you extend from PriorityQueue you'll have it. On each addition, check whether you have reached the limit, and if you have, drop the last item.
Below is the implementation I used before. It complies with Peter's suggestion.
I would appreciate any feedback, by the way.
EDIT: It seems that using a TreeSet is not very efficient after all, because the calls to first() seem to take sublinear time. I changed the TreeSet to a PriorityQueue. The modified add() method looks like this:
Exactly what I was looking for. However, the implementation contains a bug:
Namely: if elementsLeft > 0 and e is already contained in the TreeSet.
In this case, elementsLeft is decreased, but the number of elements in the TreeSet stays the same.
I would suggest replacing the corresponding lines in the add() method with
Try this code:
Here is one I put together, if you have Guava. I think it is pretty complete. Let me know if I missed something.
You can use the Guava ForwardingBlockingQueue so you don't have to map all the other methods.
Well, this is quite an old question, but I'm confused why a simpler solution hasn't been suggested yet.
Unless I'm missing something, this can be trivially solved using a min-heap (Java's default PriorityQueue implementation) with a slight twist: the moment the size of the PriorityQueue becomes greater than k (i.e., when we're trying to store the top k elements), you poll the head.
Here's an example of what I mean
I used a PriorityQueue of Integer, but it's simple enough to replace it with a custom object and feed in a custom Comparator.
Unless I'm missing something obvious, I suppose this is what the OP was looking for.
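The min-heap approach described in this answer can be sketched roughly as follows. This is an illustration under stated assumptions, not the answer's own code: the class name `TopKWithMinHeap` and the method `topK` are hypothetical.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

// Hypothetical helper, not part of any library.
final class TopKWithMinHeap {

    /** Returns a min-heap holding the k largest values seen in the input. */
    static PriorityQueue<Integer> topK(Iterable<Integer> values, int k) {
        PriorityQueue<Integer> heap = new PriorityQueue<>(); // natural order => min-heap
        for (Integer v : values) {
            heap.offer(v);
            if (heap.size() > k) {
                heap.poll(); // evict the smallest, so only the k largest remain
            }
        }
        return heap;
    }

    public static void main(String[] args) {
        PriorityQueue<Integer> heap = topK(Arrays.asList(4, 11, 2, 9, 7, 15), 3);
        // Polling drains the heap in ascending order: prints 9, 11, 15.
        while (!heap.isEmpty()) {
            System.out.println(heap.poll());
        }
    }
}
```

Unlike the TreeSet variant, a PriorityQueue keeps elements that compare as equal, so repeated values are counted separately.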
Create a PriorityQueue that has a size limit. It stores the N largest numbers.
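One way such a size-limited queue might look, written as a small wrapper rather than a subclass. The class `BoundedTopQueue` and its methods are hypothetical names for this sketch, which assumes Java 8 or later. It compares each new element against the current smallest before inserting, so rejected elements never enter the heap.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical wrapper: keeps at most maxSize elements, the largest ones per the comparator.
final class BoundedTopQueue<E> {
    private final int maxSize;
    private final Comparator<? super E> comparator;
    private final PriorityQueue<E> heap; // head is the smallest retained element

    BoundedTopQueue(int maxSize, Comparator<? super E> comparator) {
        this.maxSize = maxSize;
        this.comparator = comparator;
        this.heap = new PriorityQueue<>(comparator);
    }

    /** Returns true if the element was retained, false if it was rejected. */
    boolean offer(E e) {
        if (maxSize <= 0) {
            return false;                       // a zero-capacity queue accepts nothing
        }
        if (heap.size() < maxSize) {
            return heap.offer(e);               // still room: always accept
        }
        if (comparator.compare(e, heap.peek()) <= 0) {
            return false;                       // not larger than the current smallest
        }
        heap.poll();                            // evict the smallest to make room
        return heap.offer(e);
    }

    int size() {
        return heap.size();
    }

    E peekSmallest() {
        return heap.peek();
    }

    public static void main(String[] args) {
        Comparator<String> byLength = Comparator.comparingInt(String::length);
        BoundedTopQueue<String> q = new BoundedTopQueue<>(2, byLength);
        for (String s : new String[] {"aa", "a", "aaaa", "aaa"}) {
            q.offer(s);
        }
        System.out.println(q.size() + " " + q.peekSmallest()); // prints 2 aaa
    }
}
```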