是否有具有固定容量和自定义比较器的 PriorityQueue 实现?

发布于 2024-12-11 18:32:31 字数 3149 浏览 0 评论 0原文

相关问题:

我有一个非常大的数据集(超过 500 万个项目),我需要从中获取N 个最大 项目。最自然的方法是使用堆/优先级队列仅存储前 N 个项目。 JVM (Scala/Java) 有几种很好的优先级队列实现,即:

前两个很好,但它们存储所有项目,在我的情况下,这会带来严重的内存开销。第三种(Lucene 实现)没有这样的缺点,但正如我从文档中看到的,它也不支持自定义比较器,这使得它对我来说毫无用处。

所以,我的问题是:是否有一个具有固定容量自定义比较器PriorityQueue实现

UPD。最后,我根据Peter的回答创建了自己的实现:(

public class FixedSizePriorityQueue<E> extends TreeSet<E> {

    private int elementsLeft;

    public FixedSizePriorityQueue(int maxSize) {
        super(new NaturalComparator());
        this.elementsLeft = maxSize;
    }

    public FixedSizePriorityQueue(int maxSize, Comparator<E> comparator) {
        super(comparator);
        this.elementsLeft = maxSize;
    }


    /**
     * @return true if element was added, false otherwise
     * */
    @Override
    public boolean add(E e) {
        if (elementsLeft == 0 && size() == 0) {
            // max size was initiated to zero => just return false
            return false;
        } else if (elementsLeft > 0) {
            // queue isn't full => add element and decrement elementsLeft
            boolean added = super.add(e);
            if (added) {
                elementsLeft--;
            }
            return added;
        } else {
            // there is already 1 or more elements => compare to the least
            int compared = super.comparator().compare(e, this.first());
            if (compared == 1) {
                // new element is larger than the least in queue => pull the least and add new one to queue
                pollFirst();
                super.add(e);
                return true;
            } else {
                // new element is less than the least in queue => return false
                return false;
            }
        }
    }
}

其中NaturalComparator取自这个问题)

Related questions:

I have a very large data set (more than 5 millions items) and I need to get N largest items from it. The most natural way to do it is to use heap/priority queue storing only top N items. There are several good implementations of priority queue for JVM (Scala/Java), namely:

First 2 are nice, but they store all the items, which in my case gives critical memory overhead. Third (Lucene implementation) doesn't have such a drawback, but as I can see from documentation it also doesn't support custom comparator, which makes it useless for me.

So, my question is: Is there a PriorityQueue implementation with fixed capacity and custom comparator?

UPD. Finally I've created my own implementation, based on Peter's answer:

public class FixedSizePriorityQueue<E> extends TreeSet<E> {

    private int elementsLeft;

    public FixedSizePriorityQueue(int maxSize) {
        super(new NaturalComparator());
        this.elementsLeft = maxSize;
    }

    public FixedSizePriorityQueue(int maxSize, Comparator<E> comparator) {
        super(comparator);
        this.elementsLeft = maxSize;
    }


    /**
     * @return true if element was added, false otherwise
     * */
    @Override
    public boolean add(E e) {
        if (elementsLeft == 0 && size() == 0) {
            // max size was initiated to zero => just return false
            return false;
        } else if (elementsLeft > 0) {
            // queue isn't full => add element and decrement elementsLeft
            boolean added = super.add(e);
            if (added) {
                elementsLeft--;
            }
            return added;
        } else {
            // there is already 1 or more elements => compare to the least
            int compared = super.comparator().compare(e, this.first());
            if (compared == 1) {
                // new element is larger than the least in queue => pull the least and add new one to queue
                pollFirst();
                super.add(e);
                return true;
            } else {
                // new element is less than the least in queue => return false
                return false;
            }
        }
    }
}

(where NaturalComparator is taken from this question)

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(10

梓梦 2024-12-18 18:32:31

你怎么能说Lucene不支持自定义比较器呢?

它是抽象的,您必须实现抽象方法 lessThan(T a, T b)

How can you say Lucene's doesn't support a custom comparator?

Its abstract and you must implement the abstract method lessThan(T a, T b)

情话难免假 2024-12-18 18:32:31

您可以使用带有自定义比较器的 SortedSet(例如 TreeSet),并在大小达到 N 时删除最小的。

You could use a SortedSet e.g. TreeSet with a custom comparator and remove the smallest when the size reachs N.

带刺的爱情 2024-12-18 18:32:31

虽然是一个老问题,但可能对其他人有帮助。
您可以使用 minMaxPriorityQueue 的Google 的 Java 库 guava。

Though an old question but it may be helpful to somebody else.
You can use minMaxPriorityQueue of Google's Java library guava.

哭泣的笑容 2024-12-18 18:32:31

我想不出一个现成的,但你可以检查 我对这个集合的实现具有类似的要求。

区别在于比较器,但如果您从 PriorityQueue 扩展,您就会拥有它。每次添加时检查是否未达到限制,如果已达到,则删除最后一项。

I can't think of a ready-to-use one, but you can check my implementation of this collection with similar requirements.

The difference is the comparator, but if you extend from PriorityQueue you'll have it. And on each addition check if you haven't reached the limit, and if you have - drop the last item.

我不在是我 2024-12-18 18:32:31

下面是我之前使用的实现。符合Peter的建议。

 public @interface NonThreadSafe {
 }

/**
 * A priority queue implementation with a fixed size based on a {@link TreeMap}.
 * The number of elements in the queue will be at most {@code maxSize}.
 * Once the number of elements in the queue reaches {@code maxSize}, trying to add a new element
 * will remove the greatest element in the queue if the new element is less than or equal to
 * the current greatest element. The queue will not be modified otherwise.
 */
@NonThreadSafe
public static class FixedSizePriorityQueue<E> {
    private final TreeSet<E> treeSet; /* backing data structure */
    private final Comparator<? super E> comparator;
    private final int maxSize;

    /**
     * Constructs a {@link FixedSizePriorityQueue} with the specified {@code maxSize}
     * and {@code comparator}.
     *
     * @param maxSize    - The maximum size the queue can reach, must be a positive integer.
     * @param comparator - The comparator to be used to compare the elements in the queue, must be non-null.
     */
    public FixedSizePriorityQueue(final int maxSize, final Comparator<? super E> comparator) {
        super();
        if (maxSize <= 0) {
            throw new IllegalArgumentException("maxSize = " + maxSize + "; expected a positive integer.");
        }
        if (comparator == null) {
            throw new NullPointerException("Comparator is null.");
        }
        this.treeSet = new TreeSet<E>(comparator);
        this.comparator = treeSet.comparator();
        this.maxSize = maxSize;
    }

    /**
     * Adds an element to the queue. If the queue contains {@code maxSize} elements, {@code e} will
     * be compared to the greatest element in the queue using {@code comparator}.
     * If {@code e} is less than or equal to the greatest element, that element will be removed and
     * {@code e} will be added instead. Otherwise, the queue will not be modified
     * and {@code e} will not be added.
     *
     * @param e - Element to be added, must be non-null.
     */
    public void add(final E e) {
        if (e == null) {
            throw new NullPointerException("e is null.");
        }
        if (maxSize <= treeSet.size()) {
            final E firstElm = treeSet.first();
            if (comparator.compare(e, firstElm) < 1) {
                return;
            } else {
                treeSet.pollFirst();
            }
        }
        treeSet.add(e);
    }

    /**
     * @return Returns a sorted view of the queue as a {@link Collections#unmodifiableList(java.util.List)}
     *         unmodifiableList.
     */
    public List<E> asList() {
        return Collections.unmodifiableList(new ArrayList<E>(treeSet));
    }
}

顺便说一句,我将不胜感激任何反馈。

编辑:似乎使用TreeSet毕竟不是很有效,因为对first()的调用似乎需要亚线性时间。我将 TreeSet 更改为 PriorityQueue。修改后的 add() 方法如下所示:

   /**
     * Adds an element to the queue. If the queue contains {@code maxSize} elements, {@code e} will
     * be compared to the lowest element in the queue using {@code comparator}.
     * If {@code e} is greater than or equal to the lowest element, that element will be removed and
     * {@code e} will be added instead. Otherwise, the queue will not be modified
     * and {@code e} will not be added.
     *
     * @param e - Element to be added, must be non-null.
     */
    public void add(final E e) {
        if (e == null) {
            throw new NullPointerException("e is null.");
        }
        if (maxSize <= priorityQueue.size()) {
            final E firstElm = priorityQueue.peek();
            if (comparator.compare(e, firstElm) < 1) {
                return;
            } else {
                priorityQueue.poll();
            }
        }
        priorityQueue.add(e);
    }

Below is the implementation I used before. Complies with Peter's suggestion.

 public @interface NonThreadSafe {
 }

/**
 * A priority queue implementation with a fixed size based on a {@link TreeMap}.
 * The number of elements in the queue will be at most {@code maxSize}.
 * Once the number of elements in the queue reaches {@code maxSize}, trying to add a new element
 * will remove the greatest element in the queue if the new element is less than or equal to
 * the current greatest element. The queue will not be modified otherwise.
 */
@NonThreadSafe
public static class FixedSizePriorityQueue<E> {
    private final TreeSet<E> treeSet; /* backing data structure */
    private final Comparator<? super E> comparator;
    private final int maxSize;

    /**
     * Constructs a {@link FixedSizePriorityQueue} with the specified {@code maxSize}
     * and {@code comparator}.
     *
     * @param maxSize    - The maximum size the queue can reach, must be a positive integer.
     * @param comparator - The comparator to be used to compare the elements in the queue, must be non-null.
     */
    public FixedSizePriorityQueue(final int maxSize, final Comparator<? super E> comparator) {
        super();
        if (maxSize <= 0) {
            throw new IllegalArgumentException("maxSize = " + maxSize + "; expected a positive integer.");
        }
        if (comparator == null) {
            throw new NullPointerException("Comparator is null.");
        }
        this.treeSet = new TreeSet<E>(comparator);
        this.comparator = treeSet.comparator();
        this.maxSize = maxSize;
    }

    /**
     * Adds an element to the queue. If the queue contains {@code maxSize} elements, {@code e} will
     * be compared to the greatest element in the queue using {@code comparator}.
     * If {@code e} is less than or equal to the greatest element, that element will be removed and
     * {@code e} will be added instead. Otherwise, the queue will not be modified
     * and {@code e} will not be added.
     *
     * @param e - Element to be added, must be non-null.
     */
    public void add(final E e) {
        if (e == null) {
            throw new NullPointerException("e is null.");
        }
        if (maxSize <= treeSet.size()) {
            final E firstElm = treeSet.first();
            if (comparator.compare(e, firstElm) < 1) {
                return;
            } else {
                treeSet.pollFirst();
            }
        }
        treeSet.add(e);
    }

    /**
     * @return Returns a sorted view of the queue as a {@link Collections#unmodifiableList(java.util.List)}
     *         unmodifiableList.
     */
    public List<E> asList() {
        return Collections.unmodifiableList(new ArrayList<E>(treeSet));
    }
}

I would appreciate any feedback btw.

EDIT: It seems like using a TreeSet is not very efficient after all because the calls to first() seem to take sublinear time. I changed the TreeSet to a PriorityQueue. The modified add() method looks like this:

   /**
     * Adds an element to the queue. If the queue contains {@code maxSize} elements, {@code e} will
     * be compared to the lowest element in the queue using {@code comparator}.
     * If {@code e} is greater than or equal to the lowest element, that element will be removed and
     * {@code e} will be added instead. Otherwise, the queue will not be modified
     * and {@code e} will not be added.
     *
     * @param e - Element to be added, must be non-null.
     */
    public void add(final E e) {
        if (e == null) {
            throw new NullPointerException("e is null.");
        }
        if (maxSize <= priorityQueue.size()) {
            final E firstElm = priorityQueue.peek();
            if (comparator.compare(e, firstElm) < 1) {
                return;
            } else {
                priorityQueue.poll();
            }
        }
        priorityQueue.add(e);
    }
懵少女 2024-12-18 18:32:31

正是我正在寻找的东西。然而,该实现包含一个错误:

即:if elementsLeft > 。 0 和 e 已包含在 TreeSet 中。
在这种情况下,elementsLeft 会减少,但 TreeSet 中的元素数量保持不变。

我建议将 add() 方法中的相应行替换为

        } else if (elementsLeft > 0) {
        // queue isn't full => add element and decrement elementsLeft
        boolean added = super.add(e);
        if (added) {
            elementsLeft--;
        }
        return added;

Exactly what I was looking for. However, the implementation contains a bug:

Namely: if elementsLeft > 0 and e is already contained in the TreeSet.
In this case, elementsLeft is decreased, but the number of elements in the TreeSet stays the same.

I would suggest to replace the corresponding lines in the add() method by

        } else if (elementsLeft > 0) {
        // queue isn't full => add element and decrement elementsLeft
        boolean added = super.add(e);
        if (added) {
            elementsLeft--;
        }
        return added;
番薯 2024-12-18 18:32:31

试试这个代码:

public class BoundedPQueue<E extends Comparable<E>> {
/**
 * Lock used for all public operations
 */
private final ReentrantLock lock;

PriorityBlockingQueue<E> queue ;
int size = 0;

public BoundedPQueue(int capacity){
    queue = new PriorityBlockingQueue<E>(capacity, new CustomComparator<E>());
    size = capacity;
    this.lock = new ReentrantLock();

}

public boolean offer(E e) {


    final ReentrantLock lock = this.lock;
    lock.lock();
    E vl = null;
    if(queue.size()>= size)  {
        vl= queue.poll();
        if(vl.compareTo(e)<0)
            e=vl;
    }

    try {
        return queue.offer(e);
    } finally {
        lock.unlock();
    }


}

public E poll()  {

    return queue.poll();
}

public static class CustomComparator<E extends Comparable<E>> implements Comparator<E> {


    @Override
    public int compare(E o1, E o2) {
        //give me a max heap
         return o1.compareTo(o2) *-1;

    }
}

}

Try this code:

public class BoundedPQueue<E extends Comparable<E>> {
/**
 * Lock used for all public operations
 */
private final ReentrantLock lock;

PriorityBlockingQueue<E> queue ;
int size = 0;

public BoundedPQueue(int capacity){
    queue = new PriorityBlockingQueue<E>(capacity, new CustomComparator<E>());
    size = capacity;
    this.lock = new ReentrantLock();

}

public boolean offer(E e) {


    final ReentrantLock lock = this.lock;
    lock.lock();
    E vl = null;
    if(queue.size()>= size)  {
        vl= queue.poll();
        if(vl.compareTo(e)<0)
            e=vl;
    }

    try {
        return queue.offer(e);
    } finally {
        lock.unlock();
    }


}

public E poll()  {

    return queue.poll();
}

public static class CustomComparator<E extends Comparable<E>> implements Comparator<E> {


    @Override
    public int compare(E o1, E o2) {
        //give me a max heap
         return o1.compareTo(o2) *-1;

    }
}

}
拥抱我好吗 2024-12-18 18:32:31

如果你有番石榴,这是我整理的一份。我认为它已经相当完整了。如果我错过了什么,请告诉我。

您可以使用 gauva ForwardingBlockingQueue,这样您就不必映射所有其他方法。

import com.google.common.util.concurrent.ForwardingBlockingQueue;

public class PriorityBlockingQueueDecorator<E> extends
        ForwardingBlockingQueue<E> {

    public static final class QueueFullException extends IllegalStateException {

        private static final long serialVersionUID = -9218216017510478441L;

    }

    private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

    private int maxSize;

    private PriorityBlockingQueue<E> delegate;

    public PriorityBlockingQueueDecorator(PriorityBlockingQueue<E> delegate) {
        this(MAX_ARRAY_SIZE, delegate);
    }

    public PriorityBlockingQueueDecorator(int maxSize,
            PriorityBlockingQueue<E> delegate) {
        this.maxSize = maxSize;
        this.delegate = delegate;
    }

    @Override
    protected BlockingQueue<E> delegate() {
        return delegate;
    }

    @Override
    public boolean add(E element) {
        return offer(element);
    }

    @Override
    public boolean addAll(Collection<? extends E> collection) {
        boolean modified = false;
        for (E e : collection)
            if (add(e))
                modified = true;
        return modified;
    }

    @Override
    public boolean offer(E e, long timeout, TimeUnit unit)
            throws InterruptedException {
        return offer(e);
    }

    @Override
    public boolean offer(E o) {
        if (maxSize > size()) {
            throw new QueueFullException();
        }
        return super.offer(o);
    }
}

Here is one I put together if you have guava. I think it is is pretty complete. Let me know if I missed something.

You can use the gauva ForwardingBlockingQueue so you don't have to map all the other methods.

import com.google.common.util.concurrent.ForwardingBlockingQueue;

public class PriorityBlockingQueueDecorator<E> extends
        ForwardingBlockingQueue<E> {

    public static final class QueueFullException extends IllegalStateException {

        private static final long serialVersionUID = -9218216017510478441L;

    }

    private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

    private int maxSize;

    private PriorityBlockingQueue<E> delegate;

    public PriorityBlockingQueueDecorator(PriorityBlockingQueue<E> delegate) {
        this(MAX_ARRAY_SIZE, delegate);
    }

    public PriorityBlockingQueueDecorator(int maxSize,
            PriorityBlockingQueue<E> delegate) {
        this.maxSize = maxSize;
        this.delegate = delegate;
    }

    @Override
    protected BlockingQueue<E> delegate() {
        return delegate;
    }

    @Override
    public boolean add(E element) {
        return offer(element);
    }

    @Override
    public boolean addAll(Collection<? extends E> collection) {
        boolean modified = false;
        for (E e : collection)
            if (add(e))
                modified = true;
        return modified;
    }

    @Override
    public boolean offer(E e, long timeout, TimeUnit unit)
            throws InterruptedException {
        return offer(e);
    }

    @Override
    public boolean offer(E o) {
        if (maxSize > size()) {
            throw new QueueFullException();
        }
        return super.offer(o);
    }
}
你与昨日 2024-12-18 18:32:31

嗯,这是一个很老的问题,但我很困惑为什么还没有提出更简单的解决方案。

除非我遗漏了一些东西,否则可以使用 min-heap(Java 的默认 PriorityQueue 实现) 来简单地解决这个问题,稍有改动,因为当 PriorityQueue 的大小变得大于 k 时(即,如果我们正在尝试存储前 k 个元素),您轮询头部。

下面是一个示例,说明

public void storeKLargest(int[] nums, int k) {
    PriorityQueue<Integer> pq = new PriorityQueue<>(k+1);
    for(int num: nums){
        if(pq.size() < k || pq.peek() < num)
            pq.offer(num);
        if(pq.size() == k+1)
            pq.poll();
    }
}

我使用了 Integer 的 PriorityQueue,但它很简单,可以用自定义对象替换它并输入自定义比较器。

除非我遗漏了一些明显的东西,否则我想这就是OP正在寻找的东西。

Well, quite an old question, but I'm confused why a simpler solution hasn't been suggested yet.

Unless I'm missing something, this can be trivially solved using a min-heap (Java's default PriorityQueue implementation) with a slight twist in that the moment the size of the PriorityQueue becomes greater than k(ie if we're trying to store the top k elements), you poll the head.

Here's an example of what I mean

public void storeKLargest(int[] nums, int k) {
    PriorityQueue<Integer> pq = new PriorityQueue<>(k+1);
    for(int num: nums){
        if(pq.size() < k || pq.peek() < num)
            pq.offer(num);
        if(pq.size() == k+1)
            pq.poll();
    }
}

I used a PriorityQueue of Integer, but it's simple enough to replace it with a custom object and feed in a custom Comparator.

Unless I'm missing something obvious, I suppose this is what the OP was looking for.

谁与争疯 2024-12-18 18:32:31

创建一个有大小限制的 PriorityQueue。它存储N个最大数字。

import java.util.*;

class Demo
{
    public static <E extends Comparable<E>> PriorityQueue<E> getPq(final int n, Comparator<E> comparator)
    {
        return new PriorityQueue<E>(comparator)
        {
            boolean full()
            {
                return size() >= n;
            }

            @Override 
            public boolean add(E e)
            {
                if (!full())
                {
                    return super.add(e);
                }
                else if (peek().compareTo(e) < 0)
                {
                    poll();
                    return super.add(e);
                }
                return false;
            }

            @Override
            public boolean offer(E e)
            {
                if (!full())
                {
                    return super.offer(e);
                }
                else if (peek().compareTo(e) < 0)
                {
                    poll();
                    return super.offer(e);
                }
                return false;
            }
        };
    }

    public static void printq(PriorityQueue pq)
    {
        Object o = null;
        while ((o = pq.poll()) != null)
        {
            System.out.println(o);
        }
    }

    public static void main (String[] args)
    {
        PriorityQueue<Integer> pq = getPq(2, new Comparator<Integer>(){
        @Override
        public int compare(Integer i1, Integer i2)
        {
            return i1.compareTo(i2);
        }
        });
        pq.add(4);
        pq.add(1);
        pq.add(5);
        pq.add(2);
        printq(pq);
    }
}

Create a PriorityQueue that has size limit. It stores N max numbers.

import java.util.*;

class Demo
{
    public static <E extends Comparable<E>> PriorityQueue<E> getPq(final int n, Comparator<E> comparator)
    {
        return new PriorityQueue<E>(comparator)
        {
            boolean full()
            {
                return size() >= n;
            }

            @Override 
            public boolean add(E e)
            {
                if (!full())
                {
                    return super.add(e);
                }
                else if (peek().compareTo(e) < 0)
                {
                    poll();
                    return super.add(e);
                }
                return false;
            }

            @Override
            public boolean offer(E e)
            {
                if (!full())
                {
                    return super.offer(e);
                }
                else if (peek().compareTo(e) < 0)
                {
                    poll();
                    return super.offer(e);
                }
                return false;
            }
        };
    }

    public static void printq(PriorityQueue pq)
    {
        Object o = null;
        while ((o = pq.poll()) != null)
        {
            System.out.println(o);
        }
    }

    public static void main (String[] args)
    {
        PriorityQueue<Integer> pq = getPq(2, new Comparator<Integer>(){
        @Override
        public int compare(Integer i1, Integer i2)
        {
            return i1.compareTo(i2);
        }
        });
        pq.add(4);
        pq.add(1);
        pq.add(5);
        pq.add(2);
        printq(pq);
    }
}
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文