Data structure that always keeps the n best elements

Posted 2024-07-14 01:14:00

I need a data structure that always holds the n largest items inserted so far (in no particular order).

So, if n is 3, we could have the following session where I insert a few numbers and the content of the container changes:

    []      // now insert 1
    [1]     // now insert 0
    [1,0]   // now insert 4
    [1,0,4] // now insert 3
    [1,4,3] // now insert 0
    [1,4,3] // now insert 3
    [4,3,3]

You get the idea. What's the name of the data structure? What's the best way to implement this? Or is this in some library?

I am thinking of using a container that delegates to a priority_queue holding its elements, with the comparison reversed so that pop removes the smallest element. The insert function first checks whether the new element is greater than the smallest one. If so, it throws that smallest element out and pushes the new one.
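A minimal C++ sketch of the wrapper described above, assuming the element type supports `<` and `>`. The class name `TopN` and its members are hypothetical. While fewer than n elements are stored, new elements are pushed unconditionally; the question leaves that fill-up phase implicit.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Keeps the n largest values inserted so far. The priority_queue uses
// std::greater (reversed comparison), so top() is the smallest kept value.
template <class T>
class TopN {
public:
    explicit TopN(std::size_t n) : n_(n) {}

    void insert(const T& x) {
        if (n_ == 0) return;                 // nothing can ever be kept
        if (heap_.size() < n_) {             // not full yet: always keep x
            heap_.push(x);
        } else if (heap_.top() < x) {        // x beats the current minimum
            heap_.pop();                     // throw the smallest out
            heap_.push(x);
        }                                    // otherwise x is discarded
    }

    std::size_t size() const { return heap_.size(); }

    const T& smallest() const {
        assert(!heap_.empty());              // precondition: something is kept
        return heap_.top();
    }

    // Copies the kept values out in ascending order by popping a copy.
    std::vector<T> items() const {
        std::priority_queue<T, std::vector<T>, std::greater<T>> copy = heap_;
        std::vector<T> out;
        while (!copy.empty()) {
            out.push_back(copy.top());
            copy.pop();
        }
        return out;
    }

private:
    std::size_t n_;
    std::priority_queue<T, std::vector<T>, std::greater<T>> heap_;
};
```

Replaying the session from the question with n = 3 (inserting 1, 0, 4, 3, 0, 3) leaves 3, 3 and 4 in the container. Each insert costs O(log n).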

(I have a C++ implementation in mind, but the question is language-agnostic nevertheless.)


梦回梦里 2024-07-21 01:14:00

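The particular data structure you want is probably an implicit heap. The raw data structure is just an array; for convenience, suppose it has N = 2^n elements and you want to keep the N-1 largest elements.

The idea is to treat the array (call it A) as a complete binary tree of depth n:

- ignore A[0]; treat A[1] as the root node
- for each node A[k], the children are A[2*k] and A[2*k+1]
- the nodes A[N/2..N-1] are the leaves

To maintain the tree as a "heap", you need to ensure that each node is less than (or equal to) its children. This is called the "heap condition":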

A[k] <= A[2*k]
A[k] <= A[2*k+1]
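To make the indexing and the heap condition concrete, here is a small C++ checker. It assumes the tree is stored in a `std::vector` whose slot 0 is unused; the function name is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Returns true if A[1..size-1] satisfies the heap condition
// A[k] <= A[2k] and A[k] <= A[2k+1]. A[0] is ignored (1-based layout).
template <class T>
bool satisfies_heap_condition(const std::vector<T>& A) {
    const std::size_t size = A.size();
    for (std::size_t k = 1; 2 * k < size; ++k) {      // node k has a left child
        if (A[2 * k] < A[k]) return false;             // violates A[k] <= A[2k]
        if (2 * k + 1 < size && A[2 * k + 1] < A[k])   // violates A[k] <= A[2k+1]
            return false;
    }
    return true;
}
```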

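To use the heap to maintain the largest N elements:

- note that the root A[1] is the smallest element in the heap
- compare each new element (x) with the root: if it is smaller (x < A[1]), reject it
- otherwise, insert the new element into the heap, as follows:
  - remove the root (A[1], the smallest element) from the heap, and reject it
  - replace it with the new element (A[1] := x)
  - now, restore the heap condition:
    - if x is less than or equal to both of its children, you're done
    - otherwise, swap x with its smallest child
    - repeat the test and swap at each new position until the heap condition is met

Basically, this causes any replacement element to "filter down" the tree until it reaches its natural position. This takes at most n = log2(N) steps, which is as good as you can get. Also, the implicit form of the tree allows a very fast implementation; existing bounded-priority-queue libraries will most likely use an implicit heap.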
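A C++ sketch of this scheme, assuming an element type that is default-constructible (for the unused slot A[0]) and comparable with `<`. The class and member names are hypothetical. The answer describes only the replace-the-root step for a full array, so the fill-up phase here uses an ordinary sift-up insertion, which is an assumption. The capacity is passed directly instead of as N-1 with N = 2^n, since the power-of-two size is only a convenience.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Keeps the `capacity` largest values seen so far in an implicit min-heap
// stored in A_[1..size], with A_[0] unused, so A_[1] is the smallest kept value.
template <class T>
class BoundedMinHeap {
public:
    explicit BoundedMinHeap(std::size_t capacity) : cap_(capacity), A_(1) {}

    void insert(const T& x) {
        if (cap_ == 0) return;
        if (size() < cap_) {                 // not full: ordinary heap insert
            A_.push_back(x);
            sift_up(A_.size() - 1);
        } else if (A_[1] < x) {              // x beats the root
            A_[1] = x;                       // old root (the smallest) is rejected
            sift_down(1);                    // restore the heap condition
        }                                    // else x <= A[1]: reject x
    }

    std::size_t size() const { return A_.size() - 1; }

    const T& min() const {
        assert(size() > 0);                  // precondition: heap not empty
        return A_[1];
    }

    // Kept values in heap (array) order, without the unused slot.
    std::vector<T> items() const { return std::vector<T>(A_.begin() + 1, A_.end()); }

private:
    void sift_up(std::size_t k) {
        while (k > 1 && A_[k] < A_[k / 2]) { // parent of A[k] is A[k/2]
            std::swap(A_[k], A_[k / 2]);
            k /= 2;
        }
    }

    void sift_down(std::size_t k) {
        const std::size_t last = A_.size() - 1;
        while (2 * k <= last) {
            std::size_t c = 2 * k;                          // left child
            if (c + 1 <= last && A_[c + 1] < A_[c]) ++c;    // pick the smaller child
            if (!(A_[c] < A_[k])) break;                    // heap condition holds
            std::swap(A_[k], A_[c]);                        // swap x with smaller child
            k = c;
        }
    }

    std::size_t cap_;
    std::vector<T> A_;
};
```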


呆橘 2024-07-21 01:14:00

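In Java, you can use a SortedSet, implemented e.g. by TreeSet. After each insertion, check whether the set has grown too large, and if so, remove the element at the small end: last() if the set is ordered largest-first (for example, constructed with Comparator.reverseOrder()), or first() under natural ordering. Note that a set stores equal elements only once, so duplicates such as the two 3s in the question would be collapsed.

This is quite efficient; I have used it successfully to solve several Project Euler problems.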
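The answer's approach is Java, but the same idea maps onto C++'s ordered containers. This sketch is a translation of the technique, not the answerer's code. It uses `std::multiset` because, unlike a set, a multiset keeps duplicates. The function name is hypothetical.

```cpp
#include <cstddef>
#include <set>

// Sorted-container approach: insert, then trim if the container grew too large.
// std::multiset is ordered ascending, so the element to drop is begin(),
// the smallest one.
template <class T>
void insert_keep_n_largest(std::multiset<T>& s, const T& x, std::size_t n) {
    s.insert(x);
    if (s.size() > n) {
        s.erase(s.begin());   // erasing by iterator removes exactly one element
    }
}
```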


耳钉梦 2024-07-21 01:14:00

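priority_queue is the closest thing the C++ STL has. You could wrap it in another class to create your own implementation that limits its size automatically.

Language-agnostic (though possibly not safe with respect to memory fragmentation):

1. Insert the data
2. Sort
3. Delete the nth element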
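A literal C++ rendering of the three steps, assuming a `std::vector` as the container. It is simple, but it re-sorts on every insertion, costing O(n log n) per insert instead of the O(log n) of a heap. The function name is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Insert, sort largest-first, then delete whatever sits at index n and beyond
// (normally just one element).
template <class T>
void insert_sort_trim(std::vector<T>& v, const T& x, std::size_t n) {
    v.push_back(x);                                       // 1. insert the data
    std::sort(v.begin(), v.end(), std::greater<T>());     // 2. sort, largest first
    if (v.size() > n) {
        v.erase(v.begin() + n, v.end());                  // 3. delete the nth element
    }
}
```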

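std::priority_queue handles everything from step 2 onward for you: it keeps the elements in heap order, and with a reversed comparator, pop() removes the smallest one.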


信仰 2024-07-21 01:14:00

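A bounded priority queue, I think... Java has something like this in its standard library. Edit: it's called LinkedBlockingQueue. I'm not sure whether the C++ STL contains anything similar. (Note that LinkedBlockingQueue is a capacity-bounded FIFO queue rather than a priority queue: when full, it blocks or rejects new elements instead of evicting the smallest one.)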


春花秋月 2024-07-21 01:14:00

Couldn't you just take the first n elements of a sorted collection?
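If all the items are available up front, rather than arriving one at a time, this works, and a full sort is not even necessary. A C++ sketch using `std::partial_sort`, which orders only the first n positions; the function name is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Batch (offline) variant: returns the n largest items, largest first.
// Takes the data by value so the caller's collection is left untouched.
template <class T>
std::vector<T> top_n(std::vector<T> data, std::size_t n) {
    n = std::min(n, data.size());
    std::partial_sort(data.begin(), data.begin() + n, data.end(), std::greater<T>());
    data.erase(data.begin() + n, data.end());   // keep only the n largest
    return data;
}
```

std::nth_element could be used instead when the n results do not need to be in order. Neither variant helps when items keep arriving one by one; that is the case the heap-based answers address.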


家住魔仙堡 2024-07-21 01:14:00

In Python, use heapq. Create a small wrapper around it, something like this:

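    import heapq

    class TopN_Queue:
        def __init__(self, n):
            self.max_sz = n
            self.data = []  # min-heap: data[0] is the smallest kept item

        def add(self, x):
            if len(self.data) == self.max_sz:
                # push x, then pop the smallest item, in one step
                heapq.heappushpop(self.data, x)
            else:
                heapq.heappush(self.data, x)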

...



那些过往 2024-07-21 01:14:00

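Yes, you can maintain a min-heap of size N,
then on each insertion compare the new item with the root item;
if the new item is "greater" than the root, pop the root and insert the new item.
In the end you have the N largest items.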


掩于岁月 2024-07-21 01:14:00

Here's a simple implementation in C++11.
It's probably not the most optimized, but it is easy to use and concise.

    #include <cstddef>
    #include <functional>
    #include <iostream>
    #include <map>
    #include <string>
    
    /// @brief Maintains a set of N elements with the best keys. (duplicate keys are allowed)
    template<class Key, class T, class Compare = std::less<Key> >
    class KeepNBests{
    public:
        explicit KeepNBests(std::size_t N=0):N(N), is_less(Compare()){}
    
        bool add(const Key& key, const T& t){
            if(N == 0){
                return false; // nothing can ever be kept
            }
            // If the container is already full
            // and the candidate key is worse than the worst stored key...
            if(bests.size() >= N
                && is_less(key, bests.cbegin()->first)){
                return false; // ...nothing to do
            }
            // else insert the new candidate
            bests.insert({key, t});
            // keep at most N elements: drop the worst (smallest) keys
            while(bests.size()>N){
                bests.erase(bests.cbegin());
            }
            return true;
        }
        const std::multimap<Key, T, Compare>& get()const{
            return bests;
        }
    public:
        const std::size_t N;
    private:
        Compare is_less;
        std::multimap<Key, T, Compare> bests;
    };
    
    
    int main() {
        
      KeepNBests<double, std::string> best(3);
      best.add(0., "bad");
      best.add(0., "bad");
      best.add(300., "the best");
      best.add(10., "ok");
      best.add(-1., "bad");
      best.add(-7., "very bad");
      best.add(-1., "bad");
      best.add(100., "good");
      best.add(2., "bof");
      best.add(200., "the second");
    
      for(const auto& b : best.get()){
        std::cout << b.first << " " << b.second << std::endl;
      }
      return 0;
    }

Output:

    100 good
    200 the second
    300 the best
