始终保留n个最佳元素的数据结构

发布于 2024-07-14 01:14:00 字数 561 浏览 8 评论 0原文

我需要一个始终保存迄今为止插入的 n 个最大项目的数据结构(无特定顺序)。

因此,如果 n 为 3,我们可以进行以下会话,其中我插入一些数字并且容器的内容发生变化:

[]  // now insert 1
[1] // now insert 0
[1,0] // now insert 4
[1,0,4] // now insert 3
[1,4,3] // now insert 0
[1,4,3] // now insert 3
[4,3,3]

您明白了。 数据结构的名称是什么? 实现这一点的最佳方法是什么? 或者这个在某个图书馆里?

我正在考虑使用一个具有 priority_queue 元素(委托)的容器,它使用反向比较,因此 pop 将删除最小的元素。 因此,insert 函数首先检查要插入的新元素是否大于最小元素。 如果是这样,我们就扔掉最小的元素并推送新元素。

(我想到了一个C++实现,但问题仍然与语言无关。)

I need a data structure that always holds the n largest items inserted so far (in no particular order).

So, if n is 3, we could have the following session where I insert a few numbers and the content of the container changes:

[]  // now insert 1
[1] // now insert 0
[1,0] // now insert 4
[1,0,4] // now insert 3
[1,4,3] // now insert 0
[1,4,3] // now insert 3
[4,3,3]

You get the idea. What's the name of the data structure? What's the best way to implement this? Or is this in some library?

I am thinking to use a container that has a priority_queue for its elements (delegation), which uses the reverse comparison, so pop will remove the smallest element. So the insert function first checks if the new element to be inserted is greater than the smallest. If so, we throw that smallest out and push the new element.

(I have a C++ implementation in mind, but the question is language-agnostic nevertheless.)

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(9

梦回梦里 2024-07-21 01:14:00

您想要的特定数据结构可能是隐式堆。 原始数据结构只是一个数组; 为了方便起见,假设其大小为 N=2^n 个元素,并且您要保留最大的 N-1 个元素。

这个想法是将数组(称之为 A)视为深度为 n 的完全二叉树:

  • 忽略 A[0]; 将A[1]视为根节点
  • 对于每个节点 A[k],子节点是 A[2*k] 和 A[2*k+1]
  • 节点 A[N/2..N-1] 是叶子

为了将树维护为“堆”,您需要确保每个节点都小于(或等于)其子节点。 这称为“堆条件”:

  • A[k] <= A[2*k]
  • A[k] <= A[2*k+1]

使用堆来维护最大的N个元素:

  • 注意根A[1]是堆中最小的元素。
  • 将每个新元素 (x) 与根进行比较:如果它较小 (x
  • 否则,将新元素插入堆中,如下所示:
    • 从堆中移除根(A[1],最小元素),并拒绝它
    • 将其替换为新元素 (A[1]:= x)
    • 现在,恢复堆状态:
      • 如果 x 小于或等于它的两个子元素,则完成
      • 否则,将 x 与最小的孩子交换
      • 在每个新位置重复测试和交换,直到满足堆条件

基本上,这将导致任何替换元素“过滤”树,直到到达其自然位置。 这最多需要 n=log2(N) 步,这是你能得到的最好的结果。 此外,树的隐式形式允许非常快速的实现; 现有的有界优先级队列库很可能会使用隐式堆。

The specific datastructure you want is probably the implicit heap. The raw datastructure is just an array; for convenience, say that it is N=2^n elements in size, and that you want to maintain the largest N-1 elements.

The idea is to treat the array (call it A) as a complete binary tree of depth n:

  • ignore A[0]; treat A[1] as the root node
  • for each node A[k], the children are A[2*k] and A[2*k+1]
  • nodes A[N/2..N-1] are the leaves

To maintain the tree as a "heap", you need to ensure that each node is smaller than (or equal to) its children. This is called the "heap condition":

  • A[k] <= A[2*k]
  • A[k] <= A[2*k+1]

To use the heap to maintain the largest N elements:

  • note that the root A[1] is the smallest element in the heap.
  • compare each new element (x) to the root: if it is smaller (x<A[1]), reject it.
  • otherwise, insert the new element into the heap, as follows:
    • remove the root (A[1], the smallest element) from the heap, and reject it
    • replace it with the new element (A[1]:= x)
    • now, restore the heap condition:
      • if x is less than or equal to both of its children, you're done
      • otherwise, swap x with the smallest child
      • repeat the test&swap at each new position until the heap condition is met

Basically, this will cause any replacement element to "filter up" the tree until it achieves its natural place. This will take at most n=log2(N) steps, which is as good as you can get. Also, the implicit form of the tree allows a very fast implementation; existing bounded-priority-queue libraries will most likely use an implicit heap.

呆橘 2024-07-21 01:14:00

在 Java 中,您可以使用由 TreeSet 等实现的 SortedSet。 每次插入后检查集合是否太大,如果是则删除最后一个元素。

这是相当有效的,我已经成功地使用它解决了几个欧拉项目问题。

In Java you can use a SortedSet implemented e.g. by a TreeSet. After each insertion check if the set is too large, if yes remove the last element.

This is reasonably efficient, I have used it successfully for solving several Project Euler problems.

耳钉梦 2024-07-21 01:14:00

Priority_queue 是 C++ 中与 STL 最接近的东西。 您可以将其包装在另一个类中以创建您自己的自动调整大小的实现。

与语言无关(尽管内存碎片可能不安全):

  1. 插入数据
  2. 排序
  3. 删除第 n 个元素

std::priority_queue 为您执行步骤 2 之后的所有内容。

A priority_queue is the closest thing in C++ with STL. You could wrap it in another class to create your own implementation that trims the size automatically.

Language-agnostically (although maybe not memory-fragmentation-safely):

  1. Insert data
  2. Sort
  3. Delete everything after the nth element

std::priority_queue does step 2 for you.

信仰 2024-07-21 01:14:00

有界优先级队列,我认为...Java 在其标准库中有类似的东西。 编辑:它被称为 LinkedBlockingQueue。 我不确定 C++ STL 是否包含类似的内容。

A bounded priority queue, I think... Java has something like this in its standard library. EDIT: it's called LinkedBlockingQueue. I'm not sure if the C++ STL includes something similar.

春花秋月 2024-07-21 01:14:00

是否可以只从已排序的集合中取出前 n 个元素?

Isn't it possible to just take the first n elements from a sorted collection?

家住魔仙堡 2024-07-21 01:14:00

在 Pyhton 中,使用 heapq。 在它周围创建一个小包装,如下所示:

class TopN_Queue:
    def __init__(self, n):
        self.max_sz = n
        self.data = []

    def add(self, x):
        if len(self.data) == self.max_sz:
            heapq.heappushpop(self.data, x)
        else:
            heapq.heappush(self.data, x)

...

In Pyhton, use heapq. Create a small wrapper around it, something like this:

class TopN_Queue:
    def __init__(self, n):
        self.max_sz = n
        self.data = []

    def add(self, x):
        if len(self.data) == self.max_sz:
            heapq.heappushpop(self.data, x)
        else:
            heapq.heappush(self.data, x)

...

那些过往 2024-07-21 01:14:00

是的,您可以保持 N 号的最小头部
然后在每次插入时将新项目与根项目进行比较
如果根项“大于”根项,则弹出根项并插入该项
最后你得到了 N 个最大的项目

yes you can maintain a minimal head of size N
then you compare new item with the root item on each insertion
pop the root and insert the item if it's "greater" than the root
finally you end up with N largest items

掩于岁月 2024-07-21 01:14:00

这是 C++ 11 中的一个简单实现。
可能不是最优化的,但易于使用且简洁。

    #include <iostream>
    #include <map>
    
    /// @brief Maintains a set of N elements with the best keys. (duplicate keys are allowed)
    template<class Key, class T, class Compare = std::less<Key> >
    class KeepNBests{
    public:
        KeepNBests(size_t N=0):N(N), is_less(Compare()){}
    
        bool add(const Key& key, const T& t){
            // If there are enough elements,
            if(bests.size()>N // and the candidate key is worse than the worst stored key...
                && is_less(key, bests.cbegin()->first)){
                return false; // nothing to do
            }
            // else insert the new candidate
            bests.insert({key, t});
            // keeps max N elts
            while(bests.size()>N){
                bests.erase(bests.cbegin());
            }
            return true;
        }
        const std::multimap<Key, T, Compare>& get()const{
            return bests;
        }
    public:
        const size_t N;
    private:
        Compare is_less;
        std::multimap<Key, T, Compare> bests;
    };
    
    
    int main() {
        
      KeepNBests<double, std::string> best(3);
      best.add(0., "bad");
      best.add(0., "bad");
      best.add(300., "the best");
      best.add(10., "ok");
      best.add(-1., "bad");
      best.add(-7., "very bad");
      best.add(-1., "bad");
      best.add(100., "good");
      best.add(2., "bof");
      best.add(200., "the second");
    
      for(const auto& b : best.get()){
        std::cout << b.first << " " << b.second << std::endl;
      }
      return 0;
    }

输出 :

 100 good
 200 the second
 300 the best

Here's a simple implementation in C++ 11.
Probably not the most optimal, but easy to use and concise.

    #include <iostream>
    #include <map>
    
    /// @brief Maintains a set of N elements with the best keys. (duplicate keys are allowed)
    template<class Key, class T, class Compare = std::less<Key> >
    class KeepNBests{
    public:
        KeepNBests(size_t N=0):N(N), is_less(Compare()){}
    
        bool add(const Key& key, const T& t){
            // If there are enough elements,
            if(bests.size()>N // and the candidate key is worse than the worst stored key...
                && is_less(key, bests.cbegin()->first)){
                return false; // nothing to do
            }
            // else insert the new candidate
            bests.insert({key, t});
            // keeps max N elts
            while(bests.size()>N){
                bests.erase(bests.cbegin());
            }
            return true;
        }
        const std::multimap<Key, T, Compare>& get()const{
            return bests;
        }
    public:
        const size_t N;
    private:
        Compare is_less;
        std::multimap<Key, T, Compare> bests;
    };
    
    
    int main() {
        
      KeepNBests<double, std::string> best(3);
      best.add(0., "bad");
      best.add(0., "bad");
      best.add(300., "the best");
      best.add(10., "ok");
      best.add(-1., "bad");
      best.add(-7., "very bad");
      best.add(-1., "bad");
      best.add(100., "good");
      best.add(2., "bof");
      best.add(200., "the second");
    
      for(const auto& b : best.get()){
        std::cout << b.first << " " << b.second << std::endl;
      }
      return 0;
    }

Output :

 100 good
 200 the second
 300 the best
跨年 2024-07-21 01:14:00

创建一个最小堆,同时存储一个计数器。

每当到达柜台时; 提取最小。

您可以通过以下方式执行此操作: O(1) insert、get-min 和 O(log log n) extract-min.[1] 或者,您可以使用 O(log n) insert 和 O 执行此操作(1) 对于其他提到的操作。[2]


[1] M. Thorup,“在恒定时间内具有递减键的整数优先级队列和单源最短路径问题”,载于第三十五届 ACM 计算理论年度研讨会论文集,美国纽约州纽约市,2003 年,第 149-158 页。

[2] GS Brodal、G. Lagogiannis、C. Makris、A. Tsakalidis 和 K. Tsichlas,“指针机中的最佳手指搜索树”,J. Comput。 系统。 科学,卷。 67、没有。 2,第 381–418 页,2003 年 9 月。

Create a min-heap, also store a counter.

Whenever the counter is reached; extract-min.

You can do this in: O(1) insert, get-min and O(log log n) extract-min.[1] Alternatively you can do this with O(log n) insert and O(1) for the other mentioned operations.[2]


[1] M. Thorup, “Integer priority queues with decrease key in constant time and the single source shortest paths problem,” in Proceedings of the thirty-fifth annual ACM symposium on Theory of computing, New York, NY, USA, 2003, pp. 149–158.

[2] G. S. Brodal, G. Lagogiannis, C. Makris, A. Tsakalidis, and K. Tsichlas, “Optimal finger search trees in the pointer machine,” J. Comput. Syst. Sci., vol. 67, no. 2, pp. 381–418, Sep. 2003.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文