在不存储整个数组的情况下单遍查找第 K 大数

发布于 2024-11-17 05:49:05 字数 185 浏览 8 评论 0原文

我想到的算法是

  • 保留大小为 K 的 MaxHeap,
  • 如果堆已满,则插入每个元素,
  • 删除较小的值。
  • 最后,Kth max 是 MaxHeap 的较小者,

这将给我 O(NlogK)。有更好的算法吗?我无法进行快速选择,因为数组无法存储在内存中。

The algo I have in mind is

  • keep an MaxHeap of size K
  • insert each element
  • drop out smaller value if heap is full
  • In the end, Kth max is the smaller of MaxHeap

That will give me O(NlogK). Is there a better algorithm? I can't do quick selection, because the array can't be stored in memory.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(5

寄人书 2024-11-24 05:49:05

根据您的内存限制,您可以使用中位数算法的修改版本来解决 O(n) 时间和 O(k) 空间中的问题。

想法如下。在内存中维护一个大小为2k的数组。使用数组中的前 2k 个元素填充此缓冲区,然后对其运行中位数算法,将最大的 k 个元素放入数组的前半部分,将最小的 k 个元素放入数组的后半部分。然后,丢弃最小的 k 个元素。现在,将接下来的 k 个元素加载到数组的后半部分,使用中位数算法再次将顶部 k 个元素放在左侧,将底部 k 个元素放在右侧。如果您在数组中迭代此过程 - 将缓冲区的后半部分替换为数组中的下一个 k 元素,然后使用中位数中位数将其中的前 k 个元素移动到左半部分 - 那么最后您将前 k 个元素位于数组的左半部分。找到其中最小的一个(在 O(k) 时间内)将给出第 k 个最大的元素。

总体而言,您最终会使用大小为 O(k) 的数组对中位数中位数算法进行 O(n / k) 次调用,即对需要 O(k) 时间的算法进行 O(n / k) 次调用,净运行时间为 O(n)。这与最后一步相结合,运行时间为 O(n + k) = O(n)。此外,由于中位数步骤的内存使用量为 O(k),并且由于周围有一个大小为 O(k) 的缓冲区,因此仅使用 O(k) 内存。换句话说,它比最小堆解决方案渐近更快,并且在内存中渐近等效。

希望这有帮助!

Depending on your memory restrictions, you can use a modified version of the median-of-medians algorithm to solve the problem in O(n) time and O(k) space.

The idea is as follows. Maintain an array of size 2k in memory. Fill this buffer with the first 2k elements from the array, then run the median-of-medians algorithm on it to put the largest k elements in the first half of the array and the smallest k elements in the second half. Then, discard the smallest k elements. Now, load the next k elements into the second half of the array, use the median-of-medians algorithm to again put the top k elements in the left side and the bottom k elements on the right. If you iterate this process across the array - replacing the second half of the buffer with the next k elements from the array, then using median-of-medians to move the top k of those to the left half - then at the end you will have the top k elements in the left half of the array. Finding the smallest of those (in O(k) time) will then give you the kth largest element.

Overall, you end up making O(n / k) calls to the median-of-medians algorithm with an array of size O(k), which is O(n / k) calls to an algorithm that takes O(k) time, for a net runtime of O(n). This, combined with the final step, runs in O(n + k) = O(n) time. Moreover, since the memory usage for the median-of-medians step is O(k), and since you have a buffer of size O(k) lying around, this uses only O(k) memory. In other words, it's asymptotically faster than the min-heap solution and asymptotically equivalent in memory.

Hope this helps!

滿滿的愛 2024-11-24 05:49:05

这被称为 http://en.wikipedia.org/wiki/Selection_algorithm

中的一种算法特别是 http://en.wikipedia.org/wiki/Selection_algorithm#Linear_general_selection_algorithm_-_Median_of_Medians_algorithm

这是 O(N) 时间和 O(1)< /代码> 空间。我相信如果你的数组没有排序,它可以就地完成。如果您的阵列存储在外部(硬盘、网络等),您可以利用必须使用的 ~K 字。如果您的数组是由函数动态生成的,您将遇到类似的情况。

This is known as the http://en.wikipedia.org/wiki/Selection_algorithm

One algorithm in particular is http://en.wikipedia.org/wiki/Selection_algorithm#Linear_general_selection_algorithm_-_Median_of_Medians_algorithm

It is O(N) time and O(1) space. I believe it can be done in-place, if your array isn't sorted. If your array is being stored externally (hard disk, network, etc.), you can leverage the ~K words you have to work with. If your array is dynamically generated by a function, you'll be in a similar situation.

烟凡古楼 2024-11-24 05:49:05

稍微修改算法,因为 maxheap 不支持高效的“查找最小”。

  • 将前 K 个项目插入
    min-heap
  • 对于剩余的项,如果值大于堆

    • 弹出头部,然后插入新项目。
  • 头部是第 K 大的项目。

对于输入,最坏的情况仍然是 O(N lg K),其中每个项目都大于最后 K 中的最小值。但是对于随机分布的输入,您只需对较小百分比的输入进行堆操作。项目。

Modify the algorithm slightly, because a maxheap does not support efficient "find smallest".

  • Insert the first K items into a
    min-heap
  • For the remaining items, if the value is larger than the heap
    head

    • pop the head, and insert the new item.
  • The head is the Kth largest item.

The worst case is still O(N lg K) for an input where each item is greater than the smallest of the last K. But for a randomly distributed input, you will do only have to do the heap operation on a smaller percentage of the items.

若水微香 2024-11-24 05:49:05

运行此代码 - 它运行良好,无需触及元素位置或对其进行排序。

public class nth {

    /**
     * @param args
     */
    public static void main(String[] args) {
        // TODO Auto-generated method stub
        calc.getit(2);
    }

}
class calc{
    static int arr[] = { 1, 23, 47, 81, 92, 87, 52, 48, 56, 66, 65, 76, 71, 85,
        49, 53, 56, 61, 65, 84 };

    static void getit(int k){
        int i,j=0;
        for(i=0;i<arr.length;i++){
            int c=0;
            for(j=0;j<arr.length;j++){
                if(arr[i]>arr[j]){
                    c++;
                }
            }
            if(c==k-1){
                System.out.println(arr[i]);
            }
        }
    }
}

ran this code - its running fine without touching element position or sorting the same.

public class nth {

    /**
     * @param args
     */
    public static void main(String[] args) {
        // TODO Auto-generated method stub
        calc.getit(2);
    }

}
class calc{
    static int arr[] = { 1, 23, 47, 81, 92, 87, 52, 48, 56, 66, 65, 76, 71, 85,
        49, 53, 56, 61, 65, 84 };

    static void getit(int k){
        int i,j=0;
        for(i=0;i<arr.length;i++){
            int c=0;
            for(j=0;j<arr.length;j++){
                if(arr[i]>arr[j]){
                    c++;
                }
            }
            if(c==k-1){
                System.out.println(arr[i]);
            }
        }
    }
}
只想待在家 2024-11-24 05:49:05

我有一个 PriorityQueue 的实现。尝试一下:

import java.util.PriorityQueue;

public class FindKthLargestElementWithHeap {

    /**
     * We can use a min heap to solve this problem. 
     * The heap stores the top k elements.
     * Whenever the size is greater than k, delete the min.
     * Time complexity is O(nlog(k)).
     * Space complexity is O(k) for storing the top k numbers.
     */
    public static int findKthLargest(int[] nums, int k) {
        PriorityQueue<Integer> q = new PriorityQueue<>(k);
        for(int i: nums){
            q.offer(i);

            if(q.size()>k){
                q.poll();
            }
        }

        return q.peek();
    }

    public static void main(String args[]) {
        int[] nums = {5,8,6,97,12,3,5,6,4,2,3,};
        //Return the second largest number
        System.out.println(findKthLargest(nums,2));
    }

}

有关更多信息,请访问:https:/ /github.com/m-vahidalizadeh/foundations/blob/master/src/data_structs/FindKthLargestElementWithHeap.java

I have an implementation with PriorityQueue. Try this:

import java.util.PriorityQueue;

public class FindKthLargestElementWithHeap {

    /**
     * We can use a min heap to solve this problem. 
     * The heap stores the top k elements.
     * Whenever the size is greater than k, delete the min.
     * Time complexity is O(nlog(k)).
     * Space complexity is O(k) for storing the top k numbers.
     */
    public static int findKthLargest(int[] nums, int k) {
        PriorityQueue<Integer> q = new PriorityQueue<>(k);
        for(int i: nums){
            q.offer(i);

            if(q.size()>k){
                q.poll();
            }
        }

        return q.peek();
    }

    public static void main(String args[]) {
        int[] nums = {5,8,6,97,12,3,5,6,4,2,3,};
        //Return the second largest number
        System.out.println(findKthLargest(nums,2));
    }

}

For more information, please visit: https://github.com/m-vahidalizadeh/foundations/blob/master/src/data_structures/FindKthLargestElementWithHeap.java.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文