O(log n) median algorithm

Posted on 2024-09-17 07:13:05

How can the median of a set be removed with O(log n) time complexity? Any ideas?

零度℉ 2024-09-24 07:13:06

Here's a solution in Java, based on TreeSet:

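import java.util.NavigableSet;
import java.util.TreeSet;

public class SetWithMedian {
    private final NavigableSet<Integer> s = new TreeSet<>();
    // Current median; for an even number of elements this is the upper median.
    private Integer m = null;

    public boolean contains(int e) {
        return s.contains(e);
    }
    public Integer getMedian() {
        return m;
    }
    public void add(int e) {
        s.add(e);
        updateMedian();
    }
    public void remove(int e) {
        s.remove(e);
        updateMedian();
    }
    // Restores the invariant after a single add or remove.
    // One insertion or deletion moves the median by at most one position.
    private void updateMedian() {
        if (s.isEmpty()) {
            m = null;
        } else if (s.size() == 1) {
            m = s.first();
        } else {
            NavigableSet<Integer> h = s.headSet(m, false); // elements < m
            // Exclusive tailSet instead of tailSet(m + 1): avoids int overflow
            // when m == Integer.MAX_VALUE.
            NavigableSet<Integer> t = s.tailSet(m, false); // elements > m
            int x = 1 - s.size() % 2;
            if (h.size() < t.size() + x)
                m = t.first();
            else if (h.size() > t.size() + x)
                m = h.last();
        }
    }
}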

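Removing the median (i.e. "set.remove(set.getMedian())" on a SetWithMedian instance "set") takes O(log n) time for the tree lookups, insertions and deletions. One caveat: java.util.TreeSet computes size() of a headSet/tailSet view by iterating over the view, so updateMedian() as written is O(n) in practice. A true O(log n) bound needs a structure that tracks the sizes of the two halves directly.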

Edit: To help understand the code, here's the invariant condition of the class attributes:

private boolean isGood() {
    if (s.isEmpty()) {
        return m == null;
    } else {
        return s.contains(m) && s.headSet(m).size() + s.size() % 2 == s.tailSet(m).size();
    }
}

In human-readable form:

If the set "s" is empty, then "m" must be null.
If the set "s" is not empty, then it must contain "m".
Let x be the number of elements strictly less than "m", and let y be the number of elements greater than or equal to "m". Then, if the total number of elements is even, x must be equal to y; otherwise, x+1 must be equal to y.
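
A caveat that may matter here: in java.util.TreeSet, size() on a headSet/tailSet view is computed by walking the view, so a median update that counts both halves is linear rather than logarithmic. A minimal sketch of one alternative keeps two TreeSets whose sizes differ by at most one, so only O(1) size() calls on whole sets are needed. It assumes the same "upper median" convention as the invariant above; the class name BalancedMedianSet and its field names are hypothetical and not part of the original answer.

```java
import java.util.TreeSet;

class BalancedMedianSet {
    // Elements strictly below the median.
    private final TreeSet<Integer> low = new TreeSet<>();
    // The median and every element above it.
    private final TreeSet<Integer> high = new TreeSet<>();

    // Upper median for even sizes, null for an empty set. O(log n).
    public Integer getMedian() {
        return high.isEmpty() ? null : high.first();
    }

    public boolean contains(int e) {
        return low.contains(e) || high.contains(e);
    }

    public void add(int e) {
        if (contains(e)) {
            return; // set semantics: ignore duplicates
        }
        if (high.isEmpty() || e >= high.first()) {
            high.add(e);
        } else {
            low.add(e);
        }
        rebalance();
    }

    public void remove(int e) {
        if (!low.remove(e)) {
            high.remove(e);
        }
        rebalance();
    }

    // Keeps high.size() equal to low.size() or low.size() + 1.
    // A single add or remove can break this by at most one element,
    // so moving one element is always enough.
    private void rebalance() {
        if (high.size() > low.size() + 1) {
            low.add(high.pollFirst());
        } else if (high.size() < low.size()) {
            high.add(low.pollLast());
        }
    }

    public static void main(String[] args) {
        BalancedMedianSet set = new BalancedMedianSet();
        for (int v : new int[] {5, 1, 3, 2, 4}) {
            set.add(v);
        }
        System.out.println(set.getMedian()); // prints 3
        set.remove(set.getMedian());         // remove the median
        System.out.println(set.getMedian()); // prints 4
    }
}
```

Every operation here is a constant number of TreeSet lookups, insertions, or removals on a whole set, each of which is O(log n).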

无悔心 2024-09-24 07:13:06

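Try a red-black tree. It should work quite well, and with a binary search you get your log(n). Insertion and deletion also take log(n) time, and rebalancing is done in log(n) as well. (To find the median by rank in log(n), each node additionally has to store the size of its subtree.)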

帅冕 2024-09-24 07:13:06

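As mentioned in the previous answers, there is no way to find the median without touching every element of the data structure. If the algorithm you are looking for must run sequentially, the best you can do is O(n). The deterministic selection algorithm (median of medians), also known as the BFPRT algorithm, solves the problem in O(n) worst-case time. You can find more about it here: http://en.wikipedia.org/wiki/Selection_algorithm#Linear_general_selection_algorithm_-_Median_of_Medians_algorithm

However, the median-of-medians algorithm can be made to run faster than O(n) by parallelizing it. Because of its divide-and-conquer nature, the algorithm can be parallelized "easily". For example, when dividing the input array into groups of 5 elements, you could launch a thread for each sub-array, sort it, and find its median within that thread. When this step is finished, the threads are joined and the algorithm is run again on the newly formed array of medians.

Note that such a design is only beneficial for really large data sets. The extra overhead of spawning threads and joining them makes it impractical for smaller sets. This has a bit of insight: http://www.umiacs.umd.edu/research/EXPAR/papers/3494/node18.html

Note that you can find asymptotically faster algorithms out there, but they are not practical enough for daily use. Your best bet is the sequential median-of-medians algorithm already mentioned.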
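
A sequential median-of-medians selection could be sketched in Java roughly as follows. This is an illustrative version under stated assumptions rather than a reference implementation: the class and method names are hypothetical, the input is copied so the caller's array is left untouched, and for even lengths it returns the lower of the two middle elements.

```java
import java.util.Arrays;

class MedianOfMedians {
    // Lower median of a non-empty array, in O(n) worst-case time.
    static int median(int[] data) {
        if (data.length == 0) {
            throw new IllegalArgumentException("empty input");
        }
        int[] a = data.clone();
        return select(a, 0, a.length - 1, (a.length - 1) / 2);
    }

    // Returns the value that would sit at absolute index k if a[lo..hi] were sorted.
    private static int select(int[] a, int lo, int hi, int k) {
        while (lo < hi) {
            int pivot = pivot(a, lo, hi);
            // Three-way partition: [lo, lt) < pivot, [lt, gt] == pivot, (gt, hi] > pivot.
            int lt = lo, i = lo, gt = hi;
            while (i <= gt) {
                if (a[i] < pivot) {
                    swap(a, lt++, i++);
                } else if (a[i] > pivot) {
                    swap(a, i, gt--);
                } else {
                    i++;
                }
            }
            if (k < lt) {
                hi = lt - 1;
            } else if (k > gt) {
                lo = gt + 1;
            } else {
                return a[k];
            }
        }
        return a[lo];
    }

    // Chooses the pivot as the median of the medians of groups of 5.
    private static int pivot(int[] a, int lo, int hi) {
        if (hi - lo < 5) {
            Arrays.sort(a, lo, hi + 1);
            return a[lo + (hi - lo) / 2];
        }
        int count = 0;
        for (int i = lo; i <= hi; i += 5) {
            int end = Math.min(i + 4, hi);
            Arrays.sort(a, i, end + 1);
            // Collect each group's median at the front of the range.
            swap(a, lo + count, i + (end - i) / 2);
            count++;
        }
        return select(a, lo, lo + count - 1, lo + (count - 1) / 2);
    }

    private static void swap(int[] a, int i, int j) {
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }

    public static void main(String[] args) {
        System.out.println(median(new int[] {9, 8, 7, 6, 5, 4, 3, 2, 1})); // prints 5
    }
}
```

The per-group sorting loop in pivot() is the step the answer proposes to hand to separate threads; this sketch keeps it sequential.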

于我来说 2024-09-24 07:13:06

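Master Yoda's randomized algorithm has, of course, a minimum complexity of n like any other, an expected complexity of n (not log n), and a maximum complexity of n squared, like Quicksort. It's still very good.

In practice, the "random" pivot choice might sometimes be a fixed location (without involving an RNG), because the initial array elements are known to be random enough (e.g. a random permutation of distinct values, or independent and identically distributed), or because a good pivot can be deduced from an approximately or exactly known distribution of the input values.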

铃予 2024-09-24 07:13:06

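I know one randomized algorithm with an expected time complexity of O(n).

Here is the algorithm:

Input: an array of n numbers A[1...n] [without loss of generality we can assume n is even]

Output: the n/2-th element of the sorted array.

Algorithm (A[1..n], k = n/2):

Pick a pivot index p uniformly at random from 1...n

Divide the array into 3 parts:

L - the elements < A[p]

E - the elements = A[p]

R - the elements > A[p]

if (k <= |L|) recurse on (L, k)

else if (k <= |L| + |E|) A[p] is the answer, stop

else recurse on (R, k - |L| - |E|)

Complexity: O(n) expected.
The proof is all mathematical and about one page long. If you are interested, ping me.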
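
The randomized selection above could look roughly like this in Java. It is a minimal sketch, not the answerer's code: the class and method names are hypothetical, k is 1-based as in the pseudocode, the partition is done in place on a copy of the input, and equal elements get their own middle region so that duplicates are handled.

```java
import java.util.Random;

class RandomizedSelect {
    private static final Random RNG = new Random();

    // Returns the k-th smallest element of input (k is 1-based), expected O(n).
    static int select(int[] input, int k) {
        if (k < 1 || k > input.length) {
            throw new IllegalArgumentException("k out of range");
        }
        int[] a = input.clone();
        int lo = 0, hi = a.length - 1;
        int target = k - 1; // 0-based position of the answer in sorted order
        while (true) {
            // Pick the pivot uniformly at random from the current range.
            int pivot = a[lo + RNG.nextInt(hi - lo + 1)];
            // Partition into: [lo, lt) < pivot, [lt, gt] == pivot, (gt, hi] > pivot.
            int lt = lo, i = lo, gt = hi;
            while (i <= gt) {
                if (a[i] < pivot) {
                    swap(a, lt++, i++);
                } else if (a[i] > pivot) {
                    swap(a, i, gt--);
                } else {
                    i++;
                }
            }
            if (target < lt) {
                hi = lt - 1;      // answer lies among the smaller elements
            } else if (target > gt) {
                lo = gt + 1;      // answer lies among the larger elements
            } else {
                return pivot;     // answer equals the pivot
            }
        }
    }

    // Median as defined above: the n/2-th element for even n, the middle one for odd n.
    static int median(int[] input) {
        return select(input, (input.length + 1) / 2);
    }

    private static void swap(int[] a, int i, int j) {
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }

    public static void main(String[] args) {
        System.out.println(median(new int[] {9, 8, 7, 6, 5, 4, 3, 2, 1})); // prints 5
    }
}
```

The loop replaces the recursion of the pseudocode: narrowing lo and hi plays the role of recursing on the smaller or larger part.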

彩虹直至黑白 2024-09-24 07:13:06

To expand on rwong's answer, here is some example code:

// partial_sort example
#include <algorithm>
#include <iostream>
#include <vector>

int main() {
  std::vector<int> myvector = {9, 8, 7, 6, 5, 4, 3, 2, 1};

  // Place the 5 smallest elements, in sorted order, in the first 5 positions.
  // The remaining elements are left in an unspecified order.
  std::partial_sort(myvector.begin(), myvector.begin() + 5, myvector.end());

  // print out content:
  std::cout << "myvector contains:";
  for (int value : myvector)
    std::cout << " " << value;

  std::cout << std::endl;

  return 0;
}

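Output (the order of the last four elements is unspecified and may differ between implementations):
myvector contains: 1 2 3 4 5 9 8 7 6

The element in the middle (index 4, the value 5) is the median.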

蒗幽 2024-09-24 07:13:05

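If the set is sorted, finding the median requires O(1) item retrievals. If the items are in arbitrary order, it is not possible to identify the median with certainty without examining the majority of the items. Examining most, but not all, of the items lets you guarantee that the median lies within some range (if the list contains duplicates, the upper and lower bounds may coincide), but examining the majority of the items in a list still implies O(n) item retrievals.

If the information in the collection is not fully ordered, but certain ordering relationships are known, then the time required may lie anywhere between O(1) and O(n) item retrievals, depending on the nature of the known ordering relationships.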
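
For the fully sorted case, the O(1) retrieval amounts to one or two index lookups. A small Java sketch, assuming the data sits in an already sorted, non-empty int array (the class and method names are hypothetical):

```java
class SortedMedian {
    // Lower median of a sorted, non-empty array: a single index lookup.
    static int lowerMedian(int[] sorted) {
        return sorted[(sorted.length - 1) / 2];
    }

    // Conventional median: the middle element for odd lengths,
    // the mean of the two middle elements for even lengths.
    static double median(int[] sorted) {
        int n = sorted.length;
        if (n % 2 == 1) {
            return sorted[n / 2];
        }
        // Promote to double before adding to avoid int overflow.
        return (sorted[n / 2 - 1] + (double) sorted[n / 2]) / 2.0;
    }

    public static void main(String[] args) {
        int[] sorted = {1, 2, 3, 4};
        System.out.println(lowerMedian(sorted)); // prints 2
        System.out.println(median(sorted));      // prints 2.5
    }
}
```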

就像说晚安 2024-09-24 07:13:05

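For unsorted lists, repeatedly do an O(n) partial sort until the element located at the median position is known. This is at least O(n), though.

Is there any information about the elements being sorted?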

临走之时 2024-09-24 07:13:05

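For a general, unsorted set, it is impossible to reliably find the median in better than O(n) time. You can find the median of a sorted set in O(1), or you can simply sort the set yourself in O(n log n) time and then find the median in O(1), giving an O(n log n) algorithm. Or, finally, there are cleverer median selection algorithms that work by partitioning instead of sorting and yield O(n) performance.

But if the set has no special properties and you are not allowed any pre-processing step, you will never get below O(n), because you need to examine every element at least once to be sure that your median is correct.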
