O(log n) median algorithm

Posted 2024-09-17 07:13:05

How can we remove the median of a set with time complexity O(log n)? Any ideas?

Comments (9)

零度℉ 2024-09-24 07:13:06

Here's a solution in Java, based on TreeSet:

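import java.util.TreeSet;

public class SetWithMedian {
    private final TreeSet<Integer> s = new TreeSet<Integer>();
    private Integer m = null;  // median (the upper one when the size is even)
    private int below = 0;     // number of elements strictly less than m

    public boolean contains(int e) {
        return s.contains(e);
    }
    public Integer getMedian() {
        return m;
    }
    public void add(int e) {
        if (!s.add(e))
            return;                        // duplicate: nothing changed
        if (m == null) {
            m = e;                         // first element
            return;
        }
        if (e < m)
            below++;
        updateMedian();
    }
    public void remove(int e) {
        if (!s.remove(e))
            return;                        // not present: nothing changed
        if (s.isEmpty()) {
            m = null;
            below = 0;
            return;
        }
        if (e < m) {
            below--;
        } else if (e == m) {
            // The median itself was removed: step to a neighbour first.
            Integer next = s.higher(m);
            if (next != null) {
                m = next;
            } else {
                m = s.lower(m);
                below--;
            }
        }
        updateMedian();
    }
    private void updateMedian() {
        // Keep below == size / 2. One add or remove shifts the balance by at
        // most one, so a single O(log n) step to a neighbour restores it.
        // (Calling size() on a headSet/tailSet view of a TreeSet costs O(n),
        // which is why the count is tracked explicitly.)
        int target = s.size() / 2;
        if (below < target) {
            m = s.higher(m);
            below++;
        } else if (below > target) {
            m = s.lower(m);
            below--;
        }
    }
}

Every operation performs a constant number of TreeSet calls (add, remove, higher, lower), each O(log n), so removing the median (i.e. "set.remove(set.getMedian())" on a SetWithMedian instance "set") takes O(log n) time.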

Edit: To help understand the code, here's the invariant condition of the class attributes:

private boolean isGood() {
    if (s.isEmpty()) {
        return m == null;
    } else {
        return s.contains(m) && s.headSet(m).size() + s.size() % 2 == s.tailSet(m).size();
    }
}

In human-readable form:

  • If the set "s" is empty, then "m" must be null.
  • If the set "s" is not empty, then it must contain "m".
  • Let x be the number of elements strictly less than "m", and let y be
    the number of elements greater than or equal to "m". Then, if the total
    number of elements n is even, x must be equal to y; otherwise, x+1 must
    be equal to y. In other words, "m" is the element at 0-based position
    floor(n/2) in sorted order, i.e. the upper median when n is even.

无悔心 2024-09-24 07:13:06

Try a red-black tree. It should work quite well, and with a binary search you get your O(log n). Insertion and removal also take O(log n), and rebalancing is done in O(log n) as well.
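As a rough illustration of this suggestion: Java's TreeSet is backed by a red-black tree, so one possible sketch keeps the smaller half and the larger half of the elements in two TreeSets. The class name MedianSet and its method names are hypothetical, chosen only for this example, and the sketch assumes distinct integer elements and that the upper median is wanted when the size is even.

```java
import java.util.TreeSet;

// Hypothetical sketch: median maintenance with two red-black trees (TreeSet).
class MedianSet {
    // low holds the smaller half; high holds the larger half, median first.
    private final TreeSet<Integer> low = new TreeSet<>();
    private final TreeSet<Integer> high = new TreeSet<>();

    // Inserts e in O(log n); returns false if e was already present.
    boolean add(int e) {
        if (low.contains(e) || high.contains(e)) return false;
        if (high.isEmpty() || e >= high.first()) high.add(e);
        else low.add(e);
        rebalance();
        return true;
    }

    // Returns the (upper) median in O(log n), or null if the set is empty.
    Integer median() {
        return high.isEmpty() ? null : high.first();
    }

    // Removes and returns the median in O(log n), or null if the set is empty.
    Integer removeMedian() {
        Integer m = high.pollFirst();
        rebalance();
        return m;
    }

    // Restores high.size() == low.size() or high.size() == low.size() + 1.
    private void rebalance() {
        if (high.size() > low.size() + 1) low.add(high.pollFirst());
        else if (low.size() > high.size()) high.add(low.pollLast());
    }

    public static void main(String[] args) {
        MedianSet set = new MedianSet();
        for (int v : new int[] {5, 1, 9, 3, 7}) set.add(v);
        System.out.println(set.removeMedian()); // 5
        System.out.println(set.removeMedian()); // 7 (upper median of 1 3 7 9)
        System.out.println(set.median());       // 3 (median of 1 3 9)
    }
}
```

Each public method touches the two trees a constant number of times, which is where the O(log n) bound comes from. An order-statistic tree would be another way to realise the same idea, but the standard library does not provide one.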

帅冕 2024-09-24 07:13:06

As mentioned in previous answers, there is no way to find the median without touching every element of the data structure. If the algorithm you are looking for must be executed sequentially, then the best you can do is O(n). The deterministic selection algorithm (median of medians, also known as the BFPRT algorithm) solves the problem in O(n) in the worst case. You can find more about that here: http://en.wikipedia.org/wiki/Selection_algorithm#Linear_general_selection_algorithm_-_Median_of_Medians_algorithm

However, the median-of-medians algorithm can be made to run faster than O(n) by making it parallel. Due to its divide-and-conquer nature, the algorithm can be "easily" parallelized. For instance, when dividing the input array into groups of 5 elements, you could launch a thread for each sub-array, sort it, and find its median within that thread. When this step has finished, the threads are joined and the algorithm is run again on the newly formed array of medians.

Note that such a design would only be beneficial for really large data sets. The additional overhead of spawning and joining threads makes it impractical for smaller sets. This has a bit of insight: http://www.umiacs.umd.edu/research/EXPAR/papers/3494/node18.html

Note that you can find asymptotically faster algorithms out there; however, they are not practical enough for daily use. Your best bet is the already mentioned sequential median-of-medians algorithm.
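The sequential version can be sketched roughly as follows. This is a minimal illustration, not a tuned implementation: the class and method names are made up for the example, ranks are 0-based, and the input array is reordered in place.

```java
import java.util.Arrays;

// Sketch of deterministic selection (median of medians, BFPRT).
class MedianOfMedians {
    // Returns the k-th smallest element of a (0-based k, 0 <= k < a.length).
    // The array is reordered in place.
    static int select(int[] a, int k) {
        int lo = 0, hi = a.length - 1;
        while (true) {
            if (hi - lo < 5) {                  // small range: sort it directly
                Arrays.sort(a, lo, hi + 1);
                return a[k];
            }
            int pivot = pivotValue(a, lo, hi);
            // Three-way partition of a[lo..hi]:
            // [lo, lt) < pivot, [lt, gt] == pivot, (gt, hi] > pivot.
            int lt = lo, i = lo, gt = hi;
            while (i <= gt) {
                if (a[i] < pivot) swap(a, lt++, i++);
                else if (a[i] > pivot) swap(a, i, gt--);
                else i++;
            }
            if (k < lt) hi = lt - 1;            // answer lies in the left part
            else if (k > gt) lo = gt + 1;       // answer lies in the right part
            else return pivot;                  // k falls among the pivot copies
        }
    }

    // Median of the medians of the groups of (at most) five in a[lo..hi].
    private static int pivotValue(int[] a, int lo, int hi) {
        int n = hi - lo + 1;
        int[] medians = new int[(n + 4) / 5];
        for (int g = 0; g < medians.length; g++) {
            int start = lo + 5 * g;
            int end = Math.min(start + 5, hi + 1);   // exclusive
            Arrays.sort(a, start, end);               // at most 5 elements
            medians[g] = a[start + (end - start - 1) / 2];
        }
        return select(medians, medians.length / 2);
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    public static void main(String[] args) {
        int[] data = {9, 8, 7, 6, 5, 4, 3, 2, 1};
        System.out.println(select(data.clone(), data.length / 2)); // 5
    }
}
```

The per-group sorts inside pivotValue are independent of each other, which is the part the parallel variant described above would hand out to separate threads.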

于我来说 2024-09-24 07:13:06

Master Yoda's randomized algorithm has, of course, a best-case complexity of O(n) like any other, an expected complexity of O(n) (not O(log n)), and a worst-case complexity of O(n^2), like Quicksort. It's still very good.

In practice, the "random" pivot choice can sometimes be a fixed location (without involving an RNG), because the initial array elements are known to be random enough (e.g. a random permutation of distinct values, or independent and identically distributed), or the pivot can be deduced from an approximately or exactly known distribution of the input values.

铃予 2024-09-24 07:13:06

I know a randomized algorithm with O(n) expected time complexity.

Here is the algorithm:

Input: an array of n numbers A[1..n] (without loss of generality, we can assume n is even).

Output: the (n/2)-th smallest element, i.e. the element at position n/2 of the sorted array.

Algorithm Select(A[1..n], k), called initially with k = n/2:

Pick a pivot index p uniformly at random from 1..n.

Divide the remaining elements into 2 parts:

L - the elements <= A[p] (other than the pivot itself)

R - the elements > A[p]

If k == |L| + 1, A[p] is the k-th smallest element: stop.

If k <= |L|, recurse on (L, k).

Otherwise, recurse on (R, k - (|L| + 1)).

Complexity: O(n) in expectation. The proof is purely mathematical and about one page long. If you are interested, ping me.
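The steps above can be sketched in Java roughly as follows. This is an illustrative version only: the class and method names are invented for the example, the partitioning is done in place rather than by building separate L and R arrays, and k is 1-based as in the description.

```java
import java.util.Random;

// Sketch of randomized selection (quickselect).
class QuickSelect {
    private static final Random RNG = new Random();

    // Returns the k-th smallest element of a (1-based k, 1 <= k <= a.length).
    // The array is reordered in place.
    static int select(int[] a, int k) {
        int lo = 0, hi = a.length - 1;
        int target = k - 1;                         // 0-based rank being sought
        while (lo < hi) {
            int p = lo + RNG.nextInt(hi - lo + 1);  // pivot index, uniform in [lo, hi]
            int pos = partition(a, lo, hi, p);      // final position of the pivot
            if (target == pos) return a[pos];
            if (target < pos) hi = pos - 1;         // continue in L
            else lo = pos + 1;                      // continue in R
        }
        return a[lo];
    }

    // Lomuto partition of a[lo..hi] around a[p]: elements <= pivot end up on
    // the left, elements > pivot on the right. Returns the pivot's final index.
    private static int partition(int[] a, int lo, int hi, int p) {
        int pivot = a[p];
        swap(a, p, hi);
        int store = lo;
        for (int i = lo; i < hi; i++) {
            if (a[i] <= pivot) swap(a, store++, i);
        }
        swap(a, store, hi);
        return store;
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    public static void main(String[] args) {
        int[] data = {4, 1, 3, 2};
        System.out.println(select(data.clone(), data.length / 2)); // 2
    }
}
```

The expected running time is O(n) for a uniformly random pivot, but the worst case is O(n^2), for example when many elements are equal and this simple two-way partition is used.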

彩虹直至黑白 2024-09-24 07:13:06

To expand on rwong's answer, here is some example code:

// partial_sort example
#include <iostream>
#include <algorithm>
#include <vector>
using namespace std;


int main () {
  int myints[] = {9,8,7,6,5,4,3,2,1};
  vector<int> myvector (myints, myints+9);
  vector<int>::iterator it;

  partial_sort (myvector.begin(), myvector.begin()+5, myvector.end());

  // print out content:
  cout << "myvector contains:";
  for (it=myvector.begin(); it!=myvector.end(); ++it)
    cout << " " << *it;

  cout << endl;

  return 0;
}

Output:
myvector contains: 1 2 3 4 5 9 8 7 6

The first five elements are now in sorted order, so the element in the middle of the vector (index 4, the value 5) is the median.
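Java's standard library has no direct counterpart of std::partial_sort, so the following is only a loose analogue, not a translation: it keeps the k smallest elements seen so far in a bounded max-heap, which costs O(n log k). The class and method names are hypothetical.

```java
import java.util.Collections;
import java.util.PriorityQueue;

// Sketch: k-th smallest element via a max-heap holding the k smallest values.
class HeapSelect {
    // Returns the k-th smallest element of a (1-based k, 1 <= k <= a.length).
    static int kthSmallest(int[] a, int k) {
        PriorityQueue<Integer> heap = new PriorityQueue<>(Collections.reverseOrder());
        for (int v : a) {
            if (heap.size() < k) {
                heap.add(v);                 // still filling the heap
            } else if (v < heap.peek()) {
                heap.poll();                 // drop the current largest of the k
                heap.add(v);
            }
        }
        return heap.peek();                  // largest of the k smallest
    }

    public static void main(String[] args) {
        int[] data = {9, 8, 7, 6, 5, 4, 3, 2, 1};
        System.out.println(kthSmallest(data, 5)); // 5
    }
}
```

For the median, k is about n/2, so this is O(n log n) overall and still not the O(log n) the question asks for.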

蒗幽 2024-09-24 07:13:05

If the set is sorted, finding the median requires O(1) item retrievals. If the items are in arbitrary sequence, it will not be possible to identify the median with certainty without examining the majority of the items. If one has examined most, but not all, of the items, that will allow one to guarantee that the median will be within some range [if the list contains duplicates, the upper and lower bounds may match], but examining the majority of the items in a list implies O(n) item retrievals.

If the information is held in a collection that is not fully ordered, but in which certain ordering relationships are known, then finding the median may require anywhere between O(1) and O(n) item retrievals, depending on the nature of the known ordering relation.
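For the fully sorted case, the O(1) retrieval amounts to a couple of array reads. The sketch below assumes a non-empty int array that is already sorted in ascending order, and it averages the two middle elements when the length is even; that convention is an assumption of this example, and other answers here pick a single middle element instead.

```java
// Sketch: median of an already sorted, non-empty array in O(1).
class SortedMedian {
    static double median(int[] sorted) {
        int n = sorted.length;
        if (n % 2 == 1) {
            return sorted[n / 2];                       // single middle element
        }
        // Even length: average the two middle elements (long avoids int overflow).
        return ((long) sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    public static void main(String[] args) {
        System.out.println(median(new int[] {1, 2, 3}));    // 2.0
        System.out.println(median(new int[] {1, 2, 3, 4})); // 2.5
    }
}
```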

就像说晚安 2024-09-24 07:13:05

For unsorted lists, repeatedly do an O(n) partial sort until the element located at the median position is known. This is at least O(n), though.

Is there any information about the elements being sorted?

临走之时 2024-09-24 07:13:05

For a general, unsorted set, it is impossible to reliably find the median in better than O(n) time. You can find the median of a sorted set in O(1), or you can trivially sort the set yourself in O(n log n) time and then find the median in O(1), giving an O(n log n) algorithm. Or, finally, there are cleverer median selection algorithms that work by partitioning instead of sorting and yield O(n) performance.

But if the set has no special properties and you are not allowed any pre-processing step, you will never get below O(n) by the simple fact that you will need to examine all of the elements at least once to ensure that your median is correct.
