Probability of finding the median in limited space
This is a spin off of this StackOverflow question.
Assume that you have a fixed number k of storage locations, and space for two counters. You will receive n items in random order (all permutations of the n items are equally likely). After receiving each item you can either store it in one of the k locations (discarding one of the previously stored values), or discard the item. You can also increment or decrement either of the counters. Any discarded item cannot be retrieved.
The questions are
- What is the strategy that maximizes your probability of finding the exact median?
- What is that probability?
Obviously, if k > n/2, we can find the median. In general it seems that the strategy of attempting to keep the number of discarded high values equal to the number of discarded low values should be optimal, but I am not sure how to prove it, nor how to figure out the probability that it finds the median.
Also of interest is the case where we do not know n but know the probability distribution of n.
Edit: Assume for now that the values are distinct (no duplicates). However, if you can solve the non-distinct case as well, that would be impressive.
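For concreteness, here is a minimal Python sketch of the balancing strategy described above, under one possible interpretation (distinct values, n odd so the median is the item of rank (n+1)/2, and the two counters recording how many items were dropped below and above the stored window). The function names and the Monte Carlo helper are illustrative, not part of the question, and no claim of optimality is intended.

```python
import bisect
import random


def run_strategy(stream, k):
    """One interpretation of the balancing strategy from the question:
    keep a sorted window of up to k stored values plus two counters for
    how many items were discarded below and above the window."""
    buf = []   # stored values, kept sorted (the "window")
    low = 0    # counter: items discarded below the window
    high = 0   # counter: items discarded above the window

    for x in stream:
        if len(buf) < k:
            bisect.insort(buf, x)      # fill the window first
        elif x < buf[0]:
            low += 1                   # below the window: discard
        elif x > buf[-1]:
            high += 1                  # above the window: discard
        else:
            # x lands inside the window: store it and evict one end,
            # choosing the side with fewer discards so far, so the two
            # counters stay as close to equal as possible.
            bisect.insort(buf, x)
            if low <= high:
                buf.pop(0)             # evict the window minimum
                low += 1
            else:
                buf.pop()              # evict the window maximum
                high += 1

    n = low + high + len(buf)
    rank = (n + 1) // 2                # 1-based rank of the (lower) median
    idx = rank - low - 1               # its position inside the window
    return buf[idx] if 0 <= idx < len(buf) else None


def success_probability(n, k, trials=10_000):
    """Monte Carlo estimate of how often the strategy above recovers the
    exact median of a random permutation of 1..n (take n odd)."""
    true_median = (n + 1) // 2
    hits = sum(
        run_strategy(random.sample(range(1, n + 1), n), k) == true_median
        for _ in range(trials)
    )
    return hits / trials
```

For example, success_probability(101, 15) estimates how often this particular interpretation recovers the true median of a random permutation of 1..101 with k = 15 storage locations; when run_strategy returns a value at all, the counter bookkeeping guarantees it really is the item of the target rank.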
Comments (2)
A wild guess: discard the element that is farthest from the mean of the currently stored values.
Comparing to the current median doesn't work if the distribution of values is multi-modal and we get values from a non-dominant mode first.
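A minimal sketch of one way to read this heuristic (an interpretation, not the commenter's code): among the k stored values plus the newly arrived item, drop whichever is farthest from their mean. As written it does not use the two counters, so on its own it cannot certify which stored value, if any, is the true median.

```python
def keep_central(stored, x, k):
    """One step of the 'farthest from the mean' heuristic: among the
    stored values plus the new item, discard the one farthest from
    their mean, keeping at most k values."""
    candidates = stored + [x]
    if len(candidates) <= k:
        return candidates              # still room: keep everything
    mean = sum(candidates) / len(candidates)
    worst = max(candidates, key=lambda v: abs(v - mean))
    candidates.remove(worst)
    return candidates
```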
Munro and Paterson studied essentially this problem in their paper Selection and sorting with limited storage. They show that your algorithm requires k = Ω(√n) to succeed with constant probability and that this is asymptotically optimal by appealing to basic results about one-dimensional random walks.
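To make the random-walk connection concrete, here is a small self-contained simulation (an illustration of the intuition, not taken from the paper): as a random permutation of 1..n streams by, the count of below-median items seen so far fluctuates around half the items seen like a one-dimensional random walk, and the typical maximum deviation grows on the order of √n, which is the imbalance a width-k window has to absorb, in line with the Ω(√n) bound above.

```python
import random


def typical_max_imbalance(n, trials=1_000):
    """Average, over random permutations of 1..n, of the largest gap between
    the count of below-median items seen so far and half the items seen.
    This gap behaves like the maximum deviation of a one-dimensional random
    walk and grows on the order of sqrt(n)."""
    median = (n + 1) // 2
    total = 0.0
    for _ in range(trials):
        perm = random.sample(range(1, n + 1), n)   # random arrival order
        below = 0
        worst = 0.0
        for i, x in enumerate(perm, start=1):
            if x < median:
                below += 1
            worst = max(worst, abs(below - i / 2))
        total += worst
    return total / trials


# Comparing typical_max_imbalance(400) with typical_max_imbalance(1600)
# should show roughly a factor-of-2 growth, consistent with sqrt(n).
```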
If I wanted to prove absolute optimality, the first thing I would try would be to consider an arbitrary algorithm A and then couple its execution with an algorithm A' that, the first time A deviates from your algorithm, does what your algorithm would do instead and then attempts to follow A as closely as it can.