选择满足一定属性的随机数组元素
假设我有一个名为 elements
的列表,其中每个列表满足或不满足某个布尔属性 p
。 我想通过均匀分布随机选择满足 p
的元素之一。 我事先不知道有多少项满足此属性p
。
下面的代码会执行此操作吗?:(
pickRandElement(elements, p)
randElement = null
count = 0
foreach element in elements
if (p(element))
count = count + 1
if (randInt(count) == 0)
randElement = element
return randElement
randInt(n)
返回一个随机 int k
,其中 0 <= k
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
它在数学上是有效的。 可以用归纳法证明。
显然适用于满足 p 的 n = 1 元素。
对于n+1个元素,我们会以1/(n+1)的概率选择元素n+1,所以它的概率是可以的。 但这如何影响选择先验 n 个元素之一的最终概率呢?
每个先验 n 都有机会以 1/n 的概率被选择,直到找到元素 n+1。 现在,找到 n+1 后,有 1/(n+1) 的机会选择元素 n+1,因此有 /(n+1) 的机会先前选择的元素仍然被选择。 这意味着在 n+1 次找到后它被选中的最终概率是 1/n * (n/n+1) = 1/n+1——这是我们希望所有 n+1 个元素均匀分布的概率。
如果它适用于 n = 1,并且它适用于给定 n 的 n+1,那么它适用于所有 n。
It works mathematically. Can be proven by induction.
Clearly works for n = 1 element satisfying p.
For n+1 elements, we will choose the element n+1 with probability 1/(n+1), so its probability is OK. But how does that effect the end probability of choosing one of the prior n elements?
Each of the prior n had a chance of being selected with probability 1/n, until we found element n+1. Now, after finding n+1, there is a 1/(n+1) chance that element n+1 is chosen, so there is a n/(n+1) chance that the previously chosen element remains chosen. Which means its final probability of being the chosen after n+1 finds is 1/n * (n/n+1) = 1/n+1 -- which is the probability we want for all n+1 elements for uniform distribution.
If it works for n = 1, and it works for n+1 given n, then it works for all n.
是的,我相信是这样。
当你第一次遇到匹配的元素时,你肯定会选择它。 下一次,您以 1/2 的概率选择新值,因此两个元素中的每一个都有相同的机会。 接下来,您以 1/3 的概率选择新值,而其他每个元素的概率也为 1/2 * 2/3 = 1/3。
我试图找到一篇关于此策略的维基百科文章,但到目前为止失败了......
请注意,更一般地说,您只是从未知长度的序列中挑选一个随机样本。 您的序列恰好是通过获取初始序列并对其进行过滤而生成的,但该算法根本不需要该部分。
我以为我在 MoreLINQ 中有一个 LINQ 运算符来执行此操作,但我在存储库中找不到它...编辑:幸运的是,它仍然存在于 这个答案:
Yes, I believe so.
The first time you encounter a matching element, you definitely pick it. The next time, you pick the new value with a probability of 1/2, so each of the two elements have an equal chance. The following time, you pick the new value with a probability of 1/3, leaving each of the other elements with a probability of 1/2 * 2/3 = 1/3 as well.
I'm trying to find a Wikipedia article about this strategy, but failing so far...
Note that more generally, you're just picking a random sample out of a sequence of unknown length. Your sequence happens to be generated by taking an initial sequence and filtering it, but the algorithm doesn't require that part at all.
I thought I'd got a LINQ operator in MoreLINQ to do this, but I can't find it in the repository... EDIT: Fortunately, it still exists from this answer:
在编程实践中,第 14 页。 70、(马尔可夫链算法)有一个类似的算法:
In The Practice of Programming, pg. 70, (The Markov Chain Algorithm) there is a similar algorithm for that:
decowboy 有一个很好的证据,证明这适用于 TopCoder
decowboy has a nice proof that this works on TopCoder
为了清楚起见,我会尝试:
对我来说,这使得你想要做的事情更加清晰,并且是自我记录的。 最重要的是,它更简单、更优雅,而且现在很明显,每个都会以均匀分布的方式被选择。
For clarity's sake, I would try:
To me, that makes it MUCH clearer what you're trying to do and is self-documenting. On top of that, it's simpler and more elegant, and it's now obvious that each will be picked with an even distribution.