选择满足一定属性的随机数组元素

发布于 2024-07-24 09:18:49 字数 572 浏览 0 评论 0 原文

假设我有一个名为 elements 的列表,其中每个列表满足或不满足某个布尔属性 p。 我想通过均匀分布随机选择满足 p 的元素之一。 我事先不知道有多少项满足此属性p

下面的代码会执行此操作吗?:(

pickRandElement(elements, p)
     randElement = null
     count = 0
     foreach element in elements
          if (p(element))
               count = count + 1
               if (randInt(count) == 0)
                    randElement = element

     return randElement

randInt(n) 返回一个随机 int k,其中 0 <= k 。)

Suppose I have a list, called elements, each of which does or does not satisfy some boolean property p. I want to choose one of the elements that satisfies p by random with uniform distribution. I do not know ahead of time how many items satisfy this property p.

Will the following code do this?:

pickRandElement(elements, p)
     randElement = null
     count = 0
     foreach element in elements
          if (p(element))
               count = count + 1
               if (randInt(count) == 0)
                    randElement = element

     return randElement

(randInt(n) returns a random int k with 0 <= k < n.)

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(5

幸福%小乖 2024-07-31 09:18:49

它在数学上是有效的。 可以用归纳法证明。

显然适用于满足 p 的 n = 1 元素。

对于n+1个元素,我们会以1/(n+1)的概率选择元素n+1,所以它的概率是可以的。 但这如何影响选择先验 n 个元素之一的最终概率呢?

每个先验 n 都有机会以 1/n 的概率被选择,直到找到元素 n+1。 现在,找到 n+1 后,有 1/(n+1) 的机会选择元素 n+1,因此有 /(n+1) 的机会先前选择的元素仍然被选择。 这意味着在 n+1 次找到后它被选中的最终概率是 1/n * (n/n+1) = 1/n+1——这是我们希望所有 n+1 个元素均匀分布的概率。

如果它适用于 n = 1,并且它适用于给定 n 的 n+1,那么它适用于所有 n。

It works mathematically. Can be proven by induction.

Clearly works for n = 1 element satisfying p.

For n+1 elements, we will choose the element n+1 with probability 1/(n+1), so its probability is OK. But how does that effect the end probability of choosing one of the prior n elements?

Each of the prior n had a chance of being selected with probability 1/n, until we found element n+1. Now, after finding n+1, there is a 1/(n+1) chance that element n+1 is chosen, so there is a n/(n+1) chance that the previously chosen element remains chosen. Which means its final probability of being the chosen after n+1 finds is 1/n * (n/n+1) = 1/n+1 -- which is the probability we want for all n+1 elements for uniform distribution.

If it works for n = 1, and it works for n+1 given n, then it works for all n.

柒夜笙歌凉 2024-07-31 09:18:49

是的,我相信是这样。

当你第一次遇到匹配的元素时,你肯定会选择它。 下一次,您以 1/2 的概率选择新值,因此两个元素中的每一个都有相同的机会。 接下来,您以 1/3 的概率选择新值,而其他每个元素的概率也为 1/2 * 2/3 = 1/3。

我试图找到一篇关于此策略的维基百科文章,但到目前为止失败了......

请注意,更一般地说,您只是从未知长度的序列中挑选一个随机样本。 您的序列恰好是通过获取初始序列并对其进行过滤而生成的,但该算法根本不需要该部分。

我以为我在 MoreLINQ 中有一个 LINQ 运算符来执行此操作,但我在存储库中找不到它...编辑:幸运的是,它仍然存在于 这个答案

public static T RandomElement<T>(this IEnumerable<T> source,
                                 Random rng)
{
    T current = default(T);
    int count = 0;
    foreach (T element in source)
    {
        count++;
        if (rng.Next(count) == 0)
        {
            current = element;
        }            
    }
    if (count == 0)
    {
        throw new InvalidOperationException("Sequence was empty");
    }
    return current;
}

Yes, I believe so.

The first time you encounter a matching element, you definitely pick it. The next time, you pick the new value with a probability of 1/2, so each of the two elements have an equal chance. The following time, you pick the new value with a probability of 1/3, leaving each of the other elements with a probability of 1/2 * 2/3 = 1/3 as well.

I'm trying to find a Wikipedia article about this strategy, but failing so far...

Note that more generally, you're just picking a random sample out of a sequence of unknown length. Your sequence happens to be generated by taking an initial sequence and filtering it, but the algorithm doesn't require that part at all.

I thought I'd got a LINQ operator in MoreLINQ to do this, but I can't find it in the repository... EDIT: Fortunately, it still exists from this answer:

public static T RandomElement<T>(this IEnumerable<T> source,
                                 Random rng)
{
    T current = default(T);
    int count = 0;
    foreach (T element in source)
    {
        count++;
        if (rng.Next(count) == 0)
        {
            current = element;
        }            
    }
    if (count == 0)
    {
        throw new InvalidOperationException("Sequence was empty");
    }
    return current;
}
香橙ぽ 2024-07-31 09:18:49

编程实践中,第 14 页。 70、(马尔可夫链算法)有一个类似的算法:

[...]
  nmatch = 0;
  for ( /* iterate list */ )
    if (rand() % ++nmatch == 0) /* prob = 1/nmatch */
      w = suf->word;
[...]

“注意选择一个的算法
当我们不知道如何时随机项目
有很多项目。 变量
nmatch 将匹配数计算为
列表被扫描。 表达式

rand() % ++nmatch == 0 
  

递增nmatch,然后为真
概率为 1/n 匹配。”

In The Practice of Programming, pg. 70, (The Markov Chain Algorithm) there is a similar algorithm for that:

[...]
  nmatch = 0;
  for ( /* iterate list */ )
    if (rand() % ++nmatch == 0) /* prob = 1/nmatch */
      w = suf->word;
[...]

"Notice the algorithm for selecting one
item at random when we don't know how
many items there are. The variable
nmatch counts the number of matches as
the list is scanned. The expression

rand() % ++nmatch == 0

increments nmatch and is then true
with probability 1/nmatch."

剑心龙吟 2024-07-31 09:18:49

decowboy 有一个很好的证据,证明这适用于 TopCoder

decowboy has a nice proof that this works on TopCoder

半寸时光 2024-07-31 09:18:49

为了清楚起见,我会尝试:

pickRandElement(elements, p)
     OrderedCollection coll = new OrderedCollection
     foreach element in elements
          if (p(element))
               coll.add(element)
     if (coll.size == 0) return null
     else return coll.get(randInt(coll.size))

对我来说,这使得你想要做的事情更加清晰,并且是自我记录的。 最重要的是,它更简单、更优雅,而且现在很明显,每个都会以均匀分布的方式被选择。

For clarity's sake, I would try:

pickRandElement(elements, p)
     OrderedCollection coll = new OrderedCollection
     foreach element in elements
          if (p(element))
               coll.add(element)
     if (coll.size == 0) return null
     else return coll.get(randInt(coll.size))

To me, that makes it MUCH clearer what you're trying to do and is self-documenting. On top of that, it's simpler and more elegant, and it's now obvious that each will be picked with an even distribution.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文