以相同的概率从集合中选择数字

发布于 2024-12-21 13:48:47 字数 138 浏览 0 评论 0 原文

从 Steven Skiena 的算法设计手册中得到这个问题。

要求从给定的n个数字的集合S中选择k(给定值)个数字形成子集S',使得每个数字的选择概率相等(k/n)。 n 未知(我正在考虑将 S 作为一个链接列表)。 而且,我们只能通过集合S。

Got this question from algorithms design manual by Steven Skiena.

It is required to select k (value given) numbers to form a subset S' from a given set S having n numbers, such that selection probability for each number is equal (k/n).
n is unknown (i was thinking of taking S as a link-list for this).
also, we can have only pass through the set S.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

太阳男子 2024-12-28 13:48:47

n 未知时,您宁愿需要一个在线算法来进行所谓的储层采样

很好的解释&这里提供了证明草图 http://propersubset.com/2010/04/choosing-random-elements.html

我的意思是用Python实现的这个算法(取自上面的链接)

import random
def random_subset( iterator, K ):
    result = []
    N = 0

    for item in iterator:
        N += 1
        if len( result ) < K:
            result.append( item )
        else:
            s = int(random.random() * N)
            if s < K:
                result[ s ] = item

    return result

When n is unknown you'd rather need an on-line algorithm for so-called Reservoir sampling.

The good explanation & proof sketches are provided here http://propersubset.com/2010/04/choosing-random-elements.html

I mean this algorithm implemented in Python (taken from the link above)

import random
def random_subset( iterator, K ):
    result = []
    N = 0

    for item in iterator:
        N += 1
        if len( result ) < K:
            result.append( item )
        else:
            s = int(random.random() * N)
            if s < K:
                result[ s ] = item

    return result
ㄟ。诗瑗 2024-12-28 13:48:47

像这样的

for elem in S
  if random() < (k - S'.size)/S.size // This is float division
    S'.add(elem)

第一个元素以概率 k/n 选择,第二个元素以 (nk)/n * k/(n-1) + k/n * (k-1 )/(n-1) 减少为 k/n 等。

Something like this

for elem in S
  if random() < (k - S'.size)/S.size // This is float division
    S'.add(elem)

The first element is chosen with probability k/n, the second one with (n-k)/n * k/(n-1) + k/n * (k-1)/(n-1) which reduces to k/n, etc.

反差帅 2024-12-28 13:48:47

您应该选择一个能够真正模拟真实活动“从n个数字中随机选择k个数字”的算法。您的算法应该具有两个属性

(1)它必须最后返回k个数字。

(2)它必须真实地模拟目标活动的属性:每个数字都是以概率k/n选择的。

Oborok的答案是错误的,因为它没有第一个属性。

for i = 0 to n
randomly choose an integer number between [1,n-i+1]
if [randomValue <= (k - S'.size)/(S.size - i + 1)] then
    S'.add(S[i])

通过上述选择方案,每个数字都以 k/n 的概率被选择。通过证明下面的等式可以看出:

https://www.facebook.com/photo.php?fbid=677984275648191&l=7cafe5d468

You should choose an algorithm that can truly simulate the real activity "Randomly choose k numbers from n numbers".Your algorithm should has two properties

(1) It must return k numbers at end.

(2) It must truly simulate that properties of target activity : each number is selected with probability k/n.

Oboroks answer is wrong because it hasnt first property.

for i = 0 to n
randomly choose an integer number between [1,n-i+1]
if [randomValue <= (k - S'.size)/(S.size - i + 1)] then
    S'.add(S[i])

With above selecting plan,each number is selected with probability k/n.You can see that by prove below equation :

https://www.facebook.com/photo.php?fbid=677984275648191&l=7cafe5d468

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文