Assign a specific number of values according to a probability distribution (in R)

Posted 2024-11-28 04:59:04

Hello and thanks in advance for the help!

I am trying to generate a vector with a specific number of values that are assigned according to a probability distribution. For example, I want a vector of length 31 containing 26 zeros and 5 ones. (The total sum of the vector should always be five.) However, the location of the ones is important. To identify which values should be one and which should be zero, I have a vector of probabilities (length 31), which looks like this:

probs<-c(0.01,0.02,0.01,0.02,0.01,0.01,0.01,0.04,0.01,0.01,0.12,0.01,0.02,0.01,
0.14,0.06,0.01,0.01,0.01,0.01,0.01,0.14,0.01,0.07,0.01,0.01,0.04,0.08,0.01,0.02,0.01)

I can select values according to this distribution and get a vector of length 31 using rbinom, but I can't select exactly five values.

Inv=rbinom(length(probs),1,probs)
Inv
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0

Any ideas?

Thanks again!
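A note on why rbinom() alone cannot guarantee five ones: each position is an independent Bernoulli draw, so the number of ones is itself random, with expected value sum(probs), which is 0.96 for the probabilities above. The sketch below (the seed and the 10,000 replicates are arbitrary choices) simulates many calls and tabulates the count:

```r
probs <- c(0.01,0.02,0.01,0.02,0.01,0.01,0.01,0.04,0.01,0.01,0.12,0.01,0.02,0.01,
           0.14,0.06,0.01,0.01,0.01,0.01,0.01,0.14,0.01,0.07,0.01,0.01,0.04,0.08,0.01,0.02,0.01)

sum(probs)  # expected number of ones per rbinom() call

set.seed(42)  # arbitrary seed, only for reproducibility
## number of ones in each of 10,000 independent rbinom() vectors
sums <- replicate(10000, sum(rbinom(length(probs), 1, probs)))
table(sums)      # how the count varies from call to call
mean(sums == 5)  # fraction of calls that happen to give exactly five ones
```

Because five ones is far above the average count, any approach that simply redraws until the sum is right will usually need many attempts.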

Comments (3)

时光与爱终年不遇 2024-12-05 04:59:04

How about just using a weighted sample.int to select the locations?

Inv <- integer(length(probs))
Inv[sample.int(length(probs), 5, prob = probs)] <- 1L
Inv
[1] 0 0 0 1 0 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
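The weighted sample.int() idea can be wrapped in a small function; the name place_ones below is hypothetical. One property worth knowing, documented in ?sample: without replacement, the probabilities are applied sequentially, so each subsequent position is drawn in proportion to the weights of the positions not yet chosen. The resulting chance that a position holds a 1 is therefore generally not exactly 5 * probs / sum(probs), and the distribution differs from the "redraw rbinom() until the sum is 5" approaches in the other answers. This sketch estimates the inclusion frequencies so the two can be compared:

```r
## Hypothetical helper wrapping the weighted sample.int() idea
place_ones <- function(probs, target) {
  ## sample.int() without replacement needs at least `target` positive weights
  stopifnot(target <= sum(probs > 0))
  out <- integer(length(probs))
  ## choose `target` distinct positions (replace = FALSE is the default)
  out[sample.int(length(probs), target, prob = probs)] <- 1L
  out
}

probs <- c(0.01,0.02,0.01,0.02,0.01,0.01,0.01,0.04,0.01,0.01,0.12,0.01,0.02,0.01,
           0.14,0.06,0.01,0.01,0.01,0.01,0.01,0.14,0.01,0.07,0.01,0.01,0.04,0.08,0.01,0.02,0.01)

set.seed(1)
x <- place_ones(probs, 5)
sum(x)  # always 5 by construction

## empirical probability that each position receives a 1 (10,000 draws),
## next to the naive value target * probs / sum(probs)
incl <- rowMeans(replicate(10000, place_ones(probs, 5)))
round(cbind(empirical = incl, naive = 5 * probs / sum(probs)), 3)
```

This approach never loops, so it has no runaway risk; the trade-off is that it samples from a different distribution than conditioning rbinom() draws on the total.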
爺獨霸怡葒院 2024-12-05 04:59:04

Chase provides a great answer and mentions the problem of the runaway while() iteration. One of the problems with a runaway while() is that if you draw one trial at a time and it takes many, say t, trials to find one that matches the target number of 1s, you incur the overhead of t calls to the main function, rbinom() in this case.

There is a way out, however. Because rbinom(), like all of R's (pseudo)random number generators, is vectorised, we can generate m trials at a time and check those m trials for conformance to the requirement of five 1s. If none conform, we keep drawing batches of m trials until we find one that does. This idea is implemented in the function foo() below. The chunkSize argument is m, the number of trials to draw at a time. I also took the opportunity to allow the function to find more than a single conforming trial; argument n controls how many conforming trials to return.

foo <- function(probs, target, n = 1, chunkSize = 100) {
    len <- length(probs)
    out <- matrix(integer(0), ncol = len, nrow = 0) ## return object
    ## keep drawing chunkSize trials at a time until we have n that meet `target`
    ## (note: this never ends if `target` is impossible for `probs`)
    while(nrow(out) < n) {
        ## one trial per row; byrow = TRUE recycles probs along each row
        trial <- matrix(rbinom(len * chunkSize, 1, probs),
                        ncol = len, byrow = TRUE)
        ok <- which(rowSums(trial) == target) ## trials with exactly `target` 1s
        need <- n - nrow(out)                 ## how many we still need
        if(length(ok) > 0)                    ## keep only as many as we need
            out <- rbind(out, trial[ok[seq_len(min(need, length(ok)))], ,
                                    drop = FALSE])
    }
    if(n == 1L)           ## comment this, and
        out <- drop(out)  ## this if you don't want dimension dropping
    out
}

It works like this:

> set.seed(1)
> foo(probs, target = 5)
 [1] 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0
[31] 0
> foo(probs, target = 5, n = 2)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
[1,]    0    0    0    0    0    0    0    0    0     0     0
[2,]    0    0    0    0    0    0    0    0    0     0     1
     [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21]
[1,]     0     0     0     1     1     0     0     0     0     0
[2,]     0     1     0     0     1     0     0     0     0     0
     [,22] [,23] [,24] [,25] [,26] [,27] [,28] [,29] [,30] [,31]
[1,]     1     0     1     0     0     0     1     0     0     0
[2,]     1     0     1     0     0     0     0     0     0     0

Note that I drop the redundant row dimension when n == 1, so a single trial is returned as a plain vector. Comment out the final if block if you don't want this behaviour.

You need to balance the size of chunkSize against the computational cost of checking that many trials at a time. If the requirement (here five 1s) is very unlikely, increase chunkSize so you incur fewer calls to rbinom(). If the requirement is likely to be met, there is little point in drawing a large chunk of trials at once when you want only one or two, because every trial in the chunk has to be generated and checked.
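To put a number on "very unlikely", the probability that one rbinom() trial contains exactly target ones can be computed exactly, because the count of ones is a sum of independent Bernoulli variables (a Poisson-binomial distribution). The sketch below uses the standard recursion that adds one Bernoulli variable at a time; the helper name prob_exact_count is hypothetical. From that probability p, a one-at-a-time while() loop needs 1/p trials on average, and a chunk of chunkSize trials contains chunkSize * p matches on average, which gives a rough basis for choosing chunkSize.

```r
## Hypothetical helper: exact P(sum of independent Bernoulli(probs) == target),
## via the Poisson-binomial recursion
prob_exact_count <- function(probs, target) {
  if (target < 0 || target > length(probs)) return(0)
  dist <- 1  # with no variables yet, the count is 0 with probability 1
  for (p in probs) {
    ## adding one Bernoulli(p): count stays put w.p. 1 - p, goes up by one w.p. p
    dist <- c(dist * (1 - p), 0) + c(0, dist * p)
  }
  ## dist[k + 1] is P(count == k) for k = 0, ..., length(probs)
  dist[target + 1]
}

probs <- c(0.01,0.02,0.01,0.02,0.01,0.01,0.01,0.04,0.01,0.01,0.12,0.01,0.02,0.01,
           0.14,0.06,0.01,0.01,0.01,0.01,0.01,0.14,0.01,0.07,0.01,0.01,0.04,0.08,0.01,0.02,0.01)

p5 <- prob_exact_count(probs, 5)
p5        # chance that a single trial has exactly five ones
1 / p5    # expected number of single trials per accepted vector
100 * p5  # expected number of accepted trials per chunk when chunkSize = 100
```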

落叶缤纷 2024-12-05 04:59:04

I think you want to resample from the binomial distribution with a given set of probabilities until you hit your target value of 5, is that right? If so, then I think this does what you want. A while loop can be used to iterate until the condition is met. If you feed it very unrealistic probabilities and target values, I guess it could turn into a runaway loop, so consider yourself warned :)

FOO <- function(probs, target) {
  out <- rbinom(length(probs), 1, probs)

  while (sum(out) != target) {

    out <- rbinom(length(probs), 1, probs)
  }
  return(out)
}

FOO(probs, target = 5)

> FOO(probs, target = 5)  
 [1] 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 1 0
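The runaway risk can be bounded by capping the number of attempts and by rejecting targets that can never be reached (fewer ones than the positions with probability 1, or more ones than the positions with non-zero probability). A sketch under those assumptions; FOO2 and its maxIter argument are hypothetical names, not part of the answer above:

```r
## Variant of FOO() with a cap on the number of attempts (hypothetical sketch)
FOO2 <- function(probs, target, maxIter = 1e6) {
  ## a target outside this range can never be hit, so fail immediately
  if (target < sum(probs == 1) || target > sum(probs > 0))
    stop("target cannot be reached with these probabilities")
  for (i in seq_len(maxIter)) {
    out <- rbinom(length(probs), 1, probs)
    if (sum(out) == target) return(out)  # accept the first vector with the right sum
  }
  stop("no vector with sum == target found in ", maxIter, " attempts")
}

probs <- c(0.01,0.02,0.01,0.02,0.01,0.01,0.01,0.04,0.01,0.01,0.12,0.01,0.02,0.01,
           0.14,0.06,0.01,0.01,0.01,0.01,0.01,0.14,0.01,0.07,0.01,0.01,0.04,0.08,0.01,0.02,0.01)

set.seed(1)
x <- FOO2(probs, target = 5)
sum(x)
```

Setting maxIter well above the expected number of attempts (one divided by the probability of hitting the target) keeps spurious failures rare while still guaranteeing the function returns.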