Assigning a specific number of values informed by a probability distribution (in R)
Hello and thanks in advance for the help!
I am trying to generate a vector with a specific number of values that are assigned according to a probability distribution. For example, I want a vector of length 31, containing 26 zeroes and 5 ones. (The total sum of the vector should always be five.) However, the location of the ones is important. To identify which values should be one and which should be zero, I have a vector of probabilities (length 31), which looks like this:
probs<-c(0.01,0.02,0.01,0.02,0.01,0.01,0.01,0.04,0.01,0.01,0.12,0.01,0.02,0.01,
0.14,0.06,0.01,0.01,0.01,0.01,0.01,0.14,0.01,0.07,0.01,0.01,0.04,0.08,0.01,0.02,0.01)
I can select values according to this distribution and get a vector of length 31 using rbinom, but I can't select exactly five values.
Inv=rbinom(length(probs),1,probs)
Inv
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0
Any ideas?
Thanks again!
Comments (3)
How about just using a weighted `sample.int` to select the locations?
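A minimal sketch of the weighted `sample.int` suggestion, assuming the `probs` vector from the question. The seed and the name `ones` are only for illustration. Note that weighted sampling without replacement is not the same thing as conditioning independent `rbinom()` draws on a total of five, so the per-position frequencies may differ somewhat between the two approaches.

```r
probs <- c(0.01,0.02,0.01,0.02,0.01,0.01,0.01,0.04,0.01,0.01,0.12,0.01,0.02,0.01,
           0.14,0.06,0.01,0.01,0.01,0.01,0.01,0.14,0.01,0.07,0.01,0.01,0.04,0.08,
           0.01,0.02,0.01)

set.seed(42)  # assumption: fixed seed, only so the sketch is reproducible

# Pick 5 distinct positions; positions with larger weights are favoured
ones <- sample.int(length(probs), size = 5, replace = FALSE, prob = probs)

# Build the 0/1 vector: zeros everywhere, ones at the sampled positions
Inv <- integer(length(probs))
Inv[ones] <- 1L

sum(Inv)  # always 5, by construction
```

Because the positions are drawn without replacement, the vector always has exactly five ones and no retry loop is needed.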
Chase provides a great answer and mentions the problem of the run-away `while()` iteration. One of the problems with a run-away `while()` is that if you do this one trial at a time, and it takes many, say t, trials to find one that matches the target number of `1`s, you incur the overhead of t calls to the main function, `rbinom()` in this case.

There is a way out, however. Because `rbinom()`, like all of these (pseudo)random number generators in R, is vectorised, we can generate m trials at a time and check those m trials for conformance to the requirement of five `1`s. If none are found, we repeatedly draw m trials until we find one that does conform. This idea is implemented in the function `foo()` below. The `chunkSize` argument is m, the number of trials to draw at a time. I also took the opportunity to allow the function to find more than a single conformal trial; argument `n` controls how many conformal trials to return.

It works like this:
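The code for `foo()` did not survive on this page, so what follows is only a sketch of the chunked idea described above, not the answer's original function. The names `foo`, `n`, and `chunkSize` come from the text; the `target` argument, the default values, and the internal variable names are assumptions.

```r
probs <- c(0.01,0.02,0.01,0.02,0.01,0.01,0.01,0.04,0.01,0.01,0.12,0.01,0.02,0.01,
           0.14,0.06,0.01,0.01,0.01,0.01,0.01,0.14,0.01,0.07,0.01,0.01,0.04,0.08,
           0.01,0.02,0.01)

# Sketch only: draw chunkSize trials per rbinom() call, keep the conforming ones
foo <- function(probs, target = 5L, n = 1L, chunkSize = 1000L) {
  len <- length(probs)
  out <- matrix(integer(0), nrow = 0L, ncol = len)  # conforming trials so far
  while (nrow(out) < n) {
    # rbinom() recycles probs, so filling by row gives one trial per row
    trials <- matrix(rbinom(len * chunkSize, size = 1L, prob = probs),
                     ncol = len, byrow = TRUE)
    ok <- which(rowSums(trials) == target)  # rows with exactly `target` ones
    if (length(ok) > 0L) {
      need <- n - nrow(out)                 # how many are still missing
      out <- rbind(out, trials[head(ok, need), , drop = FALSE])
    }
  }
  if (n == 1L) out <- drop(out)  # return a plain vector for a single trial
  out
}

set.seed(1)  # assumption: fixed seed for reproducibility
res <- foo(probs, target = 5L, n = 3L)
rowSums(res)  # each row sums to 5
```

Each row of `res` is one conforming trial. Like the one-at-a-time loop, this never terminates if the target cannot be reached, so a cap on the number of chunks may be worth adding.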
Note that I drop the empty dimension in the case where `n == 1`. Comment the last `if` code chunk out if you don't want this feature.

You need to balance the size of `chunkSize` with the computational burden of checking that many trials at a time. If the requirement (here five `1`s) is very unlikely, then increase `chunkSize` so you incur fewer calls to `rbinom()`. If the requirement is likely, there is little point in drawing a large `chunkSize` of trials at a time if you only want one or two, as you have to evaluate each trial draw.
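One rough way to judge how large `chunkSize` should be is to estimate how often a single trial meets the requirement. The sketch below is an illustration rather than part of the answers: the pilot size `m` and the names `pilot` and `p_hit` are assumptions. For the `probs` in the question, `sum(probs)` is 0.96, so a trial contains fewer than one `1` on average and five `1`s is a rare outcome.

```r
probs <- c(0.01,0.02,0.01,0.02,0.01,0.01,0.01,0.04,0.01,0.01,0.12,0.01,0.02,0.01,
           0.14,0.06,0.01,0.01,0.01,0.01,0.01,0.14,0.01,0.07,0.01,0.01,0.04,0.08,
           0.01,0.02,0.01)

set.seed(2024)  # assumption: fixed seed for reproducibility
m <- 20000L     # assumption: pilot sample size, chosen arbitrarily

# One pilot trial per row, drawn in a single vectorised rbinom() call
pilot <- matrix(rbinom(length(probs) * m, size = 1L, prob = probs),
                ncol = length(probs), byrow = TRUE)

# Share of pilot trials with exactly five ones
p_hit <- mean(rowSums(pilot) == 5)
p_hit

# Rough number of trials needed per conforming draw (Inf if none were seen)
ceiling(1 / p_hit)
```

A `chunkSize` of around `1 / p_hit` or more means most chunks should contain at least one conforming trial.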
I think you want to resample from the binomial distribution with a given set of probabilities until you hit your target value of 5, is that right? If so, then I think this does what you want. A `while` loop can be used to iterate until the condition is met. If you feed it very unrealistic probabilities and target values, I guess it could turn into a run-away function, so consider yourself warned :)

FOO(probs, target = 5)
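The definition of `FOO()` is missing from this page, so here is a minimal sketch of the resample-until-the-target-is-hit idea, not the original function. The `max_iter` guard against a run-away loop is an addition of this sketch, and its default is an arbitrary assumption.

```r
probs <- c(0.01,0.02,0.01,0.02,0.01,0.01,0.01,0.04,0.01,0.01,0.12,0.01,0.02,0.01,
           0.14,0.06,0.01,0.01,0.01,0.01,0.01,0.14,0.01,0.07,0.01,0.01,0.04,0.08,
           0.01,0.02,0.01)

# Sketch only: redraw one trial at a time until it contains `target` ones
FOO <- function(probs, target, max_iter = 1e6) {
  x <- rbinom(length(probs), size = 1L, prob = probs)
  iter <- 1
  while (sum(x) != target) {
    # Hypothetical safety net: give up instead of looping forever
    if (iter >= max_iter) {
      stop("no draw hit the target within max_iter tries")
    }
    x <- rbinom(length(probs), size = 1L, prob = probs)
    iter <- iter + 1
  }
  x
}

set.seed(123)  # assumption: fixed seed for reproducibility
Inv <- FOO(probs, target = 5)
sum(Inv)  # 5
```

With an unreachable target, for example `FOO(probs, target = 32, max_iter = 10)`, the sketch stops with an error rather than running away.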