Recursive sampling in R
I'm trying to simulate deaths over 7 years using the following cumulative probabilities:
tab <- data.frame(id=1:1000,char=rnorm(1000,7,4))
cum.prob <- c(0.05,0.07,0.08,0.09,0.1,0.11,0.12)
How can I sample from tab$id without replacement, in a vectorized fashion, according to the cumulative probabilities in cum.prob? An id sampled in year 1 must not be sampled again in year 2, so lapply(cum.prob,function(x) sample(tab$id,x*1000)) will not work. Is it possible to vectorize this?
//M
Comments (2)
Here's one way. First get the probability of a given individual dying in a given year as probYrDeath, i.e. probYrDeath[i] = Prob( individual dies in year i ), where i = 1,2,...,7.
Now generate a random sample of 1000 "death years", with replacement, from the sequence 1:8, according to the probabilities in probYrDeath, augmented by the probability of not dying by year 7.
We interpret DeathYr = 8 as "not dying within 7 years", and extract the subset of tab where DeathYr != 8.
You can verify that the cumulative proportions of deaths in each year approximate the values in cum.prob.
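The steps above can be sketched in R roughly as follows. This is a minimal sketch, not necessarily the answerer's exact code: probYrDeath and DeathYr are the names used in the answer, while the fixed seed and the name deaths are assumptions made for illustration.

```r
set.seed(1)  # assumption: fixed seed only so the sketch is reproducible

tab <- data.frame(id = 1:1000, char = rnorm(1000, 7, 4))
cum.prob <- c(0.05, 0.07, 0.08, 0.09, 0.1, 0.11, 0.12)

# Per-year death probabilities: first differences of the cumulative ones
probYrDeath <- diff(c(0, cum.prob))

# One death year per id; 8 stands for "not dying within 7 years"
tab$DeathYr <- sample(1:8, size = nrow(tab), replace = TRUE,
                      prob = c(probYrDeath, 1 - cum.prob[7]))

# Keep only the individuals who die within the 7 years
# ("deaths" is a hypothetical name, not from the answer)
deaths <- tab[tab$DeathYr != 8, ]

# Cumulative proportion of deaths by year, to compare against cum.prob
cumsum(table(factor(deaths$DeathYr, levels = 1:7))) / nrow(tab)
```

Because each id receives exactly one death year, no id can die twice, which is the "without replacement" requirement. The yearly counts are random, so the cumulative proportions only approximate cum.prob.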
Does this work for you:
Depending upon which form you want the result in, you can change the last step.
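For comparison, here is one possible vectorized approach that yields exact yearly counts. It is an illustrative sketch under stated assumptions and should not be read as the code this answer refers to: it draws all ids that die within 7 years in a single sample() call without replacement and then assigns years by position. The names cum.n, n.yr, dead, yr and deathsByYear are hypothetical.

```r
set.seed(1)  # assumption: fixed seed only so the sketch is reproducible

tab <- data.frame(id = 1:1000, char = rnorm(1000, 7, 4))
cum.prob <- c(0.05, 0.07, 0.08, 0.09, 0.1, 0.11, 0.12)

n <- nrow(tab)
cum.n <- round(cum.prob * n)   # cumulative number of deaths by each year
n.yr <- diff(c(0, cum.n))      # number of deaths within each year

# A single draw without replacement covers all 7 years at once,
# so no id can appear in more than one year
dead <- sample(tab$id, cum.n[length(cum.n)])

# Label the drawn ids with their year of death, in order of the draw
yr <- rep(seq_along(n.yr), times = n.yr)

# Last step: a list with one vector of ids per year
deathsByYear <- split(dead, yr)
```

The last step controls the form of the result: split() gives a list with one element per year, whereas data.frame(id = dead, DeathYr = yr) would give one row per death.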