Recursive sampling in R

Posted 2024-11-09 22:15:20

I'm trying to simulate deaths over 7 years, given the following cumulative probabilities:

tab <- data.frame(id=1:1000,char=rnorm(1000,7,4))

cum.prob <- c(0.05,0.07,0.08,0.09,0.1,0.11,0.12)

How can I sample from tab$id without replacement, in a vectorized fashion, according to the cumulative probabilities in cum.prob? An id sampled in year 1 must not be sampled again in year 2, so lapply(cum.prob, function(x) sample(tab$id, x*1000)) will not work. Is it possible to vectorize this?

//M
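The `lapply()` attempt suggests fixed counts: exactly `cum.prob[i] * 1000` deaths by the end of year i. If that is the goal, one minimal vectorized sketch is to shuffle the ids once, without replacement, and cut the shuffled vector into consecutive blocks, one per year. The helper `sample_deaths_exact` is hypothetical, and rounding `cum.prob * 1000` to whole counts is an assumption.

```r
set.seed(1)  # for reproducibility
tab <- data.frame(id = 1:1000, char = rnorm(1000, 7, 4))
cum.prob <- c(0.05, 0.07, 0.08, 0.09, 0.1, 0.11, 0.12)

# Hypothetical helper: exact cumulative death counts, and no id is drawn twice
sample_deaths_exact <- function(ids, cum.prob) {
  cum.n <- round(cum.prob * length(ids))   # assumed: deaths by end of each year, rounded
  per.year <- diff(c(0, cum.n))            # deaths within each year
  shuffled <- sample(ids, max(cum.n))      # a single draw without replacement
  # Label consecutive blocks of the shuffled ids with their year, then split by year
  year <- factor(rep(seq_along(per.year), per.year), levels = seq_along(per.year))
  split(shuffled, year)
}

deaths <- sample_deaths_exact(tab$id, cum.prob)
lengths(deaths)  # 50 20 10 10 10 10 10 deaths in years 1 to 7
```

The randomness lies only in *which* ids die in each year. The counts are fixed, so `cumsum(lengths(deaths)) / 1000` reproduces `cum.prob` exactly.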

七七 2024-11-16 22:15:20

Here's one way: First get the probability of a given individual's dying in a given year as probYrDeath, i.e. probYrDeath[i] = Prob( individual dies in year i ), where i=1,2,...,7.

probYrDeath <- diff(c(0, cum.prob))
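Applied to the `cum.prob` in the question, the differencing gives a 5% chance of dying in year 1, 2% in year 2, and 1% in each of years 3 to 7. That leaves 88% who survive all 7 years. A quick check:

```r
cum.prob <- c(0.05, 0.07, 0.08, 0.09, 0.1, 0.11, 0.12)

# P(die in year i) = cum.prob[i] - cum.prob[i - 1], taking cum.prob[0] as 0
probYrDeath <- diff(c(0, cum.prob))
round(probYrDeath, 2)   # 0.05 0.02 0.01 0.01 0.01 0.01 0.01

# P(survive all 7 years) is the probability mass left over
1 - sum(probYrDeath)    # 0.88
```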

Now draw 1000 "death years" with replacement from the sequence 1:8. Use the probabilities in probYrDeath, augmented by an 8th category holding the probability of surviving all 7 years:

set.seed(1) ## for reproducibility
tab$DeathYr <- sample( 8, 1000, replace = TRUE, 
                       prob = c(probYrDeath, 1-sum(probYrDeath)))

We interpret DeathYr = 8 as "not dying within 7 years", and extract the subset of tab where DeathYr != 8:

tab_sample <- subset(tab, DeathYr != 8 )
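You may want the shape the `lapply()` attempt would have produced, with one vector of ids per year. In that case, the subset can be split by death year. The sketch below repeats the answer's setup so that it runs on its own. Because each id receives exactly one `DeathYr`, no id can appear in two years.

```r
tab <- data.frame(id = 1:1000, char = rnorm(1000, 7, 4))
cum.prob <- c(0.05, 0.07, 0.08, 0.09, 0.1, 0.11, 0.12)
probYrDeath <- diff(c(0, cum.prob))

set.seed(1)  # for reproducibility
# Year 8 stands for "still alive after 7 years"
tab$DeathYr <- sample(8, 1000, replace = TRUE,
                      prob = c(probYrDeath, 1 - sum(probYrDeath)))
tab_sample <- subset(tab, DeathYr != 8)

# One list element per year 1..7; the explicit levels keep a year even if nobody died in it
ids.by.year <- split(tab_sample$id, factor(tab_sample$DeathYr, levels = 1:7))
lengths(ids.by.year)
```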

You can verify that the cumulative proportions of deaths in each year approximate the values in cum.prob:

> cumsum(table(tab_sample$DeathYr)/1000)
    1     2     3     4     5     6     7 
0.045 0.071 0.080 0.094 0.105 0.115 0.124
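The agreement is only approximate because the number of deaths in each year is itself random. With a much larger cohort (the size `n` below is a hypothetical choice), the cumulative proportions settle close to `cum.prob`:

```r
cum.prob <- c(0.05, 0.07, 0.08, 0.09, 0.1, 0.11, 0.12)
probYrDeath <- diff(c(0, cum.prob))

set.seed(42)
n <- 1e6  # hypothetical cohort size, much larger than the 1000 in the question
DeathYr <- sample(8, n, replace = TRUE, prob = c(probYrDeath, 1 - sum(probYrDeath)))

# Count deaths in years 1..7 (year 8 = survived), then accumulate
deaths.per.year <- tabulate(DeathYr, nbins = 8)[1:7]
emp.cum <- cumsum(deaths.per.year) / n
round(rbind(empirical = emp.cum, target = cum.prob), 4)
```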

极致的悲 2024-11-16 22:15:20

Does this work for you?

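# Survivor probability first, then the probability of dying in each year
prob.death.per.year <- c(1 - cum.prob[length(cum.prob)], cum.prob - c(0, cum.prob[-length(cum.prob)]))
# Number of deaths in each year from one multinomial draw (survivor count dropped)
dead.in.years <- as.vector(rmultinom(1, length(tab$id), prob.death.per.year))[-1]
totsamp <- sum(dead.in.years)
# Draw the dying ids without replacement and label each one with its death year
data.frame(id = sample(tab$id, totsamp), dead.after = rep(seq_along(dead.in.years), dead.in.years))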

Depending upon which form you want the result in, you can change the last step.
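For example, you might prefer to keep every row of `tab` and record a death year for each id. The last step can then match the sampled ids back onto `tab`. This sketch repeats the answer's steps, using `diff()` in place of the equivalent subtraction. The column name `dead.after` follows the answer, and using `NA` for survivors is an assumption.

```r
set.seed(1)  # for reproducibility
tab <- data.frame(id = 1:1000, char = rnorm(1000, 7, 4))
cum.prob <- c(0.05, 0.07, 0.08, 0.09, 0.1, 0.11, 0.12)

# Survivor probability first, then the per-year death probabilities
prob.death.per.year <- c(1 - cum.prob[length(cum.prob)], diff(c(0, cum.prob)))
# One multinomial draw: how many of the ids fall into each category; drop survivors
dead.in.years <- as.vector(rmultinom(1, length(tab$id), prob.death.per.year))[-1]
totsamp <- sum(dead.in.years)

# Sampling without replacement, so no id is assigned two death years
deaths <- data.frame(id = sample(tab$id, totsamp),
                     dead.after = rep(seq_along(dead.in.years), dead.in.years))

# Alternative last step: look each id up in 'deaths'; ids not found (survivors) get NA
tab$dead.after <- deaths$dead.after[match(tab$id, deaths$id)]
table(tab$dead.after, useNA = "ifany")
```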
