在R中与BRM的两倍差的后部分布

发布于 2025-02-09 14:44:38 字数 4608 浏览 1 评论 0原文

我想为两组的差异产生一个后部分布。我可以在R中的Jags中做到这一点，但是希望进入本世纪并与Stan复制。我正在使用BRMS。

我的问题是：

我是否正确地产生了效果大小的后验分布？
如果没有，我应该使用答案中建议的其他功能之一 a>？
我如何解释可能为负面的效果大小？我认为这需要先验使用非β，但不确定哪些是可取的。

以下代码指定了一个控制和干预组的控制和干预组事件率。它为每个组（步骤1），先前的分布（步骤2）生成beta分布，构建了二项式模型（步骤3）和后代（步骤4）。

我有理由相信我在上述步骤中所做的事情是正确的。对于AS_DRAWS_DF生成的值是什么，我有点不确定 - 即B_Intercept_p vs. B_GROUPINT_P。在这些论坛上的进一步阅读建议从b_intercept_p减去b_groupint_p;确实会产生一个可能是后间隔（绘制alt.graph对象时标记为new_p）的图，但是我在这一点上非常不确定，并感谢任何澄清。

# Setup
## Packages
library(tidyverse)
library(ProbBayes)
library(brms)
library(tidybayes)

## Options
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())


# ***
# Setup simulation
# ***

# Trial data
con.n = 150
con.event = 50
int.n = 150
int.event = 30


# Distributions
## x1 is the expected control event rate at probability p.x1
## x2 is the control event rate at probability p.x2
x1 = 0.4
p.x1 = 0.5
x2 = 0.6
p.x2 = 0.9

## As above, but for the difference between intervention and control group
y1 = 0.05
p.y1 = 0.5
y2 = 0.1
p.y2 = 0.9


# Model setup
## For brm()
n.chains = 3
n.iter = 4000
n.warmup = 500

## For beta.select(); based on model so the number of observations are equal
n.draws = n.iter - n.warmup



# ***
# Simulation starts here
# ***



# Define Functions
## Log transformation
fun_log.trans = function(x) {
  log.trans = log(x / (1 - x))
  log.trans
}


## Inverse log transformation
fun_invlog.trans = function(x) {
  invlog.trans = exp(x) / (1 + exp(x))
  invlog.trans
}



# Run things
# 0. Put data into dataframe
data = data.frame(group = c("int"    , "con"),
                  n =     c(int.n    , con.n),
                  event = c(int.event, con.event))


# 1. Generate prior distributions
## Beta prior for the control group event rate
beta0.val = beta.select(list(x = x1, p = p.x1),
                        list(x = x2, p = p.x2))
p0.sim = rbeta(n.draws, beta0.val[1], beta0.val[2])
### Log transform it
theta0.sim = fun_log.trans(p0.sim)


## Prior distribution for the difference in logit-p for each group
beta1.val = beta.select(list(x = y1, p = p.y1),
                        list(x = y2, p = p.y2))
p1.sim = rbeta(n.draws, beta1.val[1], beta1.val[2])
### Log transform
theta1.sim = fun_log.trans(p1.sim)


## 2. Create a matrix to store priors
priors = get_prior(family = binomial,
                   event | trials(n) ~ group,
                   data = data)

### Get characteristics of the normal distribution for the priors generated in step 1
theta0.val = c(mean(theta0.sim), sd(theta0.sim))
theta1.val = c(mean(theta1.sim), sd(theta1.sim))

### Input the these characteristics into the prior matrix generated at the start of step 2
priors$prior[3] = paste("normal(", theta0.val[1], ", ", theta0.val[2], ")", sep = "")
priors$prior[2] = paste("normal(", theta1.val[1], ", ", theta1.val[2], ")", sep = "")



# 3. Generate model with stan
model = brm(family = binomial,
            event | trials(n) ~ group,
            data = data,
            prior = priors,
            iter = n.iter,
            warmup = n.warmup,
            chains = n.chains,
            refresh = 0)



## Plot model
plot(model)
print(model)

# 4. Generate posteriors
posteriors = as_draws_df(model)

## Inverse log transform function on theta to get p again
posteriors = posteriors %>%
  mutate(across(starts_with("b_"), .f = fun_invlog.trans, .names = "{.col}_p")) %>%
  rename_with(~paste0(., "_theta"), .cols = starts_with("b_") & !ends_with("_p"))

## 95% posterior interval estimate
quantile(posteriors$b_Intercept_p, c(0.025, 0.975))


# 5. Plot posterior densities
## Take the posterior interval data and bind it with the priors
## Ideally, n.draws = number of iterations - warmup
graph = posteriors %>%
  select(ends_with("_p")) %>%
  cbind(p0.sim, p1.sim) %>%
  pivot_longer(cols = everything(),
               names_to = "distribution",
               values_to = "value")

alt.graph = posteriors %>%
  mutate(new_p = b_Intercept_p - b_groupint_p) %>%
  select(new_p) %>%
  cbind(p0.sim, p1.sim)

quantile(alt.graph$new_p, c(0.025, 0.975))

alt.graph = alt.graph %>%
  pivot_longer(cols = everything(),
               names_to = "distribution",
               values_to = "value")

## Plot it
alt.graph %>%
  ggplot(aes(x = value)) +
  geom_density(aes(fill = distribution),
               alpha = 0.5) +
  theme_light()

原文

I would like to produce a posterior distribution for the difference of two groups. I can do this in JAGS in R, but am hoping to move into this century and replicate this with Stan. I am using brms.

My questions are:

Have I correctly produced the posterior distribution for the effect size?
If not, should I use one of the other functions suggested in the answer here?
How can I account for a prior effect size that could be negative? I imagine this requires using a non-beta prior, but am not sure what would be preferable.

The below code specifies a control and intervention group, with a control and intervention group event rate. It generates beta distributions for each group (step 1), prior distributions (step 2), builds a binomial model (step 3), and posteriors (step 4).

I am reasonably confident what I’ve done in the above steps is correct. I am a bit uncertain as to what the values generated by as_draws_df are - namely the b_Intercept_p vs. b_groupint_p. Further reading on these forums suggested subtracting b_groupint_p from b_Intercept_p; which does produce a plot that could plausibly be the posterior interval (labelled new_p when plotting the alt.graph object), but I am pretty uncertain on this point and would appreciate any clarification.

# Setup
## Packages
library(tidyverse)
library(ProbBayes)
library(brms)
library(tidybayes)

## Options
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())


# ***
# Setup simulation
# ***

# Trial data
con.n = 150
con.event = 50
int.n = 150
int.event = 30


# Distributions
## x1 is the expected control event rate at probability p.x1
## x2 is the control event rate at probability p.x2
x1 = 0.4
p.x1 = 0.5
x2 = 0.6
p.x2 = 0.9

## As above, but for the difference between intervention and control group
y1 = 0.05
p.y1 = 0.5
y2 = 0.1
p.y2 = 0.9


# Model setup
## For brm()
n.chains = 3
n.iter = 4000
n.warmup = 500

## For beta.select(); based on model so the number of observations are equal
n.draws = n.iter - n.warmup



# ***
# Simulation starts here
# ***



# Define Functions
## Log transformation
fun_log.trans = function(x) {
  log.trans = log(x / (1 - x))
  log.trans
}


## Inverse log transformation
fun_invlog.trans = function(x) {
  invlog.trans = exp(x) / (1 + exp(x))
  invlog.trans
}



# Run things
# 0. Put data into dataframe
data = data.frame(group = c("int"    , "con"),
                  n =     c(int.n    , con.n),
                  event = c(int.event, con.event))


# 1. Generate prior distributions
## Beta prior for the control group event rate
beta0.val = beta.select(list(x = x1, p = p.x1),
                        list(x = x2, p = p.x2))
p0.sim = rbeta(n.draws, beta0.val[1], beta0.val[2])
### Log transform it
theta0.sim = fun_log.trans(p0.sim)


## Prior distribution for the difference in logit-p for each group
beta1.val = beta.select(list(x = y1, p = p.y1),
                        list(x = y2, p = p.y2))
p1.sim = rbeta(n.draws, beta1.val[1], beta1.val[2])
### Log transform
theta1.sim = fun_log.trans(p1.sim)


## 2. Create a matrix to store priors
priors = get_prior(family = binomial,
                   event | trials(n) ~ group,
                   data = data)

### Get characteristics of the normal distribution for the priors generated in step 1
theta0.val = c(mean(theta0.sim), sd(theta0.sim))
theta1.val = c(mean(theta1.sim), sd(theta1.sim))

### Input the these characteristics into the prior matrix generated at the start of step 2
priors$prior[3] = paste("normal(", theta0.val[1], ", ", theta0.val[2], ")", sep = "")
priors$prior[2] = paste("normal(", theta1.val[1], ", ", theta1.val[2], ")", sep = "")



# 3. Generate model with stan
model = brm(family = binomial,
            event | trials(n) ~ group,
            data = data,
            prior = priors,
            iter = n.iter,
            warmup = n.warmup,
            chains = n.chains,
            refresh = 0)



## Plot model
plot(model)
print(model)

# 4. Generate posteriors
posteriors = as_draws_df(model)

## Inverse log transform function on theta to get p again
posteriors = posteriors %>%
  mutate(across(starts_with("b_"), .f = fun_invlog.trans, .names = "{.col}_p")) %>%
  rename_with(~paste0(., "_theta"), .cols = starts_with("b_") & !ends_with("_p"))

## 95% posterior interval estimate
quantile(posteriors$b_Intercept_p, c(0.025, 0.975))


# 5. Plot posterior densities
## Take the posterior interval data and bind it with the priors
## Ideally, n.draws = number of iterations - warmup
graph = posteriors %>%
  select(ends_with("_p")) %>%
  cbind(p0.sim, p1.sim) %>%
  pivot_longer(cols = everything(),
               names_to = "distribution",
               values_to = "value")

alt.graph = posteriors %>%
  mutate(new_p = b_Intercept_p - b_groupint_p) %>%
  select(new_p) %>%
  cbind(p0.sim, p1.sim)

quantile(alt.graph$new_p, c(0.025, 0.975))

alt.graph = alt.graph %>%
  pivot_longer(cols = everything(),
               names_to = "distribution",
               values_to = "value")

## Plot it
alt.graph %>%
  ggplot(aes(x = value)) +
  geom_density(aes(fill = distribution),
               alpha = 0.5) +
  theme_light()

分享到QQ

分享到微博