试图从贝叶斯统计数据中复制数字而没有泪水：一个采样绘制的角度，但失败了

发布于 2025-01-28 10:30:17 字数 1860 浏览 2 评论 0原文

我正在尝试从纸贝叶斯统计数据中复制三个数字，而不会泪流满面：一个采样绘制的视角，可以在这里找到： http://hedibert.org/wp-content/uploads/2013/2013/12/12/12/192smithgelfand.pdf 我的目标是复制第5节的结果。这是我的代码：

theta1<-runif(1000,0,1)
theta2<-runif(1000,0,1)
theta<-cbind(theta1,theta2)
theta<-as.data.frame(theta)

plot(theta1,theta2)

n1<-c(5,6,4)
n2<-c(5,4,6)
y<-c(7,5,6)

l<-rep(NA,nrow(theta))

for (i in 1:nrow(theta)){
  
  llh.1.store<-rep(NA,4)
  for (j in 2:5){
    llh.1.store[j-1]<-(factorial(n1[1])/(factorial(j)*factorial(n1[1]-j)))*(factorial(n2[1])/(factorial(y[1]-j)*factorial(n2[1]-y[1]+j)))*(theta[i,1]^j)*((1-theta[i,1])^(n1[1]-j))*(theta[i,2]^(y[1]-j))*((1-theta[i,2])^(n2[1]-y[1]+j))
  }
  llh1<-sum(llh.1.store)
  
  llh.2.store<-rep(NA,5)
  for (x in 1:5){
    llh.2.store[x]<-(factorial(n1[2])/(factorial(x)*factorial(n1[2]-x)))*(factorial(n2[2])/(factorial(y[2]-x)*factorial(n2[2]-y[2]+x)))*(theta[i,1]^x)*((1-theta[i,1])^(n1[2]-x))*(theta[i,2]^(y[2]-x))*((1-theta[i,2])^(n2[2]-y[2]+x))
  }
  llh2<-sum(llh.2.store)
  
  llh.3.store<-rep(NA,5)
  for (t in 0:4){
    llh.3.store[t+1]<-(factorial(n1[3])/(factorial(t)*factorial(n1[3]-t)))*(factorial(n2[3])/(factorial(y[3]-t)*factorial(n2[3]-y[3]+t)))*(theta[i,1]^t)*((1-theta[i,1])^(n1[3]-t))*(theta[i,2]^(y[3]-t))*((1-theta[i,2])^(n2[3]-y[3]+t))
  }
  llh3<-sum(llh.3.store)
  
  l[i]<-prod(llh1,llh2,llh3)
}

q<-l/sum(l)
post.theta<-sample_n(theta,1000,replace=TRUE,weight=q)

ggplot(post.theta) +
  aes(x = theta1, y = theta2) +
  geom_point(
    shape = "circle",
    size = 1.85,
    colour = "#440154"
  ) +
  labs(title = "Sample from Posterior") +
  ggthemes::theme_few()

但是它不会产生与图2相同的图。有人可以告诉我我做错了什么吗？

原文

I'm trying to replicate the three figures from the paper Bayesian statistics without tears: A sampling-resampling perspective, which can be found here: http://hedibert.org/wp-content/uploads/2013/12/1992SmithGelfand.pdf
My goal is to replicate the results from section 5. Here's my code:

theta1<-runif(1000,0,1)
theta2<-runif(1000,0,1)
theta<-cbind(theta1,theta2)
theta<-as.data.frame(theta)

plot(theta1,theta2)

n1<-c(5,6,4)
n2<-c(5,4,6)
y<-c(7,5,6)

l<-rep(NA,nrow(theta))

for (i in 1:nrow(theta)){
  
  llh.1.store<-rep(NA,4)
  for (j in 2:5){
    llh.1.store[j-1]<-(factorial(n1[1])/(factorial(j)*factorial(n1[1]-j)))*(factorial(n2[1])/(factorial(y[1]-j)*factorial(n2[1]-y[1]+j)))*(theta[i,1]^j)*((1-theta[i,1])^(n1[1]-j))*(theta[i,2]^(y[1]-j))*((1-theta[i,2])^(n2[1]-y[1]+j))
  }
  llh1<-sum(llh.1.store)
  
  llh.2.store<-rep(NA,5)
  for (x in 1:5){
    llh.2.store[x]<-(factorial(n1[2])/(factorial(x)*factorial(n1[2]-x)))*(factorial(n2[2])/(factorial(y[2]-x)*factorial(n2[2]-y[2]+x)))*(theta[i,1]^x)*((1-theta[i,1])^(n1[2]-x))*(theta[i,2]^(y[2]-x))*((1-theta[i,2])^(n2[2]-y[2]+x))
  }
  llh2<-sum(llh.2.store)
  
  llh.3.store<-rep(NA,5)
  for (t in 0:4){
    llh.3.store[t+1]<-(factorial(n1[3])/(factorial(t)*factorial(n1[3]-t)))*(factorial(n2[3])/(factorial(y[3]-t)*factorial(n2[3]-y[3]+t)))*(theta[i,1]^t)*((1-theta[i,1])^(n1[3]-t))*(theta[i,2]^(y[3]-t))*((1-theta[i,2])^(n2[3]-y[3]+t))
  }
  llh3<-sum(llh.3.store)
  
  l[i]<-prod(llh1,llh2,llh3)
}

q<-l/sum(l)
post.theta<-sample_n(theta,1000,replace=TRUE,weight=q)

ggplot(post.theta) +
  aes(x = theta1, y = theta2) +
  geom_point(
    shape = "circle",
    size = 1.85,
    colour = "#440154"
  ) +
  labs(title = "Sample from Posterior") +
  ggthemes::theme_few()

But it doesn't generate the same plot as figure 2. Can anyone please tell me what I did wrong?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

海拔太高太耀眼 2025-02-04 10:30:17

这是Gelfand and Smith（1990，TAS）模型的描述：

因此，实际上可能是由J的三个总和的产物制成的。

出于好奇，我为此后部写了一个吉布斯采样器，将缺失的x [i]模拟为潜在变量，与上述OP（蓝色）相比，我没有发现结果（黄色）的任何差异（黄色）。：

这是我的R代码：

T=1e4
    
theta=matrix(NA,nrow=T,ncol=2)
x1=rep(NA,3)
    
theta[1,]=runif(2)
for(t in 1:(T-1)){
   for(j in 1:3){
     a<-max(0,n2[j]-y[j]):min(y[j],n1[j])
     x1[j]=sample(a,1,
        prob=choose(n1[j],a)*choose(n2[j],y[j]-a)*
           theta[t,1]^a*(1-theta[t,1])^(n1[j]-a)*
           theta[t,2]^(y[j]-a)*(1-theta[t,2])^(n2[j]-y[j]+a)
           )
      }
      theta[t+1,1]=rbeta(1,sum(x1)+1,sum(n1)-sum(x1)+1)
      theta[t+1,2]=rbeta(1,sum(y)-sum(x1)+1,sum(n2)-sum(y)+sum(x1)+1)
}
    
plot(theta[sample(1:nrow(theta),1000),],
   xlab = expression(theta[1]), 
   ylab = expression(theta[2]),
   col = "#440154", pch=19, cex=.4,
   main = "Sample from Posterior")

此结果也与纸上得出的可能性函数兼容：

Here is the description of the model of Gelfand and Smith (1990, TAS):

Hence the likelihood is indeed made of the product of the three sums over the j's.

Out of curiosity, I wrote a Gibbs sampler for this posterior, simulating the missing X¹[i]'s as latent variables and I did not spot any difference in the outcome (yellow) when compared with the above one from the OP (blue):

Here is my R code:

T=1e4
    
theta=matrix(NA,nrow=T,ncol=2)
x1=rep(NA,3)
    
theta[1,]=runif(2)
for(t in 1:(T-1)){
   for(j in 1:3){
     a<-max(0,n2[j]-y[j]):min(y[j],n1[j])
     x1[j]=sample(a,1,
        prob=choose(n1[j],a)*choose(n2[j],y[j]-a)*
           theta[t,1]^a*(1-theta[t,1])^(n1[j]-a)*
           theta[t,2]^(y[j]-a)*(1-theta[t,2])^(n2[j]-y[j]+a)
           )
      }
      theta[t+1,1]=rbeta(1,sum(x1)+1,sum(n1)-sum(x1)+1)
      theta[t+1,2]=rbeta(1,sum(y)-sum(x1)+1,sum(n2)-sum(y)+sum(x1)+1)
}
    
plot(theta[sample(1:nrow(theta),1000),],
   xlab = expression(theta[1]), 
   ylab = expression(theta[2]),
   col = "#440154", pch=19, cex=.4,
   main = "Sample from Posterior")

This outcome is also compatible with the likelihood function derived from the paper: