Why do I get similar CIs when the number of bootstrap samples is so different?

Posted 2025-01-18 18:11:40

I just learned how to bootstrap in R, and I'm excited. I was playing with some data and found that, no matter how many bootstrap samples I take, the CIs always seem to be about the same. I expected that the more samples I took, the narrower the CI would be. Here's the code.

library(boot)

# Statistic for boot(): mean of queimadas over the resampled rows i
M. <- function(dados, i) {
  d <- dados[i, ]
  mean(d$queimadas)
}

bootmu<-boot(dados,statistic=M.,R=10000)

boot.ci(bootmu)
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 10000 bootstrap replicates

CALL : 
boot.ci(boot.out = bootmu)

Intervals : 
Level      Normal              Basic         
95%   (18.36, 21.64 )   (18.37, 21.63 )  

Level     Percentile            BCa          
95%   (18.37, 21.63 )   (18.37, 21.63 )  
Calculations and Intervals on Original Scale
Warning message:
In boot.ci(bootmu) : bootstrap variances needed for studentized intervals

As you can see, I used 10000 bootstrap replicates. Now let's try with just 100.


bootmu<-boot(dados,statistic=M.,R=100)

boot.ci(bootmu)
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 100 bootstrap replicates

CALL : 
boot.ci(boot.out = bootmu)

Intervals : 
Level      Normal              Basic         
95%   (18.33, 21.45 )   (18.19, 21.61 )  

Level     Percentile            BCa          
95%   (18.39, 21.81 )   (18.10, 21.10 )  
Calculations and Intervals on Original Scale
Some basic intervals may be unstable
Some percentile intervals may be unstable
Warning : BCa Intervals used Extreme Quantiles
Some BCa intervals may be unstable
Warning messages:
1: In boot.ci(bootmu) :
  bootstrap variances needed for studentized intervals
2: In norm.inter(t, adj.alpha) :
  extreme order statistics used as endpoints

The number of bootstrap replicates is 100 times smaller, yet the CIs are essentially the same. Why?

If anyone wants to replicate the exact same example, here's the data.

> dados
   queimadas plantacoes
1         27        418
2         13        353
3         21        239
4         14        251
5         18        482
6         18        361
7         22        213
8         24        374
9         21        298
10        15        182
11        23        413
12        17        218
13        10        299
14        23        306
15        22        267
16        18         56
17        24        538
18        19        424
19        15         64
20        16        225
21        25        266
22        21        218
23        24        424
24        26         38
25        19        309
26        20        451
27        16        351
28        15        174
29        24        302
30        30        492

Comments (1)

旧竹 2025-01-25 18:11:40

The width of the confidence interval for your estimator does not depend on the number of bootstrap replicates; it depends on the size of the original dataset.

Increasing the number of bootstrap replicates improves the precision with which the sampling distribution (and hence the confidence interval) is approximated, but it cannot make your estimate of the mean more precise. That precision is limited by the 30 observations you started with.
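
To see the difference between the two quantities, here is a minimal sketch in base R (no boot package, just sample() and quantile()). The helper ci_width is a hypothetical name introduced for illustration, and the "larger dataset" is simply the original values repeated four times, which is an assumption made only to mimic having n = 120 observations; it is not real data.

```r
# queimadas column from the question's data (n = 30)
queimadas <- c(27, 13, 21, 14, 18, 18, 22, 24, 21, 15,
               23, 17, 10, 23, 22, 18, 24, 19, 15, 16,
               25, 21, 24, 26, 19, 20, 16, 15, 24, 30)

set.seed(1)

# Hypothetical helper: width of the 95% percentile bootstrap CI for the mean,
# based on R resamples (with replacement) of the vector x.
ci_width <- function(x, R) {
  means <- replicate(R, mean(sample(x, replace = TRUE)))
  unname(diff(quantile(means, c(0.025, 0.975))))
}

# Same data, different numbers of replicates: the widths fluctuate
# (more so for small R) but do not systematically shrink as R grows.
widths_by_R <- sapply(c(100, 1000, 10000), function(R) ci_width(queimadas, R))
print(widths_by_R)

# Illustration only: pretend the dataset were four times larger by
# repeating the observed values. The interval gets noticeably narrower.
width_big <- ci_width(rep(queimadas, 4), 10000)
print(width_big)
```

Changing R only changes how noisy the estimated endpoints are; changing n changes where the endpoints actually sit.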

Try calculating the confidence interval around the mean using an analytic method for comparison.

> confint(lm(dados$queimadas~1))
               2.5 %   97.5 %
(Intercept) 18.27624 21.72376

You will see that both bootstrap runs (with 100 or 10000 replicates) approximate this analytic CI, obtained here from an intercept-only linear model, fairly well.
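
The interval reported by confint for an intercept-only model is the usual t-based interval for a mean, mean ± t × s / sqrt(n). As a rough check, it can be reproduced by hand; the variable names below (se, t_crit, ci) are chosen for illustration only.

```r
# queimadas column from the question's data (n = 30)
queimadas <- c(27, 13, 21, 14, 18, 18, 22, 24, 21, 15,
               23, 17, 10, 23, 22, 18, 24, 19, 15, 16,
               25, 21, 24, 26, 19, 20, 16, 15, 24, 30)

n      <- length(queimadas)
se     <- sd(queimadas) / sqrt(n)   # standard error of the mean
t_crit <- qt(0.975, df = n - 1)     # two-sided 95% critical value

# 95% t-interval for the mean; n appears only through se and df,
# and the number of bootstrap replicates does not appear at all.
ci <- mean(queimadas) + c(-1, 1) * t_crit * se
print(ci)
```

The formula makes the dependence explicit: the half-width scales with 1 / sqrt(n), so only more data can narrow the interval.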
