Why am I getting similar CIs with such different numbers of bootstrap samples?
I just learned how to do the bootstrap in R, and I'm excited. I was playing with some data and found that, no matter how many bootstrap samples I take, the CIs always come out about the same. I expected that the more samples I take, the narrower the CI should be. Here's the code.
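library(boot)

# Statistic for boot(): mean of queimadas over the resampled rows i
M. <- function(dados, i) {
  d <- dados[i, ]
  mean(d$queimadas)
}

bootmu <- boot(dados, statistic = M., R = 10000)  # 10000 bootstrap replicates
boot.ci(bootmu)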
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 10000 bootstrap replicates
CALL :
boot.ci(boot.out = bootmu)
Intervals :
Level Normal Basic
95% (18.36, 21.64 ) (18.37, 21.63 )
Level Percentile BCa
95% (18.37, 21.63 ) (18.37, 21.63 )
Calculations and Intervals on Original Scale
Warning message:
In boot.ci(bootmu) : bootstrap variances needed for studentized intervals
As one can see, I took 10000 samples. Now let's try with just 100.
bootmu <- boot(dados, statistic = M., R = 100)  # only 100 bootstrap replicates
boot.ci(bootmu)
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 100 bootstrap replicates
CALL :
boot.ci(boot.out = bootmu)
Intervals :
Level Normal Basic
95% (18.33, 21.45 ) (18.19, 21.61 )
Level Percentile BCa
95% (18.39, 21.81 ) (18.10, 21.10 )
Calculations and Intervals on Original Scale
Some basic intervals may be unstable
Some percentile intervals may be unstable
Warning : BCa Intervals used Extreme Quantiles
Some BCa intervals may be unstable
Warning messages:
1: In boot.ci(bootmu) :
bootstrap variances needed for studentized intervals
2: In norm.inter(t, adj.alpha) :
extreme order statistics used as endpoints
>
The number of bootstrap samples is 100 times lower, but the CIs are essentially the same. Why?
If anyone wants to replicate the exact same example, here's the data.
> dados
queimadas plantacoes
1 27 418
2 13 353
3 21 239
4 14 251
5 18 482
6 18 361
7 22 213
8 24 374
9 21 298
10 15 182
11 23 413
12 17 218
13 10 299
14 23 306
15 22 267
16 18 56
17 24 538
18 19 424
19 15 64
20 16 225
21 25 266
22 21 218
23 24 424
24 26 38
25 19 309
26 20 451
27 16 351
28 15 174
29 24 302
30 30 492
Comments (1)
The width of the confidence interval for your estimator does not depend on the number of bootstrap replicates; it depends on the size of the original dataset.
Increasing the number of bootstrap replicates increases the precision with which the sampling distribution (and hence the confidence interval endpoints) is estimated, but it cannot make your estimate of the mean itself any more precise. That is also why the run with R=100 warns that some intervals may be unstable: the endpoints are noisier, not wider.
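To see this distinction in numbers, here is a minimal sketch that repeats the whole bootstrap several times for each R and compares the average CI width with the run-to-run scatter of the lower endpoint. It uses base R only (a hand-rolled percentile interval rather than boot.ci), and the helper names boot_ci and summarise_R, the seed, and the choice of 20 repetitions are my own assumptions, not something from the question.

```r
# queimadas column copied from the data in the question
queimadas <- c(27, 13, 21, 14, 18, 18, 22, 24, 21, 15,
               23, 17, 10, 23, 22, 18, 24, 19, 15, 16,
               25, 21, 24, 26, 19, 20, 16, 15, 24, 30)

set.seed(1)  # arbitrary seed, only for reproducibility

# Percentile bootstrap CI for the mean using R replicates (hypothetical helper)
boot_ci <- function(x, R, level = 0.95) {
  means <- replicate(R, mean(sample(x, replace = TRUE)))
  alpha <- 1 - level
  quantile(means, c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

# Rerun the bootstrap `reps` times for a given R and summarise the intervals
summarise_R <- function(R, reps = 20) {
  cis <- replicate(reps, boot_ci(queimadas, R))  # 2 x reps matrix
  c(R          = R,
    mean_width = mean(cis[2, ] - cis[1, ]),  # how wide the CI is on average
    sd_lower   = sd(cis[1, ]))               # how much the lower endpoint jumps around
}

res <- t(sapply(c(100, 1000, 10000), summarise_R))
print(res)
```

If the reasoning above is right, mean_width should stay roughly constant across the three rows, while sd_lower should shrink as R grows.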
Try calculating the confidence interval around the mean using an analytic method for comparison.
You will see that both bootstraps (with 100 or 10000 replicates) estimate the CI calculated by linear regression fairly well. An intercept-only regression reproduces the usual t-interval for the mean.
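One way to run that comparison is sketched below, assuming the queimadas values from the question. It computes the analytic 95% interval twice, once with t.test and once from an intercept-only lm, which should agree with each other. The variable names ci_t and ci_lm are mine.

```r
# queimadas column copied from the data in the question
queimadas <- c(27, 13, 21, 14, 18, 18, 22, 24, 21, 15,
               23, 17, 10, 23, 22, 18, 24, 19, 15, 16,
               25, 21, 24, 26, 19, 20, 16, 15, 24, 30)

# Classical t-interval for the mean
ci_t <- as.numeric(t.test(queimadas, conf.level = 0.95)$conf.int)

# Same interval from a linear regression with only an intercept
fit   <- lm(queimadas ~ 1)
ci_lm <- as.numeric(confint(fit, level = 0.95))

print(ci_t)
print(ci_lm)
```

The t-interval should come out slightly wider than the bootstrap intervals in the question, since with n = 30 it uses the t quantile and the n - 1 variance estimate, but all of them are centred on the sample mean of 20.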