I am trying to reproduce weighted least squares (WLS) in Excel, using R to confirm the results. I use the following (trivial but reproducible) dataset as a double check:
x<-c(1,2,3,4,5,6)
y<-c(9,23,30,42,54,66)
w<-1/x
When I fit a WLS model using lm with the weights argument, as follows:
WLS<-lm(y~x, weights = w)
summary(WLS)
The output is:
> summary(WLS)
Call:
lm(formula = y ~ x, weights = w)
Weighted Residuals:
       1        2        3        4        5        6
-0.50162  1.67280 -1.02017 -0.44984 -0.01447  0.34087

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -1.6311     1.2241  -1.333    0.254
x            11.1327     0.4181  26.627 1.18e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.05 on 4 degrees of freedom
Multiple R-squared: 0.9944, Adjusted R-squared: 0.993
F-statistic: 709 on 1 and 4 DF, p-value: 1.182e-05
I have read here that the residual standard error reported by R can be computed manually with the following lines (adapted to the above model for convenience):
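k <- length(WLS$coefficients) - 1   # number of predictors (subtract one to ignore the intercept)
SSE <- sum(WLS$residuals^2)         # sum of squared raw (unweighted) residuals
n <- length(WLS$residuals)          # number of observations
sqrt(SSE / (n - (1 + k)))           # residual standard error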
This calculation is consistent with the formula I have seen in many books (e.g. here). However, this manual calculation returns 1.618487 (i.e. not 1.05).
I also found here that WLS can be performed by applying OLS to transformed variables (model in matrix notation: Y' = X'B + e'), using the transformation Y' = W^(1/2)Y; X' = W^(1/2)X; e' = W^(1/2)e. I performed it in R with the following code:
v<-w^(1/2)
x2<-x*v
y2<-y*v
WLS2<-lm(y2~0+v+x2)
i.e. a model with the intercept forced to zero, where v represents the new intercept. Doing so, I get the following output:
> summary(WLS2)
Call:
lm(formula = y2 ~ 0 + v + x2)
Residuals:
       1        2        3        4        5        6
-0.50162  1.67280 -1.02017 -0.44984 -0.01447  0.34087

Coefficients:
   Estimate Std. Error t value Pr(>|t|)
v   -1.6311     1.2241  -1.333    0.254
x2  11.1327     0.4181  26.627 1.18e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.05 on 4 degrees of freedom
Multiple R-squared: 0.9982, Adjusted R-squared: 0.9972
F-statistic: 1085 on 2 and 4 DF, p-value: 3.388e-06
Note that the regression coefficients and the residual standard error are the same, but the R², the F-statistic and the residuals are different. Also, I do get 1.049927 when I calculate the residual standard error from the residuals of that model (WLS2).
My question: can someone kindly explain why the residual standard error returned by R is the same for the two models despite their different residuals? Is it correct that the residual standard error should be 1.618487 (as calculated manually) for the first model (without data transformation)? Is that a problem with how R internally computes WLS? It seems that R omits back-transforming the residuals before calculating the residual standard error.
Thanks!
Comments (1)
No, because you fitted a model with weights but then forgot about the weights in your computation of sigma.
Created on 2022-03-20 by the reprex package (v2.0.1)
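As a minimal sketch of what including the weights in the sigma computation could look like (written from the question's data, not the answerer's original reprex; the names rse_unweighted and rse_weighted are introduced here for illustration only):

```r
# Data and model from the question
x <- c(1, 2, 3, 4, 5, 6)
y <- c(9, 23, 30, 42, 54, 66)
w <- 1 / x
WLS <- lm(y ~ x, weights = w)

n <- length(residuals(WLS))   # number of observations
p <- length(coef(WLS))        # number of estimated coefficients, intercept included

# Unweighted sum of squares: what the manual calculation in the question uses
rse_unweighted <- sqrt(sum(residuals(WLS)^2) / (n - p))

# Weighted sum of squares: each squared raw residual is multiplied by its weight
rse_weighted <- sqrt(sum(w * residuals(WLS)^2) / (n - p))

rse_unweighted        # the question's manual value (about 1.618)
rse_weighted          # the value printed by summary(WLS) (about 1.05)
summary(WLS)$sigma    # extracted directly from the summary object
sigma(WLS)            # built-in extractor for the same quantity
```

The raw residuals stored in WLS$residuals are on the original scale of y, so they have to be weighted before being squared and summed if the result is meant to match what summary() reports.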
It's because the second model explicitly includes the weights:
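One way to check this, as a sketch that reuses the question's objects (the variable names same_resid, same_wresid and same_sigma are illustrative assumptions, not part of the original answer):

```r
# Data and both fits from the question
x <- c(1, 2, 3, 4, 5, 6)
y <- c(9, 23, 30, 42, 54, 66)
w <- 1 / x
v <- sqrt(w)
x2 <- x * v
y2 <- y * v

WLS  <- lm(y ~ x, weights = w)   # weights handled internally by lm
WLS2 <- lm(y2 ~ 0 + v + x2)      # weights baked into the transformed variables

# Residuals of the transformed fit are the raw WLS residuals scaled by sqrt(w)
same_resid <- all.equal(unname(residuals(WLS2)),
                        unname(sqrt(w) * residuals(WLS)))

# weighted.residuals() returns those scaled residuals for the weighted fit
same_wresid <- all.equal(unname(weighted.residuals(WLS)),
                         unname(residuals(WLS2)))

# Hence both fits share the same residual sum of squares and the same sigma
same_sigma <- all.equal(summary(WLS)$sigma, summary(WLS2)$sigma)

c(same_resid, same_wresid, same_sigma)
```

In the transformed model the residuals already carry the factor sqrt(w), so an unweighted sum of squares on WLS2 gives the same number as a weighted sum of squares on WLS.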