
Inconsistency with R's "Residual standard error" (in lm) in the case of WLS

Posted on 2025-01-15 12:29:26


I am trying to reproduce Weighted Least Squares (WLS) in Excel, using R for confirmation. I use the following (trivial but reproducible) dataset as a double check:

x<-c(1,2,3,4,5,6)
y<-c(9,23,30,42,54,66)
w<-1/x

When I fit a WLS model using lm and the weights argument as follows:

WLS<-lm(y~x, weights = w)
summary(WLS)

The output is:

> summary(WLS)

Call:
lm(formula = y ~ x, weights = w)

Weighted Residuals:
       1        2        3        4        5        6 
-0.50162  1.67280 -1.02017 -0.44984 -0.01447  0.34087 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -1.6311     1.2241  -1.333    0.254    
x            11.1327     0.4181  26.627 1.18e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.05 on 4 degrees of freedom
Multiple R-squared:  0.9944,    Adjusted R-squared:  0.993 
F-statistic:   709 on 1 and 4 DF,  p-value: 1.182e-05

I have read here that the residual standard error reported by R can be calculated manually using the following lines (adapted to the above model for convenience):

k=length(WLS$coefficients)-1 #Subtract one to ignore intercept
SSE=sum(WLS$residuals**2)
n=length(WLS$residuals)
sqrt(SSE/(n-(1+k))) #Residual Standard Error

This calculation is consistent with the formula I have seen in many books (e.g. here). However, when I run this manual calculation, the result is 1.618487 (i.e. not 1.05).

I also found here that WLS can be performed by applying OLS to transformed variables (model in matrix notation: Y'=X'B+e') using the transformation: Y'=W^(1/2)Y; X'=W^(1/2)X; e'=W^(1/2)e. I performed it in R with the following code:

v<-w^(1/2)
x2<-x*v
y2<-y*v
WLS2<-lm(y2~0+v+x2)

i.e. a model with the intercept forced to zero, where v represents the new intercept. Doing so, I get the following output:

> summary(WLS2)

Call:
lm(formula = y2 ~ 0 + v + x2)

Residuals:
       1        2        3        4        5        6 
-0.50162  1.67280 -1.02017 -0.44984 -0.01447  0.34087 

Coefficients:
   Estimate Std. Error t value Pr(>|t|)    
v   -1.6311     1.2241  -1.333    0.254    
x2  11.1327     0.4181  26.627 1.18e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.05 on 4 degrees of freedom
Multiple R-squared:  0.9982,    Adjusted R-squared:  0.9972 
F-statistic:  1085 on 2 and 4 DF,  p-value: 3.388e-06

Note that the regression coefficients and residual standard error are the same, but the R², F-statistic and residuals are different. Also, I do get 1.049927 when I calculate the residual standard error from the residuals of that model (WLS2).

My question: can someone kindly explain why the residual standard error returned by R is the same for the two models despite their having different residuals? Is it correct that the residual standard error should be 1.618487 (as calculated manually) for the first model (without data transformation)? Is this a problem with how R internally computes WLS? It seems that R omits to back-transform the residuals before calculating the residual standard error.

Thanks!


蓝梦月影 2025-01-22 12:29:26

Is it correct that the residual standard error should be 1.618487 (as calculated manually) for the first model (without data transformation)?

No, because you fitted a model with weights but then forgot about the weights in your computation of sigma.

x <- c(1, 2, 3, 4, 5, 6)
y <- c(9, 23, 30, 42, 54, 66)
w <- 1 / x

WLS <- lm(y ~ x, weights = w)
summary(WLS)$sigma
#> [1] 1.049927

# You computed sigma as if the fit were unweighted, i.e. ignoring w
k <- length(WLS$coefficients) - 1 # Subtract one to ignore intercept
SSE <- sum(WLS$residuals**2)
n <- length(WLS$residuals)
sqrt(SSE / (n - (1 + k))) # Residual Standard Error, without weighting
#> [1] 1.618487

# But what you really wanted is to compute sigma for lm(y ~ x, weights = w)
SSE <- sum(w * (WLS$residuals)**2)
sqrt(SSE / (n - (1 + k))) # Residual Standard Error, with weighting
#> [1] 1.049927

Created on 2022-03-20 by the reprex package (v2.0.1)
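
Written out, the two computations above differ only in whether each squared residual is multiplied by its weight. The following is a sketch in the notation of the code, where e_i stands for the raw residuals WLS$residuals, w_i for the weights, and n and k are as defined in the code; the subscripted sigma labels are introduced here only for illustration.

```latex
\hat\sigma_{\text{unweighted}}
  = \sqrt{\frac{\sum_{i=1}^{n} e_i^{2}}{n-(1+k)}},
\qquad
\hat\sigma_{\text{weighted}}
  = \sqrt{\frac{\sum_{i=1}^{n} w_i\, e_i^{2}}{n-(1+k)}}
  = \sqrt{\frac{\sum_{i=1}^{n} \bigl(\sqrt{w_i}\, e_i\bigr)^{2}}{n-(1+k)}}
```

The last form shows that the weighted version is simply the ordinary formula applied to the residuals rescaled by sqrt(w_i).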

Why is the residual standard error returned by R the same for the two models despite their having different residuals?

It's because the second model explicitly includes the weights:

WLS: y ~ 1 + x with weights = w
WLS2: sqrt(w) * y ~ sqrt(w) + sqrt(w) * x with weights = 1
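
The correspondence between the two fits can be checked numerically. The sketch below reuses the data and model names from the question; the helper names raw_res, weighted_res and sigma_manual are introduced here only for illustration. It compares the raw residuals of the weighted fit, rescaled by sqrt(w), with the residuals of the transformed fit, and then recomputes the residual standard error from them.

```r
# Same toy data as in the question
x <- c(1, 2, 3, 4, 5, 6)
y <- c(9, 23, 30, 42, 54, 66)
w <- 1 / x

# Weighted fit
WLS <- lm(y ~ x, weights = w)

# OLS fit on the transformed variables, as in the question
v  <- sqrt(w)
x2 <- x * v
y2 <- y * v
WLS2 <- lm(y2 ~ 0 + v + x2)

# residuals() on the weighted fit returns the raw (unweighted) residuals;
# rescaling them by sqrt(w) should reproduce the residuals of WLS2
raw_res      <- residuals(WLS)
weighted_res <- sqrt(w) * raw_res
all.equal(unname(weighted_res), unname(residuals(WLS2)))

# weighted.residuals() from the stats package applies the same rescaling
all.equal(unname(weighted.residuals(WLS)), unname(weighted_res))

# Residual standard error computed from the weighted residuals
sigma_manual <- sqrt(sum(weighted_res^2) / df.residual(WLS))
all.equal(sigma_manual, summary(WLS)$sigma)
```

Using df.residual(WLS) instead of n - (1 + k) is just a convenience; both give 4 degrees of freedom for this model.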
