Why does lm return values when there is no variance in the predicted value?

Posted on 2025-01-05 01:35:49

Consider the following R code (which, I think, eventually calls some Fortran):

X <- 1:1000
Y <- rep(1,1000)
summary(lm(Y~X))

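Why does summary return values? Shouldn't this model fail to fit, since Y has no variance? More importantly, why is the model's R^2 approximately 0.5?

Edit

I traced the code from lm to lm.fit and found this call: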

z <- .Fortran("dqrls", qr = x, n = n, p = p, y = y, ny = ny,
   tol = as.double(tol), coefficients = mat.or.vec(p, ny), residuals = y,
   effects = y, rank = integer(1L), pivot = 1L:p, qraux = double(p),
   work = double(2 * p), PACKAGE = "base")

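That is where the actual fit seems to happen. Looking at the Fortran source (http://svn.r-project.org/R/trunk/src/appl/dqrls.f) did not help me understand what is going on, because I do not know Fortran.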



无风消散 2025-01-12 01:35:49

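Statistically speaking, what should we anticipate (I'd like to say "expect", but that is a very specific term ;-))? The coefficients should be (0, 1), that is, slope 0 and intercept 1, rather than a failure to fit. The covariance of (X, Y) is taken to be proportional to the variance of X, not the other way around. Since X has non-zero variance, there is no problem. Since the covariance is 0, the estimated coefficient for X should be 0. So, within machine tolerance, this is the answer you are getting.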

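There is no statistical anomaly here. There may be a statistical misunderstanding. There is also the matter of machine tolerance, but a coefficient on the order of 1E-19 is negligible given the scales of the predictor and response values.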
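To put "negligible" in concrete terms, here is a minimal sketch that compares the fitted slope and residuals against double-precision machine epsilon. The variable names (`slope_effect`, `max_resid`) are only illustrative, and the exact magnitudes printed will likely differ between R versions and platforms, so no particular output is assumed.

```r
# Same data as in the question
X <- 1:1000
Y <- rep(1, 1000)
fit <- lm(Y ~ X)

# Double-precision machine epsilon, for scale
eps <- .Machine$double.eps

# Fitted slope and the largest absolute residual
slope <- unname(coef(fit)["X"])
max_resid <- max(abs(residuals(fit)))

# Change in the fitted value across the whole range of X,
# relative to the scale of Y (illustrative measure, not a standard statistic)
slope_effect <- abs(slope) * diff(range(X)) / mean(Y)

print(c(eps = eps, slope = slope,
        max_resid = max_resid, slope_effect = slope_effect))
```

If the slope and residuals come out at or below the order of `eps`, then quantities that `summary` derives from them, such as R^2, are ratios of rounding noise and should not be read as meaningful.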

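Update 1: A quick review of simple linear regression can be found on this Wikipedia page. The key thing to note is that Var(x) is in the denominator and Cov(x,y) is in the numerator. In this case the numerator is 0 and the denominator is non-zero, so there is no reason to expect NaN or NA. One may still ask why the resulting coefficient for x is not exactly 0; that has to do with the numerical precision of the QR decomposition.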
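The Cov(x,y) / Var(x) argument can be checked directly. The sketch below computes the closed-form slope and intercept and sets them beside what `lm` reports. The names `slope_closed`, `intercept_closed` and `slope_qr` are made up for this illustration, and the comparison uses a loose tolerance because the QR-based result may not be exactly 0.

```r
# Same data as in the question
X <- 1:1000
Y <- rep(1, 1000)

# Closed-form simple linear regression estimates:
#   slope     = Cov(x, y) / Var(x)
#   intercept = mean(y) - slope * mean(x)
slope_closed <- cov(X, Y) / var(X)
intercept_closed <- mean(Y) - slope_closed * mean(X)

# Estimates from lm, which goes through the QR decomposition
fit <- lm(Y ~ X)
slope_qr <- unname(coef(fit)["X"])
intercept_qr <- unname(coef(fit)["(Intercept)"])

print(c(var_x = var(X), cov_xy = cov(X, Y)))
print(c(slope_closed = slope_closed, slope_qr = slope_qr))
print(c(intercept_closed = intercept_closed, intercept_qr = intercept_qr))

# The two slopes agree up to a small absolute tolerance
print(abs(slope_qr - slope_closed) < 1e-10)
```

The denominator Var(x) is large and non-zero while the numerator is 0, so the closed form gives a well-defined slope of 0. Any small non-zero value from `lm` comes from floating-point arithmetic in the decomposition rather than from the statistics.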