How can I use the columns of a matrix column as predictors in a linear regression in R?
Problem Statement: Some near infrared spectra on 60 samples of gasoline and corresponding octane numbers can be found by data(gasoline, package="pls").
Compute the mean value for each frequency and predict the response for the best model using the five different methods from Question 4.
Note: This is Exercise 11.5 in Linear Models with R, 2nd Ed., by Julian Faraway. Also, the "five different methods from Question 4" are: linear regression with all predictors, linear regression with variables selected using AIC, principal component regression, partial least squares, and ridge regression.
My Work So Far: We do
require(pls)
data(gasoline, package="pls")
## hold out every 10th observation as a test set
test_index = seq(1,nrow(gasoline),10)
train_index = 1:nrow(gasoline)
train_index = train_index[!train_index %in% test_index]
train_gas = gasoline[train_index,]
test_gas = gasoline[test_index,]
## regress octane on the 401-column NIR matrix held in gasoline$NIR
lmod = lm(octane~NIR,train_gas)
So far, so good. However, if I look at a summary of the model, I find that 348 coefficients are not defined because of singularities. (Why?) Moreover, massaging the mean values of the columns of the NIR
matrix (the predictors) into an acceptable data frame is proving difficult.
My Question: How can I get to the point where the highly-fussy predict
function will let me do something like this:
new_data = apply(train_gas$NIR, 2, mean)
*some code here*
predict(lmod, new_data)
?
Incidentally, as I have done a significant amount of moderating on Stats.SE, I can assert positively that this question would be closed on Stats.SE as being off-topic. It's a "programming or data request", and hence unwelcome on Stats.SE.
I have also looked up a few related questions on SO, but nothing seems to fit exactly.
Answer:
This does seem pretty CrossValidated-ish to me ...
gasoline is a rather odd object, containing a 'column' (element) that is a 401-column matrix.

However, the fundamental problem is that this is a p >> n problem; there are 60 observations and 401 predictors. Thus, a standard linear regression probably just won't make sense - you probably want to use a penalized approach like LASSO/ridge (i.e., glmnet). This is why you get the undefined coefficients: without some kind of penalization, you can't estimate 402 coefficients (ncol + 1 for the intercept) from 60 observations ...

However, if we do want to hack this into a shape where we can do the linear model and prediction (however ill-advised):
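Something along these lines should work; it flattens the matrix column into ordinary numeric columns and is only a sketch (train_flat, lmod2 and new_flat are illustrative names):

## peek at the odd structure first
dim(gasoline$NIR)   # 60 401: one 'column' that is itself a 401-column matrix
## flatten the matrix column into an ordinary data frame, one column per frequency
train_flat = data.frame(octane = train_gas$octane,
                        as.data.frame(unclass(train_gas$NIR)))
lmod2 = lm(octane ~ ., data = train_flat)
## a one-row data frame of the column means, built the same way so the names match
new_flat = as.data.frame(as.list(colMeans(train_gas$NIR)))
predict(lmod2, newdata = new_flat)   # warns about the rank-deficient fit, but returns a value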
A slightly more direct approach (but still ugly) is to fit the model to the original weird structure and construct a prediction frame that matches that weird structure, i.e.
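A sketch of that (new_weird is an illustrative name; the I() wrapper keeps data.frame() from splitting the matrix into 401 separate columns):

## a one-row data frame whose NIR element is itself a 1 x 401 matrix,
## mirroring the shape of the training data
new_weird = data.frame(NIR = I(t(colMeans(train_gas$NIR))))
predict(lmod, newdata = new_weird)   # same rank-deficiency warning as above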
If you were willing to forgo predict() you could do it like this:
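For example, a sketch that relies on the fact that the coefficients lm() could not estimate are NA, so na.rm = TRUE simply drops them:

## intercept plus the 401 column means, multiplied elementwise by the fitted coefficients
sum(coef(lmod) * c(1, colMeans(train_gas$NIR)), na.rm = TRUE)

And, for the penalized route suggested above, a minimal ridge-style sketch with cv.glmnet (assuming the glmnet package is installed; cvfit is an illustrative name):

library(glmnet)
x = unclass(train_gas$NIR)            # plain 54 x 401 predictor matrix
y = train_gas$octane
cvfit = cv.glmnet(x, y, alpha = 0)    # alpha = 0 gives ridge; lambda chosen by cross-validation
predict(cvfit, newx = t(colMeans(x)), s = "lambda.min")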