R: Extracting regression coefficients from multiple regressions using lapply
I have a large dataset with several variables, one of which is a state variable, coded 1-50 for each state. I'd like to regress each of 28 variables on the remaining 27 variables of the dataset (there are 55 variables total), separately for each state.
In other words, run a regression of variable1 on covariate1, covariate2, ..., covariate27 for observations where state==1. I'd then like to repeat this for variable1 for states 2-50, and then repeat the whole process for variable2, variable3, ..., variable28.
I think I've written the correct R code to do this, but the next thing I'd like to do is extract the coefficients, ideally into a coefficient matrix. Could someone please help me with this? Here's the code I've written so far:
for (num in 1:50) {
  #PUF is the data set I'm using
  #Subset the data by states
  PUFnum <- subset(PUF, state==num)
  #Attach data set with state specific data
  attach(PUFnum)
  #Run our prediction regression
  #the variables class1 through e19700 are the 27 covariates I want to use
  regression <- lapply(PUFnum, function(z) lm(z ~ class1+class2+class3+class4+class5+class6+class7+
                         xtot+e00200+e00300+e00600+e00900+e01000+p04470+e04800+
                         e09600+e07180+e07220+e07260+e06500+e10300+
                         e59720+e11900+e18425+e18450+e18500+e19700))
  Beta <- lapply(regression, function(d) d<- coef(regression$d))
  detach(PUFnum)
}
Comments (2)
This is another example of the classic Split-Apply-Combine problem, which can be addressed using the plyr package by @hadley. In your problem, you want to split the data by state, fit the regression on each subset, and combine the coefficients into a single table.
I will illustrate it with the Cars93 dataset available in the MASS library. We are interested in the relationship between horsepower and engine size for each country of origin.
EDIT. For your example, substitute PUF for Cars93, state for Origin, and fm for the formula.
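The code from this answer did not survive on this page. A minimal sketch of the approach it describes, assuming the plyr and MASS packages are installed, might look like the following. The object names models and coefs are illustrative, and this is not necessarily the answerer's exact code.

```r
# Sketch only: reconstructs the described approach, not the original answer's code
library(MASS)   # provides the Cars93 data set
library(plyr)   # provides dlply() and ldply()

# Split Cars93 by Origin and fit one linear model per subset
models <- dlply(Cars93, .(Origin), function(df) lm(Horsepower ~ EngineSize, data = df))

# Combine the coefficient vectors into one data frame, one row per Origin
coefs <- ldply(models, coef)
print(coefs)
```

Here coefs has one row per level of Origin, with a column for the intercept and one for the EngineSize slope.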
I've cleaned up your code slightly:
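The cleaned-up code itself is missing from this page. As a rough sketch of what a cleaned-up version could look like (not the answerer's original code), the example below uses a small simulated data frame as a hypothetical stand-in for PUF, with two responses (y1, y2) and two covariates (class1, xtot) in place of the 28 and 27 in the question. It avoids attach()/detach(), regresses only the response columns rather than every column of the subset, calls coef() on each fitted model directly, and stores one coefficient matrix per state instead of overwriting Beta on each pass.

```r
# Hypothetical stand-in for PUF: 3 states, 2 responses, 2 covariates
set.seed(1)
PUF <- data.frame(
  state  = rep(1:3, each = 20),
  class1 = rnorm(60),
  xtot   = rnorm(60)
)
PUF$y1 <- 1 + 2 * PUF$class1 - PUF$xtot + rnorm(60)
PUF$y2 <- 3 - PUF$class1 + 0.5 * PUF$xtot + rnorm(60)

responses  <- c("y1", "y2")        # in the question: the 28 response variables
covariates <- c("class1", "xtot")  # in the question: class1 through e19700

Beta <- list()
for (num in sort(unique(PUF$state))) {
  # Subset by state; passing data = PUFnum to lm() replaces attach()/detach()
  PUFnum <- subset(PUF, state == num)
  # Fit one model per response, building each formula from the name vectors
  fits <- lapply(responses, function(resp) {
    lm(reformulate(covariates, response = resp), data = PUFnum)
  })
  # sapply() over coef() gives a matrix: rows are terms, columns are responses
  Beta[[as.character(num)]] <- sapply(setNames(fits, responses), coef)
}
```

Beta[["1"]] is then the coefficient matrix for state 1, with one row per term and one column per response.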
If you wanted, you could even put this all in one line:
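The one-line version is also missing from this page. One possible one-liner, again only a sketch on hypothetical stand-in data rather than the answerer's code, relies on the fact that lm() accepts a matrix response built with cbind(), so a single call per state fits every response at once.

```r
# Hypothetical stand-in for PUF (same shape as the question's data, much smaller)
set.seed(1)
PUF <- data.frame(state = rep(1:3, each = 20), class1 = rnorm(60), xtot = rnorm(60))
PUF$y1 <- 1 + 2 * PUF$class1 - PUF$xtot + rnorm(60)
PUF$y2 <- 3 - PUF$class1 + 0.5 * PUF$xtot + rnorm(60)

# Split by state, fit all responses in one lm() call, keep only the coefficients
Beta <- lapply(split(PUF, PUF$state), function(d) coef(lm(cbind(y1, y2) ~ class1 + xtot, data = d)))
```

Each element of Beta is a coefficient matrix for one state, with terms in rows and responses in columns. With the real data, the cbind() call would list the 28 response variables and the right-hand side would list the 27 covariates.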