Least-squares fit of the original variables on PCA scores

Posted on 2024-10-31 05:53:43

I have 100 variables, and I want to run a factor analysis on variables var15 to var25. To do that, I first extracted those variables into a separate object and then ran a principal component analysis on it (the result is f in the code below).

Now I want to merge PCA scores with the original dataset to run regression using PCA scores as predictors.

Can anybody suggest a way to merge these two datasets? The code I used is the following:

library(sqldf)
spss_data_factor <- sqldf("SELECT Respondent_Serial,Q4_01_Q4,Q4_02_Q4,Q4_03_Q4,Q4_04_Q4,Q4_05_Q4,Q4_06_Q4,Q4_07_Q4,Q4_08_Q4,Q4_09_Q4,Q4_10_Q4 FROM spss_data_rel")
# Respondent_Serial is an ID, so drop it (first column) before the PCA
f <- princomp(spss_data_factor[, -1], cor = TRUE)
summary(f, loadings = TRUE)
f$scores[, 1:5]

Comments (2)

坏尐絯℡ 2024-11-07 05:53:43

Please avoid reusing names from base R. factor is not a reserved word, but it is a base function, so an object named factor will work just fine and may still confuse you at some point in development. Also, your factor is not a file; it is an R object of class princomp.

Anyway, you want to define a regression model with factor scores as predictors? Piece of cake... and no merging is required:

fa <- princomp(mtcars, cor=TRUE)
fa_scores <- fa$scores
fit <- lm(mtcars$hp ~ fa_scores)
summary(fit)

Call:
lm(formula = mtcars$hp ~ fa_scores)

Residuals:
       Min         1Q     Median         3Q        Max 
-2.521e-14 -7.825e-15 -2.416e-15  5.622e-15  4.329e-14 

Coefficients:
                   Estimate Std. Error    t value Pr(>|t|)    
(Intercept)       1.467e+02  2.862e-15  5.125e+16   <2e-16 ***
fa_scoresComp.1  -2.227e+01  1.113e-15 -2.000e+16   <2e-16 ***
fa_scoresComp.2  -1.679e+01  1.758e-15 -9.549e+15   <2e-16 ***
fa_scoresComp.3   9.449e+00  3.614e-15  2.614e+15   <2e-16 ***
fa_scoresComp.4  -4.567e+00  5.513e-15 -8.285e+14   <2e-16 ***
fa_scoresComp.5  -3.644e+01  6.055e-15 -6.019e+15   <2e-16 ***
fa_scoresComp.6  -4.821e+00  6.222e-15 -7.747e+14   <2e-16 ***
fa_scoresComp.7  -1.010e-01  7.783e-15 -1.297e+13   <2e-16 ***
fa_scoresComp.8   1.501e+01  8.164e-15  1.838e+15   <2e-16 ***
fa_scoresComp.9  -3.886e+01  1.031e-14 -3.768e+15   <2e-16 ***
fa_scoresComp.10  1.672e+01  1.255e-14  1.333e+15   <2e-16 ***
fa_scoresComp.11 -1.731e+01  1.928e-14 -8.979e+14   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 1.619e-14 on 20 degrees of freedom
Multiple R-squared:     1,  Adjusted R-squared:     1 
F-statistic: 5.053e+31 on 11 and 20 DF,  p-value: < 2.2e-16 
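One caveat about the output above: the fit is exact (R-squared of 1, residuals around 1e-14) because hp is itself one of the variables that went into the PCA, and all 11 components together reproduce the original data. A minimal sketch of a less circular setup follows, assuming you leave the response out of the PCA and keep only the first few components. The names predictors, k, dat and fit_pcr, and the choice of three components, are mine and purely illustrative.

```r
# hp is the response, so it is NOT part of the PCA input
predictors <- mtcars[, setdiff(names(mtcars), "hp")]
pca <- princomp(predictors, cor = TRUE)

# Keep the first k components (k = 3 is an arbitrary choice here)
k <- 3
scores <- pca$scores[, 1:k]

# The score matrix has one row per input row, in the same order,
# so a plain column bind attaches it to the original data
dat <- cbind(mtcars, scores)

# Regress the response on the retained components
fit_pcr <- lm(hp ~ Comp.1 + Comp.2 + Comp.3, data = dat)
summary(fit_pcr)$r.squared
```

The cbind() step only works as long as no rows were dropped or reordered between the original data and the score matrix; otherwise you would have to match rows on an ID column instead.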

You may also want to convert the original dataset to a matrix in order to run ncol(mtcars) regressions at once, one for each column of the response matrix. The lm function supports a response ~ terms formula in which response can be a matrix. See ?lm:

"If response is a matrix a linear model is fitted separately by least-squares to each column of the matrix."

So, you can do something like this:

fit2 <- lm(as.matrix(mtcars) ~ fa_scores)
summary(fit2) # handle with care! =)
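To see what the matrix response does, here is a small sketch that compares one column of the multi-response fit with a separate single-response fit. It assumes two responses (mpg and hp) and two components computed from the remaining variables; all object names (responses, Y, fit_multi, fit_mpg) are made up for the illustration.

```r
# Two response columns; the PCA is run on the remaining variables only
responses <- c("mpg", "hp")
pca <- princomp(mtcars[, setdiff(names(mtcars), responses)], cor = TRUE)
scores <- pca$scores[, 1:2]

# Matrix response: lm() fits one least-squares model per column
Y <- as.matrix(mtcars[, responses])
fit_multi <- lm(Y ~ scores)
class(fit_multi)   # an "mlm" object
coef(fit_multi)    # one column of coefficients per response

# The mpg column matches a separate single-response fit
fit_mpg <- lm(mtcars$mpg ~ scores)
all.equal(unname(coef(fit_multi)[, "mpg"]), unname(coef(fit_mpg)))
```

summary() on such a fit prints one full regression summary per response column, which is why it can get long when every column of the dataset is used as a response.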

I hope that this was helpful...


Anyway, if you want to perform a factor analysis, please see this link. You should install William Revelle's psych package.

深海夜未眠 2024-11-07 05:53:43

Thank you aL3xa! I found a solution, and I'm posting it here in case somebody finds it helpful.

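## Factor analysis
library(psych)  # provides fa.parallel()
spss_data_fac <- read.csv("D:\\Arijit\\spss_data_rel_01.csv")
# Parallel analysis to help choose the number of factors
fa.parallel(spss_data_fac[, 40:49])
spss_data_fac_01 <- factanal(spss_data_fac[, 40:49], factors = 2,
                             scores = "regression", rotation = "promax")
spss_data_fac_01$scores
## Factor scores are used as predictors in a logistic regression.
## family = binomial assumes Q8 is a binary outcome; without it,
## glm() fits an ordinary (gaussian) linear model.
spss_dat_reg <- glm(spss_data_fac$Q8 ~ spss_data_fac_01$scores + spss_data_fac$Q14,
                    family = binomial)
summary(spss_dat_reg)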

Regards,
A
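For readers without access to the CSV file above, the same workflow can be sketched on simulated data. Everything here is an assumption made for illustration: the six items q1 to q6, the two latent factors, the binary outcome y, and the object names. The scores are attached to the data frame with cbind() so that the model can be written with a data argument.

```r
set.seed(1)
n <- 200

# Two latent factors, each driving three observed items (simulated)
f1 <- rnorm(n)
f2 <- rnorm(n)
items <- data.frame(
  q1 = f1 + rnorm(n, sd = 0.5),
  q2 = f1 + rnorm(n, sd = 0.5),
  q3 = f1 + rnorm(n, sd = 0.5),
  q4 = f2 + rnorm(n, sd = 0.5),
  q5 = f2 + rnorm(n, sd = 0.5),
  q6 = f2 + rnorm(n, sd = 0.5)
)

# Simulated binary outcome that depends on the latent factors
y <- rbinom(n, 1, plogis(0.8 * f1 - 0.5 * f2))

# Factor analysis with regression scores, as in the post above
fa_fit <- factanal(items, factors = 2, scores = "regression",
                   rotation = "promax")

# Attach the scores (columns Factor1 and Factor2) to the data
dat <- cbind(items, y = y, fa_fit$scores)

# Logistic regression on the factor scores
logit_fit <- glm(y ~ Factor1 + Factor2, family = binomial, data = dat)
summary(logit_fit)
```

Keeping the scores in the same data frame as the outcome also makes predict() and later model comparisons easier than referring to separate objects with $ inside the formula.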
