R error: linear regression model prediction and redundancy errors
R novice here. I'm working on a project to evaluate whether perceived stress differs by gender (Male = 0, Female = 1). I'm learning the statistics and the code at the same time, so I think there's some redundancy in my code. I'm using income, education, and activity level as covariates to build a predictive model.
The data set is titled Data. Gender is coded 0/1, perceived stress ranges from 0 to 20 (treated as continuous), income has 4 categories (coded 1-4), education is coded 0/1, and activity level is a scale (0-5). I have separate code that compares mean perceived stress between the gender groups with a two-sample t test. I'm also working on a regression model. I believe linear regression is correct here, but I'm having some issues.
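One detail that may matter for "treated as continuous": the data sample below shows perceivedStress stored as a factor (`<fct>`), and the listing wraps it in `as.numeric()` inside `lm()`. On a factor, `as.numeric()` returns the internal level codes, not the printed scores. A minimal sketch with made-up values (not the real data) shows the difference; going through `as.character()` is one common way to recover the scores.

```r
# Hypothetical scores standing in for perceivedStress (not the real data)
stress <- factor(c(12, 7, 0, 9, 10, 0))

# as.numeric() on a factor returns the internal level codes, not the scores
codes <- as.numeric(stress)

# Converting through character first recovers the original values
scores <- as.numeric(as.character(stress))

print(codes)
print(scores)
```

If the column is never converted to a factor in the first place, neither conversion is needed.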
The error code is
Error:
Assigned data predict(full, Data = new, na.omit = TRUE) must be compatible with existing data.
x Existing data has 2653 rows.
x Assigned data has 2243 rows.
Only vectors of size 1 are recycled.
Backtrace:
- base::$<-(*tmp*, lmprediction, value = <dbl>)
- tibble <fn>(<vctrs___>)
How can I adjust this to run the linear prediction? Also, I know I forgot something, so if you notice anything wrong, missing, or redundant, please let me know! Thanks!
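A likely cause, sketched under assumptions: `lm()` drops rows with missing values by default, so `predict(full)` returns one value per complete row (2243) rather than one per row of Data (2653). `Data` and `na.omit` are not arguments of `predict.lm()`; its data argument is `newdata`, so both are silently ignored. The toy data frame below is hypothetical and uses base R only. It shows two ways to get predictions that line up with the original rows.

```r
# Hypothetical toy data reusing column names from the sample; values are made up
toy <- data.frame(
  perceivedStress = c(12, 7, 0, 9, 10, 0, 5, 14),
  gender          = c(0, 1, 1, 1, 1, 0, 0, 1),
  Income          = c(1, 3, 4, 2, 2, NA, 3, 1)
)

# Default na.action (na.omit) drops the incomplete row before fitting,
# so predict() without newdata returns fewer values than nrow(toy)
fit_omit <- lm(perceivedStress ~ gender + Income, data = toy)
n_pred <- length(predict(fit_omit))

# Option 1: na.exclude pads the predictions with NA for the dropped rows
fit_excl <- lm(perceivedStress ~ gender + Income, data = toy,
               na.action = na.exclude)
toy$pred_excl <- predict(fit_excl)

# Option 2: pass the data explicitly through newdata;
# rows with missing predictors get NA (predict.lm defaults to na.pass)
toy$pred_new <- predict(fit_omit, newdata = toy)

print(n_pred)
print(toy)
```

Two other things in the listing below may be worth checking: the result of `Data %>% drop_na()` is never assigned back, so Data keeps its incomplete rows, and `new<-(Data$gender=1)` assigns 1 to the whole gender column instead of selecting the rows where gender equals 1.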
Data sample: tibble 6x6
    age Income HSgrad activeIndex perceivedStress gender
  <dbl>  <dbl>  <dbl>       <dbl> <fct>            <dbl>
1  63.4      1      0        1.75 12                   0
2  56.0      3      1           2 7                    1
3  56.5      4      1        2.75 0                    1
4  40.0      2      1        2.75 9                    1
5  47.7      2      0           1 10                   1
6  68.1     NA      0         2.5 0                    0
gender<- ifelse(dfJHS$sex=="Male",0,1)
dfJHS$gender <- gender
View(dfJHS)
Data<-dfJHS %>% select(-sex)
View(Data)
dim(Data)
Data$perceivedStress <- factor(Data$perceivedStress)
#Remove NA
Data %>% drop_na()
Data[complete.cases(Data),]
#section with data visualizations you probably won't need for this (lots of histograms, shapiro test, and a qq plot)
#check linear fit for two primary variables and perform linear regression.
model <-lm(perceivedStress ~ gender, data = Data)
summary(model)
#Checking this data meets assumptions for a linear regression.
aug <- augment(model)
resids <- residuals(model)
fitted <- fitted(model)
#Convert primary dependent variable to factor for analysis
Data$perceivedStress <- factor(Data$perceivedStress)
#check linear fit for two primary variables and perform linear regression.
model <-lm(perceivedStress ~ gender, data = Data)
summary(model)
#Checking this data meets assumptions for a linear regression.
aug <- augment(model)
resids <- residuals(model)
fitted <- fitted(model)
##Assumption 1: Residuals Normally Distributed
ggplot(aug) + geom_histogram(aes(x=.resid),
bins=15)
ggplot(aug) + geom_qq(aes(sample=.resid))
##Assumption 2: Homoscedasticity
ggplot(aug) + geom_point(aes(x=.fitted, y=.resid)) +
geom_hline(yintercept=0, lty=2)+
theme_bw()
##Assumption 4: Linear Relationship
ggplot(aug, aes(x=gender, y=perceivedStress)) + geom_point()+
geom_smooth(method = "lm", se = FALSE)+
theme_bw()
#determine if glm is better fit - No notable differences due to no change in complexity.
mod <- glm(perceivedStress ~ gender, data=Data)
summary(survive_age)
summary(mod)
aug <- augment(mod)
resids <- residuals(mod)
fitted <- fitted(mod)
## Assumption 1: Residuals Normally Distributed
ggplot(aug) + geom_histogram(aes(x=.resid),
bins=15)
ggplot(aug) + geom_qq(aes(sample=.resid))
## Assumption 2: Homoscedasticity
ggplot(aug) + geom_point(aes(x=.fitted, y=.resid)) +
geom_hline(yintercept=0, lty=2)+
theme_bw()
## Assumption 4: Linear Relationship
ggplot(aug, aes(x=gender, y=perceivedStress)) +
geom_point()+
geom_smooth(method = "lm", se = FALSE)+
theme_bw()
#ERROR OCCURS IN THIS CHUNK
#Confirm Model works using predictions and Model Specification
new<-(Data$gender=1)
full <- lm(formula = as.numeric(perceivedStress) ~ gender*age*Income*HSgrad, data=Data)
full
Data$lmprediction<- predict(full, Data = new, na.omit=TRUE)
var<-Data$perceivedStress
Data$lmprediction<- predict(full, Subset)
rmse2 <- function(x=gender, y=perceivedStress, data=Data, na.rm = TRUE){
res <- sqrt(mean((Data$gender-Data$perceivedStress)^2, na.rm = TRUE))
return(res)}
#observed RMSE of full model
rmse2(x=gender, y=lmprediction, data=Data)
#test other models
model1 <- lm(formula = perceivedStress~., data=Data)
model1
#Total models include model(only perceivedStress and gender), mod(.), and full(interactions).
#Model validation through backwards selection
aic.backwards <- step(full, trace=TRUE)
glance(aic.backwards)
tidy(aic.backwards)
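On the RMSE step: the `rmse2` function in the listing ignores its `x` and `y` arguments and always compares `Data$gender` with `Data$perceivedStress`, so it does not measure prediction error. A minimal sketch of a helper that takes the observed and predicted vectors directly is shown here; the name `rmse` and the example numbers are hypothetical.

```r
# RMSE between observed values and model predictions;
# with na.rm = TRUE, pairs where either value is NA are skipped
rmse <- function(observed, predicted, na.rm = TRUE) {
  sqrt(mean((observed - predicted)^2, na.rm = na.rm))
}

# Hypothetical numbers for illustration only
observed  <- c(12, 7, 0, 9, 10)
predicted <- c(10, 7, 2, NA, 9)

print(rmse(observed, predicted))
```

With the real data this would be called on a numeric perceivedStress column and the prediction column, for example `rmse(Data$perceivedStress, Data$lmprediction)`, assuming both are numeric and have the same length.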