R error: linear regression model prediction and redundancy errors

Posted on 2025-01-26 14:00:40 · 4264 characters · 2 views · 0 comments

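R novice here. I'm working on a project to evaluate whether perceived stress differs by gender (Male = 0, Female = 1). I'm learning the statistics and the code at the same time, so I think there's some redundancy in my code. I'm using income, education, and activity level as covariates to build a predictive model.

The data set is called Data. Gender is 0/1, perceived stress is 0-20 (treated as continuous), income has 4 categories (coded 1-4), education is 0/1, and activity level is a scale (0-5). I have separate code that compares mean perceived stress between the gender groups with a two-sample t test. I'm also working on a regression model. I believe linear regression is correct here, but I'm having some issues.

The error message is:

Error:
Assigned data predict(full, Data = new, na.omit = TRUE) must be compatible with existing data.
x Existing data has 2653 rows.
x Assigned data has 2243 rows.
Only vectors of size 1 are recycled.
Backtrace:

  1. base::$<-(*tmp*, lmprediction, value = <dbl>)
  2. tibble <fn>(<vctrs___>)

How can I adjust this so the linear prediction runs? Also, I know I forgot something, so if you notice anything wrong, missing, or redundant, please let me know! Thanks!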
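One likely reading of the row mismatch, sketched on made-up data: `lm()` drops rows with missing values by default, and `predict()` called without `newdata` returns one value per row that was actually used in the fit. In the call from the post, `Data = new` and `na.omit = TRUE` do not match any argument of `predict.lm()`, so they appear to be ignored, which would leave 2243 predictions for 2653 rows. Note also that `Data %>% drop_na()` prints a result without saving it, and `new <- (Data$gender = 1)` overwrites the whole `gender` column with 1 rather than selecting women. The `toy` data frame and its column names below are hypothetical stand-ins for `Data`.

```r
# Toy data standing in for the asker's `Data` (values are made up).
toy <- data.frame(
  stress = c(12, 7, 0, 9, 10, 0, 5, 8),
  gender = c(0, 1, 1, 1, 1, 0, 0, 1),
  income = c(1, 3, 4, 2, 2, NA, 3, 1)
)

# Default na.action (na.omit) drops the row with the missing income.
fit <- lm(stress ~ gender + income, data = toy)

# Without `newdata`, predict() returns one value per row used in the fit.
n_default <- length(predict(fit))

# With `newdata`, predict() returns one value per supplied row and puts NA
# where a predictor is missing, so the result lines up with the data frame.
toy$pred <- predict(fit, newdata = toy)

# Alternative: na.exclude at fit time pads predictions and residuals with NA.
fit_ex <- lm(stress ~ gender + income, data = toy, na.action = na.exclude)
n_exclude <- length(predict(fit_ex))

# Predictions for women only: subset with `==` (comparison), not `=`.
women <- toy[toy$gender == 1, ]
pred_women <- predict(fit, newdata = women)

print(c(default = n_default, padded = n_exclude, rows = nrow(toy)))
```

Either `newdata = Data` or `na.action = na.exclude` should give a vector that can be assigned back as a column; which one fits better depends on whether the rows with missing covariates are meant to stay in the data set.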

Data sample: tibble 6x6
age Income HSgrad activeIndex perceivedStress gender
  <dbl>  <dbl>  <dbl>       <dbl> <fct>          <dbl>
1  63.4      1      0        1.75 12             0
2  56.0      3      1        2    7              1
3  56.5      4      1        2.75 0              1
4  40.0      2      1        2.75 9              1
5  47.7      2      0        1    10             1
6  68.1     NA      0        2.5  0              0
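
The sample shows `perceivedStress` stored as a factor (`<fct>`), and the code later wraps it in `as.numeric()`. On a factor, `as.numeric()` returns the internal level codes rather than the printed scores, so a small check on the six sample values may help; going through `as.character()` first recovers the scores. The vector name `stress_fct` is made up for this illustration.

```r
# The six perceivedStress values from the data sample, stored as a factor.
stress_fct <- factor(c(12, 7, 0, 9, 10, 0))

# as.numeric() on a factor gives the position of each value among the
# sorted levels ("0", "7", "9", "10", "12"), not the score itself.
codes <- as.numeric(stress_fct)

# Converting to character first recovers the original scores.
scores <- as.numeric(as.character(stress_fct))

print(codes)   # 5 2 1 3 4 1
print(scores)  # 12 7 0 9 10 0
```

If the score is meant to be treated as continuous, it may be simpler to skip the `factor()` conversion entirely and keep the column numeric.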


gender <- ifelse(dfJHS$sex=="Male",0,1)
dfJHS$gender <- gender
View(dfJHS)
Data<-dfJHS %>% select(-sex)
View(Data)
dim(Data)
Data$perceivedStress <- factor(Data$perceivedStress)
#Remove NA
Data %>% drop_na()
Data[complete.cases(Data),]
#section with data visualizations you probably won't need for this (lots of histograms, shapiro test, and a qq plot)

#check linear fit for two primary variables and perform linear regression.
model <-lm(perceivedStress ~ gender, data = Data)
summary(model)

#Checking this data meets assumptions for a linear regression.
aug <- augment(model)
resids <- residuals(model)
fitted <- fitted(model)
#Convert primary dependent variable to factor for analysis
Data$perceivedStress <- factor(Data$perceivedStress)
#check linear fit for two primary variables and perform  linear regression.
model <-lm(perceivedStress ~ gender, data = Data)
summary(model)

#Checking this data meets assumptions for a linear regression.
aug <- augment(model)
resids <- residuals(model)
fitted <- fitted(model)
##Assumption 1: Residuals Normally Distributed
ggplot(aug) + geom_histogram(aes(x=.resid),
                             bins=15)
ggplot(aug) + geom_qq(aes(sample=.resid))
  
##Assumption 2: Homoscedasticity
ggplot(aug) + geom_point(aes(x=.fitted, y=.resid)) +
  geom_hline(yintercept=0, lty=2)+
  theme_bw()
##Assumption 4: Linear Relationship
ggplot(aug, aes(x=gender, y=perceivedStress)) + geom_point()+
  geom_smooth(method = "lm", se = FALSE)+
  theme_bw()

#determine if glm is better fit - No notable differences due to no change in complexity.
mod <- glm(perceivedStress ~ gender, data=Data)
summary(survive_age)
summary(mod)
aug <- augment(mod)
resids <- residuals(mod)
fitted <- fitted(mod)
## Assumption 1: Residuals Normally Distributed
ggplot(aug) + geom_histogram(aes(x=.resid),
                             bins=15)
ggplot(aug) + geom_qq(aes(sample=.resid))

## Assumption 2: Homoscedasticity
ggplot(aug) + geom_point(aes(x=.fitted, y=.resid)) +
  geom_hline(yintercept=0, lty=2)+
  theme_bw()

## Assumption 4: Linear Relationship
ggplot(aug, aes(x=gender, y=perceivedStress)) + 
  geom_point()+
  geom_smooth(method = "lm", se = FALSE)+
  theme_bw()

#ERROR OCCURS IN THIS CHUNK
#Confirm Model works using predictions and Model Specification
new<-(Data$gender=1)
full <- lm(formula = as.numeric(perceivedStress) ~ gender*age*Income*HSgrad, data=Data)
full
Data$lmprediction<- predict(full, Data = new, na.omit=TRUE)
var<-Data$perceivedStress
Data$lmprediction<- predict(full, Subset)


rmse2 <- function(x=gender, y=perceivedStress, data=Data, na.rm = TRUE){
  res <- sqrt(mean((Data$gender-Data$perceivedStress)^2, na.rm = TRUE))
  return(res)}
#observed RMSE of full model
rmse2(x=gender, y=lmprediction, data=Data)
#test other models

model1 <- lm(formula = perceivedStress~., data=Data)
model1
#Total models include model(only perceivedStress and gender), mod(.), and full(interactions). 
#Model validation through backwards selection
aic.backwards <- step(full, trace=TRUE) 
glance(aic.backwards)
tidy(aic.backwards)
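
The `rmse2()` helper in the post ignores its `x` and `y` arguments and always compares `Data$gender` with `Data$perceivedStress`. A minimal sketch of an RMSE helper that uses what it is given, assuming numeric observed and predicted vectors of equal length (the name `rmse` and its argument names are hypothetical):

```r
# Root mean squared error between observed and predicted values.
# Pairs where either value is NA are skipped when na.rm = TRUE.
rmse <- function(observed, predicted, na.rm = TRUE) {
  stopifnot(length(observed) == length(predicted))
  sqrt(mean((observed - predicted)^2, na.rm = na.rm))
}

# Small made-up check: errors are 0, 0, 2, so RMSE = sqrt(4 / 3).
example_rmse <- rmse(c(1, 2, 3), c(1, 2, 5))

# A missing observation is dropped: remaining errors are 0 and 2.
example_rmse_na <- rmse(c(1, NA, 3), c(1, 2, 5))

print(example_rmse)
print(example_rmse_na)
```

With the data from the post, a call might look like `rmse(as.numeric(as.character(Data$perceivedStress)), Data$lmprediction)`, provided `lmprediction` holds one value per row.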
