error-handling r regression linear-regression

R error: linear regression model prediction error and redundant code

Posted 2025-01-26 14:00:40 · 4264 words · 2 views · 0 comments

R novice here. I'm working on a project to evaluate whether perceived stress differs by gender (Male = 0, Female = 1). I'm learning the statistics and the code at the same time, so I suspect my code has some redundancy. I'm using income, education, and activity level as covariates to build a predictive model.

The data set is called Data. Gender is coded 0/1, perceived stress ranges from 0 to 20 (treated as continuous), income has 4 categories (coded 1-4), education is coded 0/1, and activity level is a scale from 0 to 5. I have separate code that compares mean perceived stress between the gender groups with a two-sample t test. I'm also working on a regression model. I believe linear regression is appropriate here, but I'm running into some issues.
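
For a single 0/1 predictor, the two-sample t test and the simple regression `lm(perceivedStress ~ gender)` are closely related: the regression slope equals the difference in group means, and with pooled variance the p-values coincide. A minimal sketch on made-up numbers (the `toy` data frame and its `stress` column are hypothetical, not the real data):

```r
# Hypothetical toy data: stress scores for two gender groups (0 and 1).
toy <- data.frame(
  gender = c(0, 0, 0, 0, 1, 1, 1, 1),
  stress = c(4, 6, 5, 9, 8, 10, 7, 11)
)

# Two-sample t test with pooled variance.
tt <- t.test(stress ~ gender, data = toy, var.equal = TRUE)

# Simple linear regression on the same 0/1 predictor.
fit <- lm(stress ~ gender, data = toy)

# The gender slope is the difference in group means (group 1 minus group 0).
slope     <- unname(coef(fit)["gender"])
mean_diff <- mean(toy$stress[toy$gender == 1]) -
  mean(toy$stress[toy$gender == 0])

# The p-value for the slope matches the pooled-variance t test.
p_lm <- summary(fit)$coefficients["gender", "Pr(>|t|)"]
```

If this holds for the real data too, the t test and the gender-only model answer the same question, so one of them may be redundant.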

The error message is:
Error:
Assigned data predict(full, Data = new, na.omit = TRUE) must be compatible with existing data.
x Existing data has 2653 rows.
x Assigned data has 2243 rows.
Only vectors of size 1 are recycled.
Backtrace:

base::$<-(*tmp*, lmprediction, value = <dbl>)
tibble <fn>(<vctrs___>)

How can I adjust this to run the linear prediction? Also, I know I forgot something, so if you notice anything wrong, missing, or redundant, please let me know! Thanks!
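
The row counts in the error are consistent with `lm()` dropping rows that contain `NA` before fitting, so `predict(full)` returns one value per row actually used (2243) rather than one per row of `Data` (2653). `Data = new` and `na.omit = TRUE` are not arguments of `predict.lm()`; they fall into `...` and are ignored. A minimal sketch of two possible fixes on a small simulated data frame (`toy`, its columns, and the seed are hypothetical stand-ins for the real data):

```r
# Hypothetical stand-in for Data: 20 rows, 5 of them with a missing Income.
set.seed(1)
toy <- data.frame(
  gender = rep(c(0, 1), 10),
  age    = runif(20, 40, 70),
  Income = c(rep(NA, 5), sample(1:4, 15, replace = TRUE)),
  stress = sample(0:20, 20, replace = TRUE)
)

# Default behaviour: rows with NA are dropped, so the predictions are shorter
# than the data frame and cannot be assigned back as a column.
fit_default  <- lm(stress ~ gender + age + Income, data = toy)
pred_default <- predict(fit_default)
length(pred_default)  # 15, not 20

# Option 1: na.exclude remembers the dropped rows and pads them with NA.
fit_padded <- lm(stress ~ gender + age + Income, data = toy,
                 na.action = na.exclude)
toy$pred_padded <- predict(fit_padded)

# Option 2: pass the full data frame through the real argument, newdata.
toy$pred_newdata <- predict(fit_default, newdata = toy)
```

Both options give one prediction per row, with `NA` wherever a predictor is missing. A third route, assuming the incomplete rows are not needed, is to fit and predict on a data frame that has already had them removed.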

Data sample (tibble, 6 x 6):
    age Income HSgrad activeIndex perceivedStress gender
  <dbl>  <dbl>  <dbl>       <dbl> <fct>            <dbl>
1  63.4      1      0        1.75 12                   0
2  56.0      3      1        2    7                    1
3  56.5      4      1        2.75 0                    1
4  40.0      2      1        2.75 9                    1
5  47.7      2      0        1    10                   1
6  68.1     NA      0        2.5  0                    0
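
The sample shows `perceivedStress` stored as a factor (`<fct>`). Calling `as.numeric()` on a factor returns the internal level codes rather than the printed scores, which would quietly change the outcome in a formula such as `as.numeric(perceivedStress) ~ ...`. A small sketch of the difference, using the six scores from the sample (the vector name `stress_fct` is made up for illustration):

```r
# The six perceivedStress values from the data sample, stored as a factor.
stress_fct <- factor(c(12, 7, 0, 9, 10, 0))

# as.numeric() on a factor gives level codes (positions in levels()),
# not the original scores.
codes <- as.numeric(stress_fct)

# Going through character first recovers the actual scores.
scores <- as.numeric(as.character(stress_fct))

print(levels(stress_fct))  # "0" "7" "9" "10" "12"
print(codes)               # 5 2 1 3 4 1
print(scores)              # 12 7 0 9 10 0
```

If the score is meant to be treated as continuous, keeping the column numeric from the start avoids the conversion altogether.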


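# Packages used below: dplyr (%>%, select), tidyr (drop_na),
# broom (augment, glance, tidy), ggplot2 (plots)
library(dplyr)
library(tidyr)
library(broom)
library(ggplot2)

# Recode sex as 0 = Male, 1 = Female
gender <- ifelse(dfJHS$sex == "Male", 0, 1)
dfJHS$gender <- gender
View(dfJHS)
Data <- dfJHS %>% select(-sex)
View(Data)
dim(Data)
Data$perceivedStress <- factor(Data$perceivedStress)
# Remove NA
Data %>% drop_na()
Data[complete.cases(Data), ]
# Section with data visualizations you probably won't need for this (lots of histograms, Shapiro test, and a QQ plot)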

# Check linear fit for two primary variables and perform linear regression.
model <- lm(perceivedStress ~ gender, data = Data)
summary(model)

# Check that this data meets the assumptions for a linear regression.
aug <- augment(model)
resids <- residuals(model)
fitted <- fitted(model)
# Convert primary dependent variable to factor for analysis
Data$perceivedStress <- factor(Data$perceivedStress)
# Check linear fit for two primary variables and perform linear regression.
model <- lm(perceivedStress ~ gender, data = Data)
summary(model)

# Check that this data meets the assumptions for a linear regression.
aug <- augment(model)
resids <- residuals(model)
fitted <- fitted(model)
## Assumption 1: Residuals Normally Distributed
ggplot(aug) +
  geom_histogram(aes(x = .resid), bins = 15)
ggplot(aug) +
  geom_qq(aes(sample = .resid))

## Assumption 2: Homoscedasticity
ggplot(aug) +
  geom_point(aes(x = .fitted, y = .resid)) +
  geom_hline(yintercept = 0, lty = 2) +
  theme_bw()
## Assumption 4: Linear Relationship
ggplot(aug, aes(x = gender, y = perceivedStress)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  theme_bw()

# Determine if glm is a better fit - no notable differences due to no change in complexity.
mod <- glm(perceivedStress ~ gender, data = Data)
summary(survive_age)
summary(mod)
aug <- augment(mod)
resids <- residuals(mod)
fitted <- fitted(mod)
## Assumption 1: Residuals Normally Distributed
ggplot(aug) +
  geom_histogram(aes(x = .resid), bins = 15)
ggplot(aug) +
  geom_qq(aes(sample = .resid))

## Assumption 2: Homoscedasticity
ggplot(aug) +
  geom_point(aes(x = .fitted, y = .resid)) +
  geom_hline(yintercept = 0, lty = 2) +
  theme_bw()

## Assumption 4: Linear Relationship
ggplot(aug, aes(x = gender, y = perceivedStress)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  theme_bw()

# ERROR OCCURS IN THIS CHUNK
# Confirm model works using predictions and model specification
new <- (Data$gender = 1)
full <- lm(formula = as.numeric(perceivedStress) ~ gender * age * Income * HSgrad, data = Data)
full
Data$lmprediction <- predict(full, Data = new, na.omit = TRUE)
var <- Data$perceivedStress
Data$lmprediction <- predict(full, Subset)


rmse2 <- function(x = gender, y = perceivedStress, data = Data, na.rm = TRUE) {
  res <- sqrt(mean((Data$gender - Data$perceivedStress)^2, na.rm = TRUE))
  return(res)
}
# Observed RMSE of full model
rmse2(x = gender, y = lmprediction, data = Data)
# Test other models

model1 <- lm(formula = perceivedStress ~ ., data = Data)
model1
# Total models include model (only perceivedStress and gender), mod (.), and full (interactions).
# Model validation through backwards selection
aic.backwards <- step(full, trace = TRUE)
glance(aic.backwards)
tidy(aic.backwards)
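
The `rmse2()` function above ignores its `x` and `y` arguments and always compares `Data$gender` with `Data$perceivedStress`, so it does not measure prediction error. A minimal sketch of an RMSE helper that compares observed and predicted values instead (the function name `rmse_obs_pred` and the toy vectors are hypothetical):

```r
# Root mean squared error between observed outcomes and model predictions.
rmse_obs_pred <- function(observed, predicted, na.rm = TRUE) {
  sqrt(mean((observed - predicted)^2, na.rm = na.rm))
}

# Toy check: errors of 1, -1, 1 and one missing prediction, which is skipped.
observed  <- c(10, 12, 7, 9)
predicted <- c(9, 13, 6, NA)
rmse_obs_pred(observed, predicted)  # 1
```

With the real data this might be called as `rmse_obs_pred(as.numeric(as.character(Data$perceivedStress)), Data$lmprediction)`, assuming `perceivedStress` is still a factor at that point and `lmprediction` has one value per row.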
