No column has NA, but I get the error "arguments imply differing number of rows"
I tried to use ols_step_both_p to find the best model for the Train dataset, but I keep getting an error after running the following:
house_sales <- read.csv("Sales.csv")
house_sales <- subset(house_sales, select = -c(id, lat, long, zipcode))
house_sales$date <- as.numeric(substr(house_sales$date, 1, 8))
house_sales$price <- as.numeric(gsub('[\\$,]', '', house_sales$price))
library(caret)
set.seed(1023)
DataSplit <- createDataPartition(house_sales$price, p = 0.7, list = FALSE)
TrainData <- house_sales[DataSplit,]
TestData <- house_sales[-DataSplit,]
full_model <- lm(price~., data = TrainData)
library(olsrr)
ols_step_both_p(full_model, pent=0.1, prem=0.05, details=TRUE)
Error:
Error in data.frame(model = rep(seq_len(all_step), lbetas), predictor = names(betas), : arguments imply differing number of rows: 100, 92
I searched online for the cause, and it looks like this error occurs when some columns have different numbers of rows. I checked whether any column contains NA:
apply(TrainData, 2, function(x) sum(is.na(x)))
and the result:
         date         price      bedrooms     bathrooms   sqft_living
            0             0             0             0             0
     sqft_lot        floors    waterfront          view     condition
            0             0             0             0             0
        grade    sqft_above sqft_basement      yr_built  yr_renovated
            0             0             0             0             0
sqft_living15    sqft_lot15
            0             0
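As an aside on the NA check: `apply()` first converts the data frame to a matrix, which is fine for counting NAs, but `colSums(is.na(...))` gives the same counts more directly. A minimal sketch on a made-up data frame (`toy` and its columns are hypothetical, not the real Sales.csv):

```r
# Hypothetical toy data frame standing in for TrainData
toy <- data.frame(
  price    = c(100, 200, NA, 400),
  bedrooms = c(3, NA, NA, 2),
  grade    = c(7, 8, 9, 10)
)

# Same check as in the question
na_apply <- apply(toy, 2, function(x) sum(is.na(x)))

# Equivalent, more direct check
na_colsums <- colSums(is.na(toy))

print(na_colsums)
```

On the real data the equivalent call would be `colSums(is.na(TrainData))`.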
I also checked all columns in the Train dataset, and they all have the same number of rows.
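Note that the failing call in the error message is a `data.frame()` built from `lbetas` and `names(betas)`, so the mismatched lengths (100 and 92) belong to vectors created inside the function, not necessarily to columns of TrainData. The same message can be reproduced with any two vectors of those lengths. The sketch below also shows one situation, assumed here only as something worth checking and not confirmed as the cause, in which data without any NA still yields an NA coefficient in `lm()`: a predictor that is an exact linear combination of others. The `toy` data and its column names are made up.

```r
# 1) Reproduce the same error message with two vectors of unequal length
msg <- tryCatch(
  data.frame(model = seq_len(100), predictor = seq_len(92)),
  error = function(e) conditionMessage(e)
)
print(msg)

# 2) Hypothetical toy data: no NA values, but x3 is exactly x1 + x2
set.seed(1)
toy <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
toy$x3 <- toy$x1 + toy$x2
toy$y  <- 1 + 2 * toy$x1 - toy$x2 + rnorm(50)

toy_model <- lm(y ~ ., data = toy)

# Names of coefficients that lm() could not estimate (reported as NA)
na_coefs <- names(coef(toy_model))[is.na(coef(toy_model))]

print(colSums(is.na(toy)))  # counts of NA values per column of the toy data
print(na_coefs)
```

On the real model, the analogous check would be `names(coef(full_model))[is.na(coef(full_model))]`.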
Here is the link to the public data:
KC_House_Sales
You can also see this link:
KC_House_Sales