Is it possible to run randomForest classification on a database table pulled from SQLite?
New here, so apologies for any errors.
I am working with a large data set (16 GB) and trying to build a randomForest classification model using R in RStudio. I can't use read.csv to import the data into a data frame because my computer can't handle it (or it's not feasible; both are probably true in some sense). To work around this, I created a new SQLite database containing the tables I want to load into R, and I have no problem accessing the data when I use dplyr to connect to that database. However, when I attempt to fit a randomForest model on the data, it throws an error saying that "data (x) has 0 rows," which I believe happens because nrow() doesn't work on a table pulled from SQLite. Can anyone tell me what my options are for fixing this?
Thanks in advance
tl;dr: nrow() doesn't work on SQLite databases in R, so randomForest classification doesn't work. Help!
Edit #1: @hamagaust suggested I post code that I have, and what I have tried. Here is the code I am working with:
library(DBI)
library(dplyr)
library(randomForest)

con <- dbConnect(RSQLite::SQLite(), "MyData.db")
X_train_all <- tbl(con, "X_train_all")
X_test_all <- tbl(con, "X_test_all")
head(X_train_all) # Returns the table I see in SQLite, so I know the data is correct.
nrow(X_train_all) # Returns NA
classifier <- randomForest(X_train_all, Y_train_all, ntree = 100) # Throws the error (Y_train_all isn't the problem; it is a small .txt file I was able to import easily).
For further background, I am taking over a project and attempting to translate what has already been done in Python into R. X_train_all, X_test_all, y_train_all and y_test_all had already been created and used to successfully fit a random forest model in Python.
Edit #2: Apologies, my data is in a table in R, not a data frame.
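For anyone trying to reproduce this, here is a minimal sketch of the behaviour and one possible workaround. It is not a confirmed fix. tbl() returns a lazy reference to the SQLite table rather than an in-memory data frame, which would explain nrow() returning NA, and randomForest() needs its predictors in memory. The sketch assumes that a random subset of rows is acceptable for training. The in-memory database, the iris demo data, the "Species" label column and the sample size of 100 are stand-ins for illustration only; in my case the labels come from a separate file.

```r
library(DBI)
library(dplyr)         # tbl() on a DBI connection also needs dbplyr installed
library(randomForest)

# Stand-in for MyData.db: an in-memory database holding a small demo table.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "X_train_all", iris)

# Lazy reference: no rows have been fetched into R at this point.
X_train_lazy <- tbl(con, "X_train_all")
lazy_nrow <- nrow(X_train_lazy)   # NA, as in the code above

# Count the rows inside the database instead of in R.
n_rows <- X_train_lazy %>% summarise(n = n()) %>% pull(n)

# Pull a random subset small enough to fit in memory (sample size is an assumption).
# ORDER BY RANDOM() scans the whole table, so it can be slow on very large tables.
train_df <- dbGetQuery(
  con,
  "SELECT * FROM X_train_all ORDER BY RANDOM() LIMIT 100"
)

# Split predictors and labels; the label must be a factor for classification.
x <- train_df[, setdiff(names(train_df), "Species")]
y <- factor(train_df$Species)

classifier <- randomForest(x, y, ntree = 100)

dbDisconnect(con)
```

Within a dplyr pipeline, collect() plays the same role as dbGetQuery() here: it runs the query and returns an ordinary data frame. Collecting the full 16 GB table would presumably hit the same memory limit as read.csv, which is why the sketch samples first.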