Is it possible to run a randomForest classification on data I pulled from a SQLite database?

Posted on 2025-02-09 07:38:04

I'm new here, so apologies for any errors.

I am working with a large data set (16 GB) and trying to build a randomForest classification model in R, using RStudio. I can't use read.csv to import the data into a data frame because my computer can't handle it. To work around this, I created a SQLite database containing the tables I want to load into R, and I have no problem accessing the data when I connect to that database through dplyr. However, when I try to fit a randomForest model on the data, it throws an error saying "data (x) has 0 rows", which I believe happens because nrow() doesn't work on a table pulled from SQLite. Can anyone tell me what my options are for fixing this?

Thanks in advance

tl;dr: nrow() returns NA on a table pulled from SQLite through dplyr, so randomForest classification fails. Help!

Edit #1: @hamagaust suggested I post the code I have and what I have tried. Here is the code I am working with:

library(DBI)           # dbConnect()
library(dplyr)         # tbl(); needs dbplyr installed for database tables
library(randomForest)  # randomForest()

con <- dbConnect(RSQLite::SQLite(), "MyData.db")
X_train_all <- tbl(con, "X_train_all")
X_test_all <- tbl(con, "X_test_all")
head(X_train_all) # Returns the table I see in SQLite, so I know the data is correct.
nrow(X_train_all) # Returns NA

# Throws the error. Y_train_all isn't the problem: it is a small .txt file I was able to import easily.
classifier <- randomForest(X_train_all, Y_train_all, ntree = 100)
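
A likely explanation for the `NA` above is that `tbl()` returns a lazy reference to the SQLite table rather than an in-memory data frame, so the row count is unknown until a query actually runs. The sketch below shows two ways to count rows inside the database instead. It uses a small in-memory database with a made-up `demo` table as a stand-in for `MyData.db`; those names and the variables `n_rows_dplyr` and `n_rows_sql` are assumptions for illustration only.

```r
library(DBI)
library(dplyr)  # tbl() on a database connection also needs dbplyr installed

# Stand-in for MyData.db: an in-memory SQLite database with a hypothetical table
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "demo", data.frame(a = 1:10, b = rnorm(10)))

lazy_tbl <- tbl(con, "demo")

# The lazy table does not know its row count until a query is executed
nrow(lazy_tbl)

# Option 1: let the database count, then pull the single value into R
n_rows_dplyr <- lazy_tbl %>% summarise(n = n()) %>% pull(n)

# Option 2: plain SQL through DBI
n_rows_sql <- dbGetQuery(con, "SELECT COUNT(*) AS n FROM demo")$n

dbDisconnect(con)
```

Either count runs in SQLite, so only a single number is transferred to R.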

For further background, I am taking over a project and translating work that was already done in Python into R. X_train_all, X_test_all, y_train_all and y_test_all had already been created and used to fit a random forest model successfully in Python.

Edit #2: Apologies, my data is in a table in R, not a data frame.
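
Since randomForest expects its predictors as an in-memory data frame or matrix, one option is to draw a random subset of rows in SQL, bring only that subset into R with `collect()`, and fit on the sample. The following is a minimal sketch under several assumptions: the table `X_demo`, its `id`, `f1` and `f2` columns, the `labels` vector and the sample size of 50 are all hypothetical stand-ins, and it presumes the labels can be matched to rows through an id column, which may not hold for the real data.

```r
library(DBI)
library(dplyr)         # tbl(), collect(); needs dbplyr installed
library(randomForest)

set.seed(1)

# Hypothetical stand-in for the real database and label file
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "X_demo",
             data.frame(id = 1:200, f1 = rnorm(200), f2 = rnorm(200)))
labels <- factor(sample(c("a", "b"), 200, replace = TRUE))  # stored in id order

# Sample rows inside SQLite so that only the sample is read into memory,
# then collect() turns the lazy table into an ordinary data frame
x_sample <- tbl(con, sql("SELECT * FROM X_demo ORDER BY RANDOM() LIMIT 50")) %>%
  collect() %>%
  as.data.frame()

# Match labels to the sampled rows through the (assumed) id column
y_sample <- labels[x_sample$id]

# Fit on the in-memory sample, leaving the id column out of the predictors
classifier <- randomForest(x = x_sample[, c("f1", "f2")],
                           y = y_sample, ntree = 100)

dbDisconnect(con)
```

Note that `ORDER BY RANDOM()` scans and sorts the whole table, so on a 16 GB table it may be slow. Whether a sample is acceptable at all depends on the project, since the Python model was presumably trained on the full data.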
