R problem using rpart with 4000 records and 13 attributes

Posted 2024-08-29 17:45:57

I have tried emailing the author of this package without success, so
I am wondering whether anybody else has experienced this.

I am having an issue using rpart on 4000 rows of data with 13 attributes.
I can run the same test on 300 rows of the same data with no problem.
When I run it on 4000 rows, Rgui.exe runs consistently at 50% CPU and the
UI hangs; it stays like this for at least 4-5 hours if I let it
run, and never exits or becomes responsive.

Here is the code I am using on both the 300-row and 4000-row subsets:

train <- read.csv("input.csv", header = TRUE)
y <- train[, 18]    # response: column 18
x <- train[, 3:17]  # predictors: columns 3 through 17
library(rpart)
fit <- rpart(y ~ ., x)

Is this a known limitation of rpart, or am I doing something wrong?
Are there any potential workarounds?


陌生 2024-09-05 17:45:57

Can you reproduce the problem when you feed rpart random data of similar dimensions, rather than your real data (from input.csv)? If not, the problem is probably in your data (a formatting issue, perhaps). After importing your data with read.csv, check it for format issues by looking at the output of str(train).

# Equivalent rpart fit on random data of the same dimensions
dats <- data.frame(matrix(rnorm(4000 * 14), nrow = 4000))

y <- dats[, 1]
x <- dats[, -1]
library(rpart)
system.time(fit <- rpart(y ~ ., x))
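
The str(train) check mentioned above can be taken one step further by counting factor levels per column. This is only a sketch under one assumption: that the slow fit is caused by a column that was meant to be numeric but was read as a factor with many levels, which can make the split search in rpart very expensive. The inline CSV text and the column names id, a, and b are hypothetical stand-ins for input.csv, which is not shown in the question.

```r
# Hypothetical stand-in for read.csv("input.csv"): column "a" is meant to be
# numeric, but a stray "n/a" entry makes read.csv treat it as text.
csv_text <- "id,a,b\n1,0.5,x\n2,1.2,y\n3,n/a,x\n"

# stringsAsFactors = TRUE mimics the default of R versions before 4.0.0.
train <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = TRUE)

# Inspect the column types, as suggested in the answer.
str(train)

# Number of factor levels per column (0 for columns that are not factors).
n_levels <- sapply(train, nlevels)
print(n_levels)

# Columns worth a closer look: anything that was read as a factor.
suspect <- names(n_levels)[n_levels > 0]
print(suspect)
```

On the real data, a factor column with hundreds or thousands of levels would be the first thing to investigate; if it should have been numeric, the offending entries need cleaning before the fit.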
