R: putting multiple randomForest objects into a vector

Posted on 2024-12-10 13:18:27

I am curious whether R can place objects into vectors/lists/arrays/etc. I am using the randomForest package to work on subsets of a larger piece of data and would like to store each fitted model in a list. It would be similar to this:

answers <- c()
for(i in 1:10){
x <- round((1/i), 3)
answers <- (rbind(answers, x))
}

Ideally I'd like to do something like this:

answers <- c()
for(i in 1:10){
RF <- randomForest(training, training$data1, sampsize=c(100), do.trace=TRUE, importance=TRUE, ntree=50, forest=TRUE)
answers <- (rbind(answers, RF))
}

This kind of works but here's the output for a single RF object:

> RF 

Call:
 randomForest(x = training, y = training$data1, ntree = 50, sampsize = c(100), importance = TRUE, do.trace = TRUE,      forest = TRUE) 
               Type of random forest: regression
                     Number of trees: 10
No. of variables tried at each split: 2

          Mean of squared residuals: 0.05343956
                    % Var explained: 14.32

While this is the output for the 'answers' object:

> answers 
   call       type         predicted      mse        rsq        oob.times      importance importanceSD
RF Expression "regression" Numeric,150000 Numeric,10 Numeric,10 Integer,150000 Numeric,16 Numeric,8   
RF Expression "regression" Numeric,150000 Numeric,10 Numeric,10 Integer,150000 Numeric,16 Numeric,8   
RF Expression "regression" Numeric,150000 Numeric,10 Numeric,10 Integer,150000 Numeric,16 Numeric,8   
RF Expression "regression" Numeric,150000 Numeric,10 Numeric,10 Integer,150000 Numeric,16 Numeric,8   
RF Expression "regression" Numeric,150000 Numeric,10 Numeric,10 Integer,150000 Numeric,16 Numeric,8   
RF Expression "regression" Numeric,150000 Numeric,10 Numeric,10 Integer,150000 Numeric,16 Numeric,8   
RF Expression "regression" Numeric,150000 Numeric,10 Numeric,10 Integer,150000 Numeric,16 Numeric,8   
RF Expression "regression" Numeric,150000 Numeric,10 Numeric,10 Integer,150000 Numeric,16 Numeric,8   
RF Expression "regression" Numeric,150000 Numeric,10 Numeric,10 Integer,150000 Numeric,16 Numeric,8   
RF Expression "regression" Numeric,150000 Numeric,10 Numeric,10 Integer,150000 Numeric,16 Numeric,8   
   localImportance proximity ntree mtry forest  coefs y              test inbag
RF NULL            NULL      10    2    List,11 NULL  Integer,150000 NULL NULL 
RF NULL            NULL      10    2    List,11 NULL  Integer,150000 NULL NULL 
RF NULL            NULL      10    2    List,11 NULL  Integer,150000 NULL NULL 
RF NULL            NULL      10    2    List,11 NULL  Integer,150000 NULL NULL 
RF NULL            NULL      10    2    List,11 NULL  Integer,150000 NULL NULL 
RF NULL            NULL      10    2    List,11 NULL  Integer,150000 NULL NULL 
RF NULL            NULL      10    2    List,11 NULL  Integer,150000 NULL NULL 
RF NULL            NULL      10    2    List,11 NULL  Integer,150000 NULL NULL 
RF NULL            NULL      10    2    List,11 NULL  Integer,150000 NULL NULL 
RF NULL            NULL      10    2    List,11 NULL  Integer,150000 NULL NULL 

Does anyone know how to store all the RF objects or call them so that the info stored is the same as a single RF object? Thanks for suggestions.

Comments (4)

青春有你 2024-12-17 13:18:27

Don't grow vectors or lists one element at a time. Pre-allocate them and assign objects to specific parts:

answers <- vector("list",10)
for (i in 1:10){
    answers[[i]] <- randomForest(training, training$data1, sampsize=c(100), 
                                 do.trace=TRUE, importance=TRUE, ntree=50,
                                 forest=TRUE)
}

As a side note, rbinding vectors doesn't create another vector or list; it creates a matrix. If you check the output of your first example you'll see that it is a matrix with one column. That explains the strange behavior you observe when trying to rbind randomForest objects together.
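Both points can be checked in a few lines. The sketch below is only an illustration: it uses lm() from base R on the built-in mtcars data as a stand-in for randomForest() (an assumption made so it runs without extra packages; both return a list with a class attribute), and the row subsets are hypothetical.

```r
# Stand-in for randomForest(): lm() also returns a classed list.
n_models <- 3
answers <- vector("list", n_models)        # pre-allocated list of NULLs
for (i in seq_len(n_models)) {
  # each iteration fits on a different (hypothetical) subset of the rows
  rows <- seq(i, nrow(mtcars), by = n_models)
  answers[[i]] <- lm(mpg ~ wt, data = mtcars[rows, ])
}
class(answers[[2]])    # each element is still a complete model object

# rbind-ing plain numbers builds a one-column matrix, not a vector
m <- c()
for (i in 1:10) m <- rbind(m, round(1 / i, 3))
dim(m)
```

Each element of answers can then be used like any single fitted model, for example summary(answers[[1]]).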

手长情犹 2024-12-17 13:18:27

Use lapply:

lapply(1:10,function(i) randomForest(<your parameters>))

You will get a list of random forest objects; you can then access the i-th one with the [[ ]] operator.
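Since the question is about fitting on subsets of a larger data set, here is a minimal sketch of the lapply approach over subsets. It assumes lm() on the built-in mtcars data as a stand-in for randomForest() on the asker's data; the split by cyl is purely illustrative.

```r
# lm() on mtcars stands in for randomForest() on the asker's data.
subsets <- split(mtcars, mtcars$cyl)       # one data frame per cylinder count
models  <- lapply(subsets, function(d) lm(mpg ~ wt, data = d))

length(models)         # one model per subset
names(models)          # names are carried over from split()
coef(models[["4"]])    # [[ ]] returns a single, intact model
```

Because lapply builds the list itself, there is nothing to pre-allocate and no risk of a model being taken apart on the way in.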

各自安好 2024-12-17 13:18:27

Initialize a list with:

mylist <- vector("list")  # an empty list; lists are R's generic vectors

Add to it with:

new_element <- 5
mylist <- c(mylist, list(new_element))  # wrap in list() so list-like objects stay in one piece

@joran's advice about pre-allocation is pertinent when the lists are large, but not entirely necessary when they are small. You could also have accessed the matrix you built in your original code. It looks a bit strange, but the information is all in there. For example, the first row of that matrix of lists could have been recovered with:

answers[1, ]
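
A small sketch of that recovery, assuming lm() on the built-in mtcars data as a stand-in for randomForest(). The row comes back as a plain named list, so in this sketch the class is re-attached by hand before the object is used as a model again.

```r
fit <- lm(mpg ~ wt, data = mtcars)    # stand-in for a randomForest object
answers <- rbind(NULL, fit)           # what the question's loop builds: one row per model
dim(answers)                          # 1 row, one column per model component

recovered <- answers[1, ]             # a plain list holding the components
class(recovered) <- class(fit)        # re-attach the class that rbind() dropped
all.equal(coef(recovered), coef(fit))
```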

风苍溪 2024-12-17 13:18:27

Other answers provide solutions to store random forest objects in a list, but they don't explain why they are working.

As @42- hints, it is not the pre-allocation step that solves the issue here.

The real problem is that a randomForest object is fundamentally a list (check is.list(randomForest(...))). When you write a statement such as:

list_of_rf = c()                                       # ... or list_of_rf = NULL
list_of_rf = rbind(list_of_rf, randomForest(...))      # ... or list_of_rf = c(list_of_rf, randomForest(...))

you are essentially asking to concatenate an empty object with a list. Instead of resulting in a list of length 1 (the random forest model), this statement results in a list containing all the components of the random forest model! You can verify this by typing in your R console:

> length(list_of_rf)

[1] 19
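
The same flattening can be reproduced without the randomForest package. This sketch assumes lm() on the built-in mtcars data as a stand-in; the number of components differs from the 19 above, but the behavior of c() is the same.

```r
fit <- lm(mpg ~ wt, data = mtcars)    # stand-in for randomForest(...)
is.list(fit)                          # TRUE: the model is a list with a class attribute

flat    <- c(NULL, fit)               # components are spliced into the result
wrapped <- c(NULL, list(fit))         # the model is kept as one element

length(flat) == length(fit)           # TRUE: one entry per component
length(wrapped)                       # 1
```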

There are several ways to force R to perform the operation that you want:

  1. explicit assignment into the list (cf. @joran's answer, although there is no need to pre-allocate):

    list_of_rf = NULL
    list_of_rf[[1]] = randomForest(...)
    
  2. let lapply (or similar) build the list (cf. @mbq's answer):

    list_of_rf = lapply(..., function(i) randomForest(...))
    
  3. wrap the random forest in a list of its own; the concatenation then removes only that outer wrapper:

    list_of_rf = NULL
    list_of_rf = c(list_of_rf, list(randomForest(...)))
    
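The three options can be compared side by side. This is a sketch under stated assumptions: fit_one() is a hypothetical helper, lm() on the built-in mtcars data stands in for randomForest(), and option 1 starts from list() rather than NULL to make the intended type explicit.

```r
fit_one <- function(i) {
  # hypothetical helper: fit on every row except row i
  lm(mpg ~ wt, data = mtcars[-i, ])
}

# 1. explicit assignment with [[ ]]
by_assign <- list()
for (i in 1:3) by_assign[[i]] <- fit_one(i)

# 2. let lapply() build the list
by_lapply <- lapply(1:3, fit_one)

# 3. wrap each model in list() before concatenating
by_concat <- NULL
for (i in 1:3) by_concat <- c(by_concat, list(fit_one(i)))

sapply(list(by_assign, by_lapply, by_concat), length)   # 3 3 3
```

All three end up with one intact model per element; the lapply form is the shortest when each fit depends only on the loop index.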

Finally, if you made a mistake and flattened a randomForest model that took 10 hours to compute, don't sweat: you can still restore it as follows:

list_of_rf = NULL
list_of_rf = c(list_of_rf, randomForest(...)) # oops, mistake
rf = as.vector(list_of_rf)[1:19]
class(rf) = 'randomForest'
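
The recovery can be tried out on a cheap model first. This sketch assumes lm() on the built-in mtcars data as a stand-in for the expensive randomForest fit; here the component count is read from the original object, whereas in a real rescue it would have to be known, as with the 19 above.

```r
fit  <- lm(mpg ~ wt, data = mtcars)   # stand-in for the expensive model
flat <- c(NULL, fit)                  # the mistake: components spliced in

n   <- length(fit)                    # 19 for the randomForest in the answer; differs for lm
rec <- flat[seq_len(n)]               # take back the first model's components
class(rec) <- class(fit)              # re-attach the class
all.equal(coef(rec), coef(fit))
```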