predict() with randomForest: extracting predictions from individual trees

Posted on 2025-02-05 14:15:02

I built a random forest model called iris_class *.

library(randomForest)

set.seed(10)
index_row <- sample(2,
                    nrow(iris),
                    replace = TRUE,
                    prob = c(0.7, 0.3))

train_data <- iris[index_row == 1, ]
test_data <- iris[index_row == 2, ]

iris_class <- randomForest(Species ~ .,
                           data = train_data)

This is what iris_class looks like:

> iris_class

Call:
 randomForest(formula = Species ~ ., data = train_data) 
               Type of random forest: classification
                     Number of trees: 500
No. of variables tried at each split: 2

        OOB estimate of  error rate: 4.5%
Confusion matrix:
           setosa versicolor virginica class.error
setosa         38          0         0  0.00000000
versicolor      0         39         2  0.04878049
virginica       0          3        29  0.09375000

I then use it to make predictions using the predict() function.

predictions <- predict(iris_class, test_data[, -5], type = "response")

iris_class is made of 500 individual trees. If I understand correctly, when I run predict() with iris_class, each of the 500 trees gives a classification, and I am shown the aggregated result of those 500 trees.

My question is:

Is there a way to extract the prediction of each of the 500 trees?

In other words, can the predict() function return an object that, for each item being classified, has 500 rows saying setosa, versicolor or virginica, or a summarised version of such an object (shown below)? The purpose is that I want to know how "confident" the model actually is. When it predicts a plant to be setosa, did 450 trees say setosa and 50 say something else, or was it 251 vs 249?

What line of code will extract predictions for individual trees?

My ideal output would look something like this:

> predictions_info
       setosa versicolor  virginica       pred
1  0.01517536 0.55449239 0.43033225 versicolor
2  0.21957988 0.71962024 0.06079987 versicolor
3  0.28146250 0.36777757 0.35075993 versicolor
4  0.51503150 0.41750308 0.06746543     setosa
5  0.25832598 0.10796878 0.63370523  virginica
6  0.24603616 0.07558151 0.67838233  virginica
7  0.02323489 0.41547464 0.56129047  virginica
8  0.41155830 0.49214444 0.09629726 versicolor
9  0.30217529 0.39852784 0.29929686 versicolor
10 0.45923782 0.49147493 0.04928725 versicolor
11 0.70479092 0.27648912 0.01871996     setosa
12 0.34489442 0.02606726 0.62903832  virginica
13 0.15553471 0.18903000 0.65543530  virginica
[...]

Here the pred column is what the predict() function currently returns, and the first 3 columns show what proportion of the 500 trees gave each prediction. (These numbers and predictions are made up and do not match the model output.)

*This example is originally from this website (modified by me):
https://rpubs.com/Jay2548/519589

Comments (1)

陌伤浅笑 2025-02-12 14:15:02

Use predict.all = TRUE in the call to predict(). Then compute whatever you want from the per-tree predictions. Be careful: you will get a matrix of size nrow(dataset) x number of trees.
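
A minimal sketch of how that could look for the model in the question. It relies on the documented behaviour of predict() for randomForest objects: with predict.all = TRUE it returns a list whose aggregate component is the forest's prediction and whose individual component is a matrix with one column per tree. The names all_preds and vote_share are made up for illustration; predictions_info is the name used in the question.

```r
library(randomForest)

# Rebuild the model from the question
set.seed(10)
index_row <- sample(2, nrow(iris), replace = TRUE, prob = c(0.7, 0.3))
train_data <- iris[index_row == 1, ]
test_data <- iris[index_row == 2, ]
iris_class <- randomForest(Species ~ ., data = train_data)

# Ask for the per-tree predictions as well as the aggregated one
all_preds <- predict(iris_class, test_data[, -5], predict.all = TRUE)

# One row per test observation, one column per tree
dim(all_preds$individual)

# For each class, the fraction of trees that voted for it in each row
vote_share <- sapply(levels(train_data$Species), function(cl) {
  rowMeans(all_preds$individual == cl)
})

# Vote fractions next to the prediction the forest returns
predictions_info <- data.frame(vote_share, pred = all_preds$aggregate)
head(predictions_info)
```

If only the summarised proportions are needed, predict(iris_class, test_data[, -5], type = "prob") should give a matrix of class vote fractions directly, without materialising the full per-tree matrix.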
