Prediction intervals for a random forest with a categorical response variable

Posted 2025-01-21 16:14:42 · 1413 words · 0 views · 0 comments

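I'm trying to get prediction intervals from a random forest model that has a categorical response variable. Ideally, I would like to see how confident the model is when classifying an observation into a given response category.

On the last line of my code you'll see a predict() call that works when the interval = argument is not included. When I include interval =, I get an error. Any ideas on how to get prediction intervals for the output?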

# Load libraries
library(data.table)
library(randomForest)
library(caret)

# Set seed
set.seed(123)

# Load the necessary data
df.0 <- diamonds
setDT(df.0)

# set up the cross-validation parameters
control <- trainControl(method = "repeatedcv",
               number = 10)
metric <- "Accuracy"
mtry <- seq(from = 1,
            to = length(unique(df.0$cut)),
            by = 1)
tunegrid <- expand.grid(mtry = mtry)

# Add rownames so we can use as index
df.0[, indexNum := .I]
trainer <- df.0[ ,.SD[sample(x = .N, size = (.N * 0.9))], by = cut] # Pull 90% of each cut into training
tester <- df.0[!trainer, on = c("indexNum")]

# Remove index number
tester <- tester[, ":=" (indexNum = NULL)] 
trainer <- trainer[, ":=" (indexNum = NULL)] 

# build a model and assess its accuracy via 10-fold cross validation
rf_mod <- 
  train(
    x = trainer[, .(x, y, z, depth, table)],
    y = trainer$cut,
    method = "rf",
    metric = metric,
    trControl = control,  # without this, the cross-validation settings above are never used
    tuneGrid = tunegrid
  )

# check out which mtry value was best
plot(rf_mod)

# test the model against the test data
# (this is the call that errors; it runs once interval = "prediction" is dropped)
cut_pred <- predict(rf_mod, newdata = tester[, .(x, y, z, depth, table)], interval = "prediction")
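One possible reading of "how confident the model is" for a categorical response is the per-class probability, that is, the share of trees voting for each class, rather than a regression-style prediction interval. The sketch below is only an illustration under that assumption. It calls randomForest directly on a small subsample of diamonds so it runs quickly; the names small, rf_fit, probs, top_class and top_prob are made up for this example and are not part of the code above. caret's predict() on a train object also accepts type = "prob", so the same idea should carry over to rf_mod.

```r
library(randomForest)
library(ggplot2)  # provides the diamonds data set

set.seed(123)

# Assumption: a 2,000-row subsample keeps the sketch fast; it is not the 90/10 split used above
rows <- sample(nrow(diamonds), 2000)
small <- as.data.frame(diamonds[rows, c("x", "y", "z", "depth", "table", "cut")])
small$cut <- factor(small$cut, ordered = FALSE)  # treat cut as plain (unordered) classes

# Simple 80/20 split of the subsample
train_idx <- sample(nrow(small), 1600)
predictors <- c("x", "y", "z", "depth", "table")
train_x <- small[train_idx, predictors]
train_y <- small$cut[train_idx]
test_x <- small[-train_idx, predictors]

# Fit a classification forest (ntree = 200 is an arbitrary choice for the sketch)
rf_fit <- randomForest(x = train_x, y = train_y, ntree = 200)

# type = "prob" returns one row per observation and one column per class,
# holding the fraction of trees that voted for that class
probs <- predict(rf_fit, newdata = test_x, type = "prob")

# Most likely class and the vote share behind it
top_class <- colnames(probs)[max.col(probs, ties.method = "first")]
top_prob <- apply(probs, 1, max)
head(data.frame(top_class, top_prob))
```

A low top_prob flags observations where the trees disagree. These vote fractions are not calibrated probabilities, so they are best treated as a relative measure of confidence.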
