mlr3: Extracting variable importance for each resampling iteration

Posted on 2025-01-28 03:42:42

I would like to extract variable importance for each resampling iteration with mlr3. So far, I have only found a "manual" way of doing it so I was wondering if there was a wrapper function or some other way to do it. Below is a toy example with random forest:

library(mlr3verse)
data("mtcars")
mtcars

task <- as_task_regr(mtcars, target = "mpg")

learner.rf <- lrn("regr.ranger", importance = "permutation", num.trees = 1000)

cv10 <- rsmp("cv", folds = 10)

resamp.rf <- resample(
  task = task, 
  learner = learner.rf, 
  resampling = cv10, 
  store_models = TRUE
)

Naively, I tried the following, which does not work (that might not be surprising; I am still confused by R6 objects):

resamp.rf$importance()

The following works but I have to do it for each resampling iteration:

resamp.rf$learners[[1]]$importance()

Or create my own extracting function:

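# One trained learner per resampling iteration (needs store_models = TRUE)
aa <- resamp.rf$learners
dd <- data.frame()
for (i in seq_along(aa)) {
  # importance() returns a named numeric vector; the names end up as row names
  tmp <- data.frame(ITER = i, IMPORTANCE = aa[[i]]$importance())
  dd <- rbind(dd, tmp)
}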
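
For comparison, the loop above can be written more compactly with `lapply()` and `do.call(rbind, ...)`. This is only a sketch of the same manual approach, not a built-in mlr3 wrapper. The `VARIABLE` column and the names `imp_list` and `imp_long` are additions for illustration, and `num.trees` is lowered to 100 purely to keep the run short.

```r
library(mlr3verse)

# Same setup as in the question, with fewer trees to keep the sketch quick
task <- as_task_regr(mtcars, target = "mpg")
learner.rf <- lrn("regr.ranger", importance = "permutation", num.trees = 100)
resamp.rf <- resample(task, learner.rf, rsmp("cv", folds = 10), store_models = TRUE)

# Build one data.frame per resampling iteration
imp_list <- lapply(seq_along(resamp.rf$learners), function(i) {
  imp <- resamp.rf$learners[[i]]$importance()  # named numeric vector
  # Keep the variable name in its own column (illustrative addition)
  data.frame(ITER = i, VARIABLE = names(imp), IMPORTANCE = unname(imp))
})

# Stack the per-iteration data.frames into one long table
imp_long <- do.call(rbind, imp_list)
head(imp_long)
```

Storing the variable name in a column avoids relying on row names, which `rbind()` has to make unique once the same variable appears in several iterations.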


Comments (1)

糖粟与秋泊 2025-02-04 03:42:42

I don't think it is necessary to find the variable importance for each fold. I don't actually think it would be possible: you wouldn't do it when working with packages like ranger or rf; you would just take the importance after training, as you have already done. Also, you are looking at only one resample.

If you defined your rsmp as the following to allow more resamples:

cv10 <- rsmp("repeated_cv", folds = 10, repeats = 50)

Then maybe it would be possible to look into it for each bootstrap sample, but I would still recommend just looking at the overall result.
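
To make the two options in this answer concrete, here is a rough sketch that collects importance per iteration under repeated cross-validation and also takes the importance from a single fit on the full task. It assumes the resampling key `repeated_cv` (the name mlr3 uses for repeated CV) and uses only 3 repeats and 100 trees to keep the run short. Averaging per variable is just one possible reading of "the overall result", and the names `rcv`, `rr`, `imp_long`, `imp_mean` and `imp_full` are illustrative.

```r
library(mlr3verse)

task <- as_task_regr(mtcars, target = "mpg")
learner.rf <- lrn("regr.ranger", importance = "permutation", num.trees = 100)

# Fewer repeats than in the answer, purely to keep the run short
rcv <- rsmp("repeated_cv", folds = 10, repeats = 3)
rr <- resample(task, learner.rf, rcv, store_models = TRUE)

# Stack per-iteration importances (30 iterations = 10 folds x 3 repeats)
imp_long <- do.call(rbind, lapply(seq_along(rr$learners), function(i) {
  imp <- rr$learners[[i]]$importance()  # named numeric vector
  data.frame(ITER = i, VARIABLE = names(imp), IMPORTANCE = unname(imp))
}))

# Assumption: summarise by averaging each variable over all iterations
imp_mean <- aggregate(IMPORTANCE ~ VARIABLE, data = imp_long, FUN = mean)

# Alternative: importance from one model trained on the full task
learner.rf$train(task)
imp_full <- learner.rf$importance()
```

The per-iteration table shows how stable the ranking is across resamples, while the full-task fit gives a single importance vector.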
