mlr3: extracting variable importance for each resampling iteration
I would like to extract the variable importance for each resampling iteration with mlr3. So far I have only found a "manual" way of doing it, so I was wondering whether there is a wrapper function or some other way to do it. Below is a toy example with a random forest:
library(mlr3verse)
data("mtcars")
mtcars
task <- as_task_regr(mtcars, target = "mpg")
learner.rf <- lrn("regr.ranger", importance = "permutation", num.trees = 1000)
cv10 <- rsmp("cv", folds = 10)
resamp.rf <- resample(
task = task,
learner = learner.rf,
resampling = cv10,
store_models = TRUE
)
Naively, I tried the following, and it does not work (that might not be surprising, as I am still confused by the R6 objects):
resamp.rf$importance()
The following works but I have to do it for each resampling iteration:
resamp.rf$learners[[1]]$importance()
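Or write my own extraction loop:
aa <- resamp.rf$learners
dd <- data.frame()
for (i in seq_along(aa)) {
  imp <- aa[[i]]$importance()  # named numeric vector for iteration i
  # store feature names in a column instead of relying on row names
  tmp <- data.frame(ITER = i, FEATURE = names(imp), IMPORTANCE = unname(imp))
  dd <- rbind(dd, tmp)
}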
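For comparison, the same extraction can be written without an explicit loop. This is only a sketch under the assumptions of the example above: `resample()` was called with `store_models = TRUE`, the ranger learner was created with an `importance` mode, and the ranger package is installed. The names `imp`, `iter`, `feature` and `importance` are illustrative, not part of mlr3.

```r
library(mlr3verse)  # as in the question; the ranger package must be installed

set.seed(1)
task <- as_task_regr(mtcars, target = "mpg")
learner.rf <- lrn("regr.ranger", importance = "permutation", num.trees = 1000)
resamp.rf <- resample(task, learner.rf, rsmp("cv", folds = 10), store_models = TRUE)

# Call $importance() on each stored learner and stack the results:
# one row per (resampling iteration, feature).
imp <- do.call(rbind, lapply(seq_along(resamp.rf$learners), function(i) {
  v <- resamp.rf$learners[[i]]$importance()  # named numeric vector
  data.frame(iter = i, feature = names(v), importance = unname(v))
}))

# Optional summary: mean importance per feature across iterations.
aggregate(importance ~ feature, data = imp, FUN = mean)
```

Keeping the result in long format (one row per iteration and feature) makes it straightforward to summarise or plot the spread of importance values across folds.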
Comments (1)
I don't think it is necessary to find the variable importance for each fold. I don't actually think it would be possible: you wouldn't do it when working with packages like ranger or rf, you would just take the importance after training, as you have already done. Also, you are looking at only one resample.
If you defined your rsmp as the following to allow more resamples:
Then maybe it would be possible to look into it for each bootstrap sample, but I would still recommend just looking at the overall result.