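tidymodels: how to extract variable importance from training data
I have the following code, in which I run a grid search over different values of mtry and min_n. I know how to extract the parameters that give the highest accuracy (see the second code box). How can I extract the importance of each feature in the training dataset? The guides I found online only show how to do it on the test dataset using "last_fit". For example, this guide: https://www.tidymodels.org/start/case-study/#data-split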
library(tidymodels)

# Split the data and build cross-validation folds on the training set
set.seed(seed_number)
data_split <- initial_split(node_strength, prop = 0.8, strata = Group)
train <- training(data_split)
test <- testing(data_split)
train_folds <- vfold_cv(train, v = 10)

# Random forest with mtry and min_n left open for tuning
rfc <- rand_forest(mode = "classification", mtry = tune(),
                   min_n = tune(), trees = 1500) %>%
  set_engine("ranger", num.threads = 48, importance = "impurity")
rfc_recipe <- recipe(Group ~ ., data = train)
rfc_workflow <- workflow() %>%
  add_model(rfc) %>%
  add_recipe(rfc_recipe)

# Grid search over 40 candidate parameter combinations
rfc_result <- rfc_workflow %>%
  tune_grid(train_folds, grid = 40, control = control_grid(save_pred = TRUE),
            metrics = metric_set(accuracy))

# Select the parameter combination with the highest accuracy
best <-
  rfc_result %>%
  select_best(metric = "accuracy")
Comments (1)
To do this, you will want to create a custom extract function, as outlined in this documentation. For random forest variable importance, your function will look something like this:
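As a minimal sketch of such an extract function (not necessarily the code the answer originally showed), the helper below takes a fitted workflow, pulls out the underlying ranger model, and returns its importance scores as a tibble. The name get_rf_imp and the column names Variable and Importance are illustrative assumptions, and iris is used only as a stand-in for node_strength. It assumes the engine was set with importance = "impurity", as in the question.

```r
library(tidymodels)

# Hypothetical helper: x is the fitted workflow that tune passes to `extract`.
get_rf_imp <- function(x) {
  x %>%
    extract_fit_engine() %>%      # the underlying ranger object
    ranger::importance() %>%      # named numeric vector of importance scores
    tibble::enframe(name = "Variable", value = "Importance")
}

# Quick check on a single fitted workflow (iris is stand-in data, an assumption).
rf_spec <- rand_forest(mode = "classification", trees = 100) %>%
  set_engine("ranger", importance = "impurity")

rf_fit <- workflow() %>%
  add_formula(Species ~ .) %>%
  add_model(rf_spec) %>%
  fit(data = iris)

imp <- get_rf_imp(rf_fit)
imp
```

The result has one row per predictor, which keeps the later unnesting step simple.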
And then you can apply it to your resamples like so (notice that you get a new .extracts column):
Created on 2022-06-19 by the reprex package (v2.0.1)
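The resampling step described above can be sketched as follows: the extract function is passed to tune_grid() through control_grid(extract = ...), so it runs on every model fitted during tuning. This is a hedged illustration rather than the answer's original code. iris and Species stand in for node_strength and Group, the fold count, tree count, and grid size are reduced so the example runs quickly, and get_rf_imp is the hypothetical helper name repeated here so the block runs on its own.

```r
library(tidymodels)

# Hypothetical extract helper (repeated so this block is self-contained).
get_rf_imp <- function(x) {
  x %>%
    extract_fit_engine() %>%
    ranger::importance() %>%
    tibble::enframe(name = "Variable", value = "Importance")
}

set.seed(123)
# Stand-in data (assumption): iris replaces node_strength, Species replaces Group.
train_folds <- vfold_cv(iris, v = 5, strata = Species)

rfc <- rand_forest(mode = "classification", mtry = tune(),
                   min_n = tune(), trees = 200) %>%
  set_engine("ranger", importance = "impurity")

rfc_workflow <- workflow() %>%
  add_model(rfc) %>%
  add_recipe(recipe(Species ~ ., data = iris))

# Ask tune to run the helper on every fitted model and keep the result.
ctrl_imp <- control_grid(extract = get_rf_imp)

rfc_result <- rfc_workflow %>%
  tune_grid(train_folds, grid = 5, control = ctrl_imp,
            metrics = metric_set(accuracy))

# One row per fold; .extracts holds the importance tibbles per candidate.
rfc_result %>% select(id, .extracts)
```

In the question's own code, the same effect should be reachable by adding extract = get_rf_imp to the existing control_grid(save_pred = TRUE) call.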
Once you have those variable importance score extracts, you can unnest() them (right now, you have to do this twice because it is deeply nested) and then you can summarize and visualize as you prefer:
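A sketch of the unnesting and summarizing step, under the same assumptions as before: iris as stand-in data, reduced tuning settings, and the hypothetical helper get_rf_imp with columns Variable and Importance. The first unnest() exposes one row per fold and candidate, the second exposes one row per predictor. The last part restricts the scores to the candidate chosen by select_best(), which ties back to the best object in the question.

```r
library(tidymodels)

# Hypothetical extract helper (repeated so this block is self-contained).
get_rf_imp <- function(x) {
  x %>%
    extract_fit_engine() %>%
    ranger::importance() %>%
    tibble::enframe(name = "Variable", value = "Importance")
}

set.seed(123)
# Stand-in data and reduced settings (assumptions) to keep the run short.
train_folds <- vfold_cv(iris, v = 5, strata = Species)

rfc <- rand_forest(mode = "classification", mtry = tune(),
                   min_n = tune(), trees = 200) %>%
  set_engine("ranger", importance = "impurity")

rfc_result <- workflow() %>%
  add_model(rfc) %>%
  add_recipe(recipe(Species ~ ., data = iris)) %>%
  tune_grid(train_folds, grid = 5,
            control = control_grid(extract = get_rf_imp),
            metrics = metric_set(accuracy))

# Unnest twice: fold -> candidate model -> per-variable importance.
imp_long <- rfc_result %>%
  select(id, .extracts) %>%
  unnest(.extracts) %>%
  unnest(.extracts)

# Average over all folds and all candidate parameter combinations.
imp_summary <- imp_long %>%
  group_by(Variable) %>%
  summarise(mean_importance = mean(Importance),
            sd_importance = sd(Importance)) %>%
  arrange(desc(mean_importance))

# Keep only the best candidate, then average over folds.
best <- select_best(rfc_result, metric = "accuracy")
imp_best <- imp_long %>%
  semi_join(best, by = ".config") %>%
  group_by(Variable) %>%
  summarise(mean_importance = mean(Importance)) %>%
  arrange(desc(mean_importance))

imp_summary
imp_best
```

Note that these scores come from models fitted on the analysis portion of each fold, so they describe importance across resamples of the training data rather than a single fit on the full training set.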