This question refers to "Obtaining summary shap plot for catboost model with tidymodels in R". According to a comment below that question, the OP found a solution but has not shared it with the community so far.
I want to analyze my tree ensembles fitted with the tidymodels package using SHAP value plots: plots for single observations, and plots that summarize the effect of all features of my dataset.
DALEXtra provides a function, explain_tidymodels(), to create SHAP values for tidymodels. force_plot() from the fastshap package provides a wrapper for the plot function of the underlying Python package shap. But I can't understand how to make that function work with the output of explain_tidymodels().
Question: How can one generate such SHAP plots in R using tidymodels and explain_tidymodels()?
MWE (for SHAP values with explain_tidymodels()):
library(MASS)
library(tidyverse)
library(tidymodels)
library(parsnip)
library(treesnip)
library(catboost)
library(fastshap)
library(DALEXtra)
set.seed(1337)
rec <- recipe(crim ~ ., data = Boston)
split <- initial_split(Boston)
train_data <- training(split)
test_data <- testing(split) %>% dplyr::select(-crim) %>% as.matrix()
model_default <-
  parsnip::boost_tree(
    mode = "regression"
  ) %>%
  set_engine(engine = "catboost", loss_function = "RMSE")
# Sometimes catboost is not loaded correctly; the following two lines
# prevent fitting errors.
# See https://github.com/curso-r/treesnip/issues/21 (the error is mentioned in the last post).
set_dependency("boost_tree", eng = "catboost", "catboost")
set_dependency("boost_tree", eng = "catboost", "treesnip")
model_fit_wf <- workflow() %>%
  add_model(model_default) %>%
  add_recipe(rec) %>%
  parsnip::fit(data = train_data)
# X is not defined in the original post; the training predictors are assumed here.
X <- train_data %>% dplyr::select(-crim)
SHAP_wf <- explain_tidymodels(model_fit_wf, data = X, y = train_data$crim, new_data = test_data)
Comments (1)
Perhaps this will help. At the very least, it is a step in the right direction.
First, ensure you have fastshap and reticulate installed (i.e., install.packages("...")). Next, set up a virtual environment and install shap (pip install ...). Also, install matplotlib 3.2.2 for the dependence plots (check out the GitHub issues on this -- an older version of matplotlib is necessary).
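The setup described above can also be done from within R via reticulate. This is a minimal sketch, not the answer's own code: the environment name "r-shap" is an arbitrary assumption, and depending on your Python version the pinned matplotlib release may not install.

```r
# Sketch only: "r-shap" is a hypothetical environment name.
library(reticulate)

# Create a dedicated virtual environment for the Python side.
virtualenv_create("r-shap")

# Install shap and the matplotlib version recommended in the answer.
virtualenv_install("r-shap", packages = c("shap", "matplotlib==3.2.2"))

# Tell reticulate to use this environment for the current R session.
use_virtualenv("r-shap", required = TRUE)

# Check that Python can see the shap module.
py_module_available("shap")
```

Call use_virtualenv() before anything else initializes Python in the session, since reticulate binds to one Python interpreter per R session.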
RStudio has great information on virtual environment setup. That said, virtual environment setup requires more or less troubleshooting depending on the IDE in use. (Sadly, some work settings restrict the use of open source RStudio due to licensing.)
Docs for library(fastshap) are also helpful on this front.
Here's a workflow for lightgbm (from treesnip docs, lightly modified).
Prior to prediction, we want to fit our workflow.
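A minimal sketch of such a workflow and its fit follows. It assumes the diamonds data from ggplot2 (suggested by the later mention of the cut feature), a 1,000-row sample, and 500 trees; the object names diamonds_small, lightgbm_spec, lightgbm_wflow and fit_workflow are hypothetical.

```r
library(tidymodels)
library(treesnip)  # provides the "lightgbm" engine for boost_tree()

set.seed(1337)

# Assumption: a small sample of ggplot2::diamonds keeps the example fast.
data("diamonds", package = "ggplot2")
diamonds_small <- dplyr::slice_sample(diamonds, n = 1000)

# Model specification (the number of trees is an arbitrary choice).
lightgbm_spec <- boost_tree(trees = 500) %>%
  set_mode("regression") %>%
  set_engine("lightgbm")

# Workflow: model plus a formula preprocessor.
lightgbm_wflow <- workflow() %>%
  add_model(lightgbm_spec) %>%
  add_formula(price ~ .)

# Fit the workflow on the sampled data.
fit_workflow <- fit(lightgbm_wflow, data = diamonds_small)
```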
Now we have a fitted workflow and can predict. To use the fastshap::explain function, we need to create a predict function (this doesn't always hold: depending on the engine used, it may or may not work out of the box -- see docs).
Let's get the mean prediction value (used below) while we're at it. This also serves as a check that the predict function works.
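A sketch of the prediction wrapper and the mean prediction might look like this. It assumes a fitted workflow named fit_workflow and its training data diamonds_small with outcome price; those names, and pred_fun, X and mean_pred, are hypothetical.

```r
# Assumes `fit_workflow` (a fitted workflow) and `diamonds_small`
# (the data it was fitted on) already exist.

# Feature data only, as a plain data frame.
X <- as.data.frame(dplyr::select(diamonds_small, -price))

# fastshap calls the wrapper as f(object, newdata) and expects a
# plain numeric vector, so the .pred column is pulled out of the tibble.
pred_fun <- function(object, newdata) {
  predict(object, new_data = newdata)[[".pred"]]
}

# Mean prediction, used later as the baseline of the force plot.
mean_pred <- mean(pred_fun(fit_workflow, X))
mean_pred
```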
Now we create our explanations (SHAP values). Note the pred_wrapper and X arguments here (see the fastshap GitHub issues for other examples, e.g., glmnet).
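As a sketch, the call could look like the following. It assumes the hypothetical objects fit_workflow (fitted workflow), X (feature data frame) and pred_fun (prediction wrapper); nsim = 10 is an arbitrary choice.

```r
library(fastshap)

# Assumes `fit_workflow`, `X` and `pred_fun` already exist.
set.seed(1337)  # the SHAP values are Monte Carlo estimates

shap_values <- fastshap::explain(
  fit_workflow,
  X = X,                    # feature columns only
  pred_wrapper = pred_fun,  # must return a numeric vector
  nsim = 10                 # Monte Carlo repetitions per feature
)

# One row per observation, one column per feature.
dim(shap_values)
```

Larger nsim values give more stable estimates at the cost of run time.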
This should produce a force plot.
This allows multiple force plots, vertically stacked:
Add link = "logit" for classification. Change display to "html" for R Markdown rendering.
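The two force-plot variants could be sketched as follows, assuming the hypothetical objects shap_values (output of fastshap::explain), X (feature data frame) and mean_pred (mean prediction). Whether row subsetting keeps the "explain" class depends on the fastshap version, so the class is restored defensively here; that step is an assumption, not part of the original answer.

```r
# Assumes `shap_values`, `X` and `mean_pred` already exist.

# Single observation: keep the result two-dimensional and make sure
# it still carries the class that force_plot() dispatches on.
one_row <- shap_values[1, , drop = FALSE]
class(one_row) <- class(shap_values)

fastshap::force_plot(
  object = one_row,
  feature_values = X[1, , drop = FALSE],
  baseline = mean_pred,
  display = "viewer"  # use "html" inside R Markdown
)

# All observations: passing the full object stacks the force plots.
fastshap::force_plot(
  object = shap_values,
  feature_values = X,
  baseline = mean_pred,
  display = "viewer"
)
```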
Now for summary plots and dependence plots.
The trick is using reticulate to access the functions directly. Note that the same logic holds for libraries like transformers, numpy, etc.
First, the dependence plot.
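A sketch of the reticulate route for the dependence plot, assuming the hypothetical objects shap_values and X from the earlier steps. The feature_names argument is an addition that is not taken from the original answer; it exists in shap's plotting functions and labels the axes, and it may also be what is needed to refer to a feature by name, though that is untested here.

```r
# Assumes `shap_values` (SHAP values) and `X` (feature data frame) exist.
shap <- reticulate::import("shap")

# shap expects plain numeric arrays; factors become integer codes.
shap_mat <- data.matrix(as.data.frame(shap_values))
X_mat <- data.matrix(X)

# "rank(3)" selects the third most important feature.
shap$dependence_plot(
  "rank(3)",
  shap_mat,
  X_mat,
  feature_names = colnames(X_mat)  # addition, see note above
)
```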
See the shap docs for an explanation of rank(3) -- rank(1) etc. will also work.
Unfortunately, it threw an error when I attempted to name the feature directly (i.e., "cut").
Now for the summary plot:
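The summary plot follows the same pattern. This sketch again assumes the hypothetical objects shap_values and X, and passes feature_names (an addition, not from the original answer) so that the rows of the plot are labeled.

```r
# Assumes `shap_values` (SHAP values) and `X` (feature data frame) exist.
shap <- reticulate::import("shap")

# Plain numeric matrices for the Python side.
shap_mat <- data.matrix(as.data.frame(shap_values))
X_mat <- data.matrix(X)

# One row per feature, one point per observation.
shap$summary_plot(
  shap_mat,
  X_mat,
  feature_names = colnames(X_mat)
)
```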
Final note: rendering the plot repeatedly will produce buggy visualizations. Hopefully this provides a point of departure for catboost visualizations.