Get waterfall plot values of a feature in a dataframe using the SHAP package
I am working on a binary classification problem using a random forest model (and neural networks), and I am using SHAP to explain the model predictions. I followed the tutorial and wrote the code below to get the waterfall plot shown below.
With the help of Sergey Bushmanov's SO post here, I managed to export the waterfall plot to a dataframe. But this doesn't copy the feature values of the columns; it only copies the shap values, expected_value, and feature names. I want the feature values as well. So, I tried the below:
shap.waterfall_plot(shap.Explanation(values=shap_values[1])[4], base_values=explainer.expected_value[1], data=ord_test_t.iloc[4], feature_names=ord_test_t.columns.tolist())
but this threw an error:

TypeError: waterfall() got an unexpected keyword argument 'base_values'
I expect my output to be like the below. I used a background of 1 point to compute the base value, but you are free to use a background of 1, 10, or 100 points as well. In the output below, I have stored the feature values and feature names together in one column called Feature. This is similar to what LIME produces. But I am not sure whether SHAP has the flexibility to do this?
Update - plot
Update - code for exporting the kernel explainer waterfall to a dataframe:
import pandas as pd
from shap import KernelExplainer
from shap.maskers import Independent

masker = Independent(X_train, max_samples=100)  # note: KernelExplainer does not take a masker; the background data is passed directly below
explainer = KernelExplainer(rf_boruta.predict, X_train)
bv = explainer.expected_value
sv = explainer.shap_values(X_train)

sdf_train = pd.DataFrame({
    'row_id': X_train.index.values.repeat(X_train.shape[1]),
    'feature': X_train.columns.to_list() * X_train.shape[0],
    'feature_value': X_train.values.flatten(),
    'base_value': bv,
    # shap_values() on a single-output predict function returns a 2D
    # (rows x features) numpy array, so flatten it directly instead of
    # sv.values[:,:,1].flatten()
    'shap_values': sv.flatten()
})
1 Answer
Try the following:
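(A minimal sketch of this step, assuming the breast-cancer dataset and a RandomForestClassifier as stand-ins for your data and rf_boruta model:)

import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from shap import TreeExplainer, Explanation
from shap.plots import waterfall

# stand-in data and model (substitute your own X_train / rf_boruta)
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = RandomForestClassifier(max_depth=5, n_estimators=100).fit(X, y)

explainer = TreeExplainer(model)
sv = explainer(X)  # an Explanation object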
Then:
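(A sketch; class 1 is picked because TreeExplainer on a RandomForest returns one set of shap values per class:)

# for a RandomForest, sv.values has shape (rows, features, classes);
# build an Explanation for class 1 that also carries the raw feature values
exp = Explanation(
    sv.values[:, :, 1],
    sv.base_values[:, 1],
    data=X.values,
    feature_names=X.columns,
)
idx = 4  # row to explain
waterfall(exp[idx])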
RandomForest is a bit special; this is why. When something fails with the new plots API, try feeding it an Explanation object.

UPDATE
To explain a single datapoint exp_id vs. a single background datapoint back_id (i.e. to answer the question "why does the prediction for exp_id differ from the prediction for back_id"):
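(One way to do this, as a sketch; it assumes the KernelExplainer setup from the question, and exp_id and back_id are placeholder row indices:)

exp_id = 4    # datapoint to explain (placeholder index)
back_id = 10  # background datapoint (placeholder index)

# use the single background point as the reference; expected_value is then
# simply the model's prediction for that point
explainer = KernelExplainer(rf_boruta.predict, X_train.iloc[[back_id]])
bv = explainer.expected_value
sv = explainer.shap_values(X_train.iloc[[exp_id]])

exp = Explanation(
    sv[0],  # shap values of exp_id relative to back_id
    bv,
    data=X_train.iloc[exp_id].values,
    feature_names=X_train.columns.tolist(),
)
waterfall(exp)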
"):Finally, as you asked for everything in the suggested format:
But I'd definitely not show this to my mom.