Export SHAP waterfall plot to a dataframe

Posted on 2025-01-18 02:13:50

I am working on a binary classification problem using a random forest model and neural networks, and I am using SHAP to explain the model predictions. I followed a tutorial and wrote the code below to get the waterfall plot shown below.

import shap

row_to_show = 20
data_for_prediction = ord_test_t.iloc[row_to_show]  # a single row (pandas Series); multiple rows would also work
data_for_prediction_array = data_for_prediction.values.reshape(1, -1)  # 2-D shape expected by scikit-learn
rf_boruta.predict_proba(data_for_prediction_array)  # class probabilities for this row

explainer = shap.TreeExplainer(rf_boruta)
# Calculate SHAP values; the [0] indexing below selects class 0 and assumes
# a shap version in which shap_values() returns one array per class
shap_values = explainer.shap_values(data_for_prediction)
shap.plots._waterfall.waterfall_legacy(
    explainer.expected_value[0], shap_values[0], ord_test_t.iloc[row_to_show]
)

This generated the plot shown below:

[image: https://i.sstatic.net/Ftxu7.png]

However, I want to export these values to a dataframe. How can I do that?

I expect my output to look like the example shown below, and I want to export it for the full dataframe rather than a single row. Can you help me, please?

[image: expected output]

Comments (3)

墨小墨 2025-01-25 02:13:50

Let's do a small experiment:

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_breast_cancer
from shap import TreeExplainer

# as_frame=True returns X as a DataFrame, so X.columns is available later
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = RandomForestClassifier(max_depth=5, n_estimators=100).fit(X, y)
explainer = TreeExplainer(model)

What is explainer here? If you run dir(explainer), you'll find that it has a number of methods and attributes, among them:

explainer.expected_value

This is of interest to you because it is the base on which the SHAP values add up.
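
As a small illustration of what "the base on which the SHAP values add up" means, here is a sketch with made-up numbers (the base value and the three contributions are hypothetical, not taken from any model). The base value plus the sum of one row's SHAP values should reproduce the explained model output for that row.

```python
import numpy as np

# Hypothetical numbers for one row; not produced by a real model.
base_value = 0.37                          # plays the role of explainer.expected_value[1]
shap_row = np.array([0.21, -0.05, 0.12])   # plays the role of one row of SHAP values

# Additivity: the explained output is the base value plus all contributions.
explained_output = base_value + shap_row.sum()
print(round(explained_output, 2))
```

For a random forest classifier, that explained output should correspond to the predicted probability of the class in question, which is why the same numbers can be read off a waterfall plot.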

Furthermore:

sv = explainer.shap_values(X)
len(sv)

will give a hint that sv is a list of 2 objects, which are most probably the SHAP values for the two classes, one set for class 0 and one for class 1. These must be symmetric, because whatever moves the prediction towards 1 moves it away from 0 by exactly the same amount, i.e. the same magnitude with the opposite sign.
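
The symmetry claim is easy to check numerically. The sketch below uses synthetic arrays in place of real SHAP output (so they are symmetric by construction), and the helper name `split_by_class` is hypothetical. It also accepts a single 3-D array of shape (samples, features, classes), which is an assumption about how some shap releases return the same information instead of a list.

```python
import numpy as np

def split_by_class(sv):
    """Return a list with one (n_samples, n_features) array per class.

    Accepts either a list of per-class arrays or a single 3-D array whose
    last axis indexes the classes (assumed layout).
    """
    if isinstance(sv, list):
        return [np.asarray(a) for a in sv]
    sv = np.asarray(sv)
    if sv.ndim != 3:
        raise ValueError("expected a list of arrays or a 3-D array")
    return [sv[:, :, k] for k in range(sv.shape[2])]

# Synthetic stand-in for binary-classification SHAP values (4 rows, 3 features).
rng = np.random.default_rng(0)
class1 = rng.normal(size=(4, 3))
sv_list = [-class1, class1]              # list layout: class 0, class 1
sv_array = np.stack(sv_list, axis=-1)    # array layout: (4, 3, 2)

for sv in (sv_list, sv_array):
    sv0, sv1 = split_by_class(sv)
    # What pushes towards class 1 pushes away from class 0 by the same amount.
    assert np.allclose(sv0, -sv1)
```

With real output, the same `np.allclose(sv0, -sv1)` check could be run on the result of explainer.shap_values(X).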

Hence:

sv1 = sv[1]

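Now you have everything you need to pack it into the desired format:

# One row per sample, one column per feature
df = pd.DataFrame(sv1, columns=X.columns)
# 'bv' holds the base value for class 1, repeated on every row
df.insert(0, 'bv', explainer.expected_value[1])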

Q: How do I know?
A: Read docs and source code.
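
Putting the steps of this answer together, a version-tolerant sketch might look like the following. The function name `shap_to_frame` and its parameters are hypothetical, the demo input is synthetic rather than real SHAP output, and the 3-D array branch is an assumption about shap releases that return a single array instead of a list.

```python
import numpy as np
import pandas as pd

def shap_to_frame(sv, expected_value, feature_names, class_index=1, index=None):
    """Build a dataframe of SHAP values for one class, plus the base value.

    sv             : list of per-class 2-D arrays, a 3-D array
                     (samples, features, classes), or a plain 2-D array
    expected_value : scalar or per-class sequence of base values
    """
    if isinstance(sv, list):
        values = np.asarray(sv[class_index])
    else:
        values = np.asarray(sv)
        if values.ndim == 3:
            values = values[:, :, class_index]
    base = np.ravel(expected_value)
    bv = base[class_index] if base.size > 1 else base[0]

    df = pd.DataFrame(values, columns=list(feature_names), index=index)
    df.insert(0, "bv", bv)  # same column name as in the answer
    # Base value plus all contributions reproduces the explained output.
    df["explained_output"] = bv + values.sum(axis=1)
    return df

# Synthetic demo: 2 rows, 3 features, 2 classes.
class1 = np.array([[0.10, -0.20, 0.05],
                   [0.00, 0.30, -0.10]])
demo = shap_to_frame([-class1, class1], [0.4, 0.6], ["f1", "f2", "f3"])
print(demo)
```

With the objects from this answer, the call would be along the lines of shap_to_frame(explainer.shap_values(X), explainer.expected_value, X.columns, index=X.index), and the result can then be written out with to_csv if a file is needed.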

寂寞花火° 2025-01-25 02:13:50

If I recall correctly, you can do something like this with pandas:

import pandas as pd

shap_values = explainer.shap_values(data_for_prediction)
shap_values_df = pd.DataFrame(shap_values)

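To get the feature names, you should do something like this (if data_for_prediction is a dataframe):

feature_names = data_for_prediction.columns.tolist()
# shap_values() returns plain arrays, which have no .values attribute;
# [1] picks the positive class, assuming one array per class is returned
shap_df = pd.DataFrame(shap_values[1], columns=feature_names)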
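
One detail worth sketching: in the question, data_for_prediction is a single row taken with .iloc, which makes it a pandas Series, and a Series has an index but no columns. The hypothetical helper below (the name `row_shap_frame` and the synthetic numbers are assumptions for illustration) picks the feature names from whichever attribute is available and returns one dataframe row per explained sample.

```python
import numpy as np
import pandas as pd

def row_shap_frame(values, data):
    """Label SHAP values with feature names taken from `data`.

    data   : pandas Series (one row) or DataFrame (many rows)
    values : SHAP values for one class, 1-D for a Series, 2-D for a DataFrame
    """
    if isinstance(data, pd.Series):
        names = data.index.tolist()   # a Series keeps feature names in its index
        row_labels = [data.name]
    else:
        names = data.columns.tolist()
        row_labels = data.index
    values = np.atleast_2d(np.asarray(values))  # 1-D row -> shape (1, n_features)
    return pd.DataFrame(values, columns=names, index=row_labels)

# Synthetic stand-ins for the test data and for one class's SHAP values.
X_demo = pd.DataFrame({"age": [31, 58], "bmi": [22.5, 30.1]})
single = row_shap_frame([0.12, -0.03], X_demo.iloc[1])
both = row_shap_frame([[0.01, 0.02], [0.12, -0.03]], X_demo)
```

Keeping the original row labels in the index makes it straightforward to join the SHAP values back onto the test data later.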

旧话新听 2025-01-25 02:13:50

I'm currently using this:

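import pandas as pd
import shap

def getShapReport(classifier, X_test):
    shap_values = shap.TreeExplainer(classifier).shap_values(X_test)
    shap.summary_plot(shap_values, X_test)     # overview across both classes
    shap.summary_plot(shap_values[1], X_test)  # per-prediction values, positive class
    return pd.DataFrame(shap_values[1])        # SHAP values of the positive class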

It first displays the SHAP values for the model as a whole, then for each prediction, and finally returns a dataframe for the positive class (I'm working in an imbalanced context).

It is written for a tree explainer rather than a waterfall plot, but the idea is basically the same.
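
To make the link to the waterfall plot concrete: for one row, the plot shows how the individual contributions lead from the base value to the model output. A sketch of the same information as a table follows. The function name `waterfall_table`, the column names, and the numbers are all hypothetical, and the ordering (largest absolute contribution first) is an assumption about what is most useful, not a copy of shap's internal logic.

```python
import numpy as np
import pandas as pd

def waterfall_table(base_value, shap_row, feature_values, feature_names):
    """One row's SHAP values as a table, ordered by absolute contribution."""
    table = pd.DataFrame({
        "feature": list(feature_names),
        "feature_value": np.asarray(feature_values),
        "shap_value": np.asarray(shap_row, dtype=float),
    })
    # Largest contributions first.
    order = table["shap_value"].abs().sort_values(ascending=False).index
    table = table.loc[order].reset_index(drop=True)
    # Running total: starts from the base value and ends at the explained output.
    table["cumulative"] = base_value + table["shap_value"].cumsum()
    return table

# Synthetic example for a single prediction.
tbl = waterfall_table(
    base_value=0.60,
    shap_row=[0.05, -0.20, 0.10],
    feature_values=[31, 22.5, 1],
    feature_names=["age", "bmi", "smoker"],
)
print(tbl)
```

For row i of the dataframe returned by getShapReport, shap_row would be that row's values (for example via .iloc[i]) and base_value the explainer's expected value for the positive class.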
