How to output SHAP values in probability and make a force_plot from a binary classifier
I need to plot how each feature impacts the predicted probability for each sample from my LightGBM binary classifier. So I need to output SHAP values in probability space, instead of the raw SHAP values. The package does not appear to have any option to output in terms of probability.
The example code below is what I use to generate a dataframe of SHAP values and do a force_plot for the first data sample. Does anyone know how I should modify the code to change the output? I'm new to SHAP values and the shap package. Thanks a lot in advance.
import pandas as pd
import numpy as np
import shap
import lightgbm as lgbm
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer
data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = lgbm.LGBMClassifier()
model.fit(X_train, y_train)
explainer = shap.TreeExplainer(model)
shap_values = explainer(X_train)
# force plot of first row for class 1
class_idx = 1
row_idx = 0
expected_value = explainer.expected_value[class_idx]
shap_value = shap_values[:,:,class_idx].values[row_idx]
shap.force_plot(base_value=expected_value, shap_values=shap_value, features=X_train.iloc[row_idx, :], matplotlib=True)
# dataframe of shap values for class 1
shap_df = pd.DataFrame(shap_values[:, :, 1].values, columns=shap_values.feature_names)
3 Answers
TL;DR:
You can achieve plotting results in probability space with link="logit" in the force_plot method.
Alternatively, you may achieve the same by explicitly specifying model_output="probability" for the explainer, so the SHAP values you're interested in explaining are computed directly in probability space.
However, it might be more interesting, for understanding what's happening here, to find out where these figures come from:
1. The base case is the model's average raw output over X_train used as background data (note, LightGBM outputs raw log-odds for class 1).
2. The raw SHAP values for the point of interest (note, the values for the two classes are symmetric, differing only in sign).
3. The probability inferred from the SHAP values: add them to the base value and pass the sum through a sigmoid.
Please ask questions if something is not clear.
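The sum-then-sigmoid relationship in step 3 can be checked with plain numbers (the base value and SHAP contributions below are made up for illustration):

```python
import numpy as np
from scipy.special import expit  # logistic sigmoid: 1 / (1 + exp(-x))

base_value = -0.394                               # hypothetical raw (log-odds) base case
shap_contribs = np.array([0.8, -0.2, 1.3, 0.05])  # hypothetical raw SHAP values

raw_prediction = base_value + shap_contribs.sum()  # model's raw output for class 1
probability = expit(raw_prediction)                # log-odds -> probability

# Symmetry: class-0 SHAP values are the negatives of class 1's, so the
# class-0 probability is the sigmoid of the negated sum, i.e. 1 - probability.
probability_class0 = expit(-raw_prediction)
```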
Side notes
You may find useful:
Feature importance in a binary classification and extracting SHAP values for one of the classes only answer
How to interpret base_value of GBT classifier when using SHAP? answer
You can consider running your output values through a softmax() function. For reference, it is defined as softmax(x)_i = exp(x_i) / Σ_j exp(x_j), and there is a scipy implementation as well: scipy.special.softmax. The output from softmax() will be probabilities proportional to the (relative) values in vector x, which are your SHAP values.
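A minimal sketch using the scipy implementation on a hypothetical pair of raw scores; for two classes, softmax reduces to the sigmoid of the score difference:

```python
import numpy as np
from scipy.special import softmax, expit

# Hypothetical raw scores (e.g. log-odds-like values) for the two classes.
x = np.array([-0.25, 0.75])
probs = softmax(x)  # exp(x_i) / sum_j exp(x_j)

# probs is a valid probability vector, and probs[1] equals expit(x[1] - x[0]).
```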
I helped you achieve it and verified the reliability of the results.