Using SHAP values to select the features that contribute positively to each class

Posted on 2025-02-08 12:29:38


I am trying to get the features which are important for a class and have a positive contribution (having red points on the positive side of the SHAP plot).

I can get the shap_values and plot the shap summary for each class (e.g. class 2 here) using the following code:

import shap 
explainer = shap.TreeExplainer(clf) 
shap_values = explainer.shap_values(X) 
shap.summary_plot(shap_values[2], X) 

From the plot I can see which features are important to that class. In the plot below, alcohol and sulphates are the main features (the ones I am most interested in).

[Figure: SHAP summary plot]

However, I want to automate this process, so the code can rank the features (which are important on the positive side) and return the top N. Any idea on how to automate this interpretation?

I need to identify those important features automatically for each class. A method other than SHAP that can handle this would also be welcome.


Comments (1)

人事已非 2025-02-15 12:29:38


You can use the steps below. The idea is to keep only the SHAP values that affect the classification positively (shap_values > 0); a value below 0 pushes the prediction away from that class.
Then take the mean per feature and sort the results.
If you prefer global importance, use .abs() instead of [shap_df > 0].
For the whole model, use shap_values instead of shap_values[class_idx].

import shap
import pandas as pd

class_idx = 2  # index of the class to explain (class 2, as in the question)
n = 10         # number of top features to return (example value)

explainer = shap.TreeExplainer(clf)
shap_values = explainer.shap_values(X)
shap_df = pd.DataFrame(shap_values[class_idx], columns=X.columns)

# Keep only positive SHAP values (the rest become NaN and are skipped by mean),
# average them per feature, and sort from largest to smallest.
feature_importance = (
    shap_df[shap_df > 0]
    .mean()
    .sort_values(ascending=False)
    .reset_index()
    .rename(columns={"index": "feature", 0: "weight"})
    .head(n)
)
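
Since the question asks for the top features of every class, the approach above can be wrapped in a loop. The following is only a sketch under stated assumptions: the helper names top_positive_features and top_positive_features_per_class are hypothetical, the demo values are synthetic stand-ins for real SHAP output, and the 3-D array branch assumes a layout of (n_samples, n_features, n_classes), which some SHAP versions may return instead of a list of per-class arrays.

```python
import numpy as np
import pandas as pd


def top_positive_features(class_shap, feature_names, n=5):
    """Rank features by the mean of their positive SHAP values for one class.

    class_shap: 2-D array-like of shape (n_samples, n_features).
    """
    shap_df = pd.DataFrame(np.asarray(class_shap), columns=list(feature_names))
    return (
        shap_df[shap_df > 0]           # non-positive entries become NaN
        .mean()                        # NaN is skipped, so only positives are averaged
        .dropna()                      # drop features that never contribute positively
        .sort_values(ascending=False)
        .head(n)
    )


def top_positive_features_per_class(shap_values, feature_names, n=5):
    """Apply top_positive_features to every class and return a dict keyed by class index.

    Accepts a list of (n_samples, n_features) arrays, one per class, or
    (assumed layout) a single array of shape (n_samples, n_features, n_classes).
    """
    if isinstance(shap_values, list):
        per_class = shap_values
    else:
        arr = np.asarray(shap_values)
        per_class = [arr[:, :, k] for k in range(arr.shape[2])]
    return {
        k: top_positive_features(values, feature_names, n)
        for k, values in enumerate(per_class)
    }


# Synthetic stand-in for SHAP output: 4 samples, 3 features, 2 classes.
feature_names = ["alcohol", "sulphates", "pH"]
demo_shap = [
    np.array([[0.4, 0.1, -0.2], [0.2, -0.1, -0.3], [-0.1, 0.3, -0.1], [0.6, 0.2, -0.4]]),
    np.array([[-0.4, -0.1, 0.2], [-0.2, 0.1, 0.3], [0.15, -0.3, 0.1], [-0.6, -0.2, 0.4]]),
]

ranking = top_positive_features_per_class(demo_shap, feature_names, n=2)
for class_idx, top in ranking.items():
    print(class_idx, list(top.index))
```

With a real model you would pass explainer.shap_values(X) and X.columns instead of the demo data. Note that averaging only the positive entries ignores how often a feature is positive, so a feature with a single large positive value can rank highly; averaging shap_df.clip(lower=0) over all samples is one alternative that also accounts for frequency.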