How do I fix this shap.waterfall_plot error?

Posted on 2025-02-04 02:23:09

I'm trying to draw a waterfall plot with the SHAP library to represent a single prediction of a model, like this:

ex = shap.Explanation(shap_values[0], 
                      explainer.expected_value,
                      X.iloc[0],  
                      columns)

ex

ex returns:

.values =
array([-2.27243590e-01,  5.41666667e-02,  3.33333333e-03,  2.21153846e-02,
        1.92307692e-04, -7.17948718e-02])

.base_values =
0.21923076923076923

.data =
BMI                                          18.716444
ROM-PADF-KE_D                                       33
Asym-ROM-PHIR(≥8)_discr                              1
Asym_SLCMJLanding-pVGRF(10percent)_discr             1
Asym_TJ_Valgus_FPPA(10percent)_discr                 1
DVJ_Valgus_KneeMedialDisplacement_D_discr            0
Name: 0, dtype: object

But when I try to draw the waterfall plot, I get this error:

shap.waterfall_plot(ex)

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
/tmp/ipykernel_4785/3628025354.py in <module>
----> 1 shap.waterfall_plot(ex)

/usr/local/lib/python3.8/dist-packages/shap/plots/_waterfall.py in waterfall(shap_values, max_display, show)
    120             yticklabels[rng[i]] = feature_names[order[i]]
    121         else:
--> 122             yticklabels[rng[i]] = format_value(features[order[i]], "%0.03f") + " = " + feature_names[order[i]]
    123 
    124     # add a last grouped feature to represent the impact of all the features we didn't show

/usr/local/lib/python3.8/dist-packages/shap/utils/_general.py in format_value(s, format_str)
    232         s = format_str % s
    233     s = re.sub(r'\.?0+$', '', s)
--> 234     if s[0] == "-":
    235         s = u"\u2212" + s[1:]
    236     return s

IndexError: string index out of range

Edit, to provide a minimal reproducible example:

The explainer is a kernel explainer:

explainer_2 = shap.KernelExplainer(sci_Model_2.predict, X)
shap_values_2 = explainer.shap_values(X)

X and y are taken from DataFrames like this:

y = data_modelo_1_2_csv_encoded['Soft-Tissue_injury_≥4days']
y_list = label_encoder.fit_transform(y)

X = data_modelo_1_2_csv_encoded.drop('Soft-Tissue_injury_≥4days',axis=1)
X_list = X.to_numpy()

The model is a small Python wrapper around a Weka model, so that Python libraries such as SHAP can be used with Weka models. It is defined like this:

class weka_classifier(BaseEstimator, ClassifierMixin):
    
    def __init__(self, classifier = None, dataset = None):
        if classifier is not None:
            self.classifier = classifier
        if dataset is not None:
            self.dataset = dataset
            self.dataset.class_is_last()
        if index is not None:
            self.index = index
               
    def fit(self, X, y):
        return self.fit2()
    
    def fit2(self):
        return self.classifier.build_classifier(self.dataset)
    
    def predict_instance(self,x):
        x.append(0.0)
        inst = Instance.create_instance(x,classname='weka.core.DenseInstance', weight=1.0)
        inst.dataset = self.dataset
        
        return self.classifier.classify_instance(inst)
    
    def predict_proba_instance(self,x):
        x.append(0.0)
        inst = Instance.create_instance(x,classname='weka.core.DenseInstance', weight=1.0)
        inst.dataset = self.dataset
        
        return self.classifier.distribution_for_instance(inst)
    
    def predict_proba(self,X):
        prediction = []

        for i in range(X.shape[0]):
            instance = []
            for j in range(X.shape[1]):
                instance.append(X[i][j])
            instance.append(0.0)
            instance = Instance.create_instance(instance,classname='weka.core.DenseInstance', weight=1.0)
            instance.dataset=self.dataset
            prediction.append(self.classifier.distribution_for_instance(instance))

        return np.asarray(prediction)    
    
    def predict(self,X):
        prediction = []
        
        for i in range(X.shape[0]):
            instance = []
            for j in range(X.shape[1]):
                instance.append(X[i][j])
            instance.append(0.0)
            instance = Instance.create_instance(instance,classname='weka.core.DenseInstance', weight=1.0)
            instance.dataset=self.dataset
            prediction.append(self.classifier.classify_instance(instance))
            
        return np.asarray(prediction)
    

    def set_data(self,dataset):
        self.dataset = dataset
        self.dataset.class_is_last()

The dataset is an ARFF file converted to CSV and loaded as a DataFrame with these variables:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 260 entries, 0 to 259
Data columns (total 7 columns):
 #   Column                                     Non-Null Count  Dtype   
---  ------                                     --------------  -----   
 0   BMI                                        260 non-null    float64 
 1   ROM-PADF-KE_D                              260 non-null    int64   
 2   Asym-ROM-PHIR(≥8)_discr                    260 non-null    int64   
 3   Asym_SLCMJLanding-pVGRF(10percent)_discr   260 non-null    int64   
 4   Asym_TJ_Valgus_FPPA(10percent)_discr       260 non-null    int64   
 5   DVJ_Valgus_KneeMedialDisplacement_D_discr  260 non-null    int64   
 6   Soft-Tissue_injury_≥4days                  260 non-null    category
dtypes: category(1), float64(1), int64(5)

短叹 2025-02-11 02:23:09

Your problem is probably that a 0 in the .data field is a string rather than a number.
I could reproduce the same error with format_value('0', "%0.03f").

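Looking at the current implementation, we can see that it strips all trailing zeros from a string; in particular, format_value('100', "%0.03f") gives 1.
This is a bug, and the regular expression should be replaced (for example, with this one: https://stackoverflow.com/A )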

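Note that when you pass a number (such as 100 or 0), it is first converted to a string (100.000 or 0.000), so the function does not exhibit the bug when called with a number (int or float).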
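
The behaviour described above can be checked without installing shap. The sketch below rebuilds format_value from lines 232-236 of the traceback. The isinstance guard is an assumption inferred from the indentation of line 232, since the traceback does not show the line above it. format_value_safe is a hypothetical alternative that trims zeros only after a decimal point; it is not the regular expression linked in the answer and not the fix adopted by shap.

```python
import re


def format_value(s, format_str):
    # Assumption: this guard is inferred from the indentation of line 232 in the
    # traceback; only lines 232-236 are actually shown there.
    if not isinstance(s, str):
        s = format_str % s
    # Strips trailing zeros even when the string has no decimal point.
    s = re.sub(r'\.?0+$', '', s)
    if s[0] == "-":
        s = u"\u2212" + s[1:]
    return s


def format_value_safe(s, format_str):
    # Hypothetical alternative: only trim zeros that follow a decimal point.
    if not isinstance(s, str):
        s = format_str % s
    if "." in s:
        s = s.rstrip("0").rstrip(".")
    if s.startswith("-"):
        s = u"\u2212" + s[1:]
    return s


# A number is formatted to '100.000' first, so only the fractional zeros go.
print(format_value(100, "%0.03f"))
# A string skips the formatting step and loses its own trailing zeros.
print(format_value("100", "%0.03f"))
# The string '0' is reduced to an empty string, so s[0] fails.
try:
    format_value("0", "%0.03f")
except IndexError as exc:
    print("IndexError:", exc)
# The alternative leaves integer-like strings untouched.
print(format_value_safe("0", "%0.03f"), format_value_safe("100", "%0.03f"))
```

The design choice in format_value_safe is to check for a decimal point before trimming, which keeps strings such as '0' and '100' intact while still shortening '0.500' to '0.5'.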

Moreover, the development version of shap (not yet released) does not suffer from this issue, because waterfall_plot does not call format_value for non-numeric values; see:

https://github.com/slundberg/shap/issues/2581#issuecomment-11555134604
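
If, as suggested above, some entries of .data are strings, one possible workaround on an affected shap release is to convert them to numbers before building the Explanation. This is a minimal sketch using only the standard library. The helper name to_numbers and the sample row are hypothetical, and it assumes every string in the row holds a numeric value.

```python
def to_numbers(values):
    """Return a list in which numeric strings are parsed to float.

    Values that are already numbers are kept unchanged.
    """
    converted = []
    for value in values:
        if isinstance(value, str):
            value = float(value)  # raises ValueError for non-numeric text
        converted.append(value)
    return converted


# Hypothetical string-typed copy of the row shown in the question.
row = ["18.716444", "33", "1", "1", "1", "0"]
print(to_numbers(row))
```

In the question's setup this would mean passing the converted values, instead of X.iloc[0], as the data argument of shap.Explanation. With pandas, pd.to_numeric(X.iloc[0]) should perform the same conversion.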