How to solve this shap.waterfall_plot error?
I'm trying to draw a waterfall plot with the SHAP library to explain a single prediction of a model, like this:
ex = shap.Explanation(shap_values[0],
                      explainer.expected_value,
                      X.iloc[0],
                      columns)
ex
ex returns:
.values =
array([-2.27243590e-01,  5.41666667e-02,  3.33333333e-03,  2.21153846e-02,
        1.92307692e-04, -7.17948718e-02])
.base_values =
0.21923076923076923
.data =
BMI                                        18.716444
ROM-PADF-KE_D                                     33
Asym-ROM-PHIR(≥8)_discr                            1
Asym_SLCMJLanding-pVGRF(10percent)_discr           1
Asym_TJ_Valgus_FPPA(10percent)_discr               1
DVJ_Valgus_KneeMedialDisplacement_D_discr          0
Name: 0, dtype: object
But when I try to draw the waterfall plot, I get this error:
shap.waterfall_plot(ex)
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
/tmp/ipykernel_4785/3628025354.py in <module>
----> 1 shap.waterfall_plot(ex)
/usr/local/lib/python3.8/dist-packages/shap/plots/_waterfall.py in waterfall(shap_values, max_display, show)
120 yticklabels[rng[i]] = feature_names[order[i]]
121 else:
--> 122 yticklabels[rng[i]] = format_value(features[order[i]], "%0.03f") + " = " + feature_names[order[i]]
123
124 # add a last grouped feature to represent the impact of all the features we didn't show
/usr/local/lib/python3.8/dist-packages/shap/utils/_general.py in format_value(s, format_str)
232 s = format_str % s
233 s = re.sub(r'\.?0+$', '', s)
--> 234 if s[0] == "-":
235 s = u"\u2212" + s[1:]
236 return s
IndexError: string index out of range
Edit, to provide a minimal reproducible example:
The explainer is a KernelExplainer:
explainer_2 = shap.KernelExplainer(sci_Model_2.predict, X)
shap_values_2 = explainer.shap_values(X)
X and y are taken from DataFrames, loaded like this:
y = data_modelo_1_2_csv_encoded['Soft-Tissue_injury_≥4days']
y_list = label_encoder.fit_transform(y)
X = data_modelo_1_2_csv_encoded.drop('Soft-Tissue_injury_≥4days',axis=1)
X_list = X.to_numpy()
The model is a small Python wrapper around a Weka model, so that Python libraries such as SHAP can be used with Weka models. It is written like this:
# Imports the wrapper relies on (python-weka-wrapper is assumed for Instance)
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from weka.core.dataset import Instance

class weka_classifier(BaseEstimator, ClassifierMixin):
    def __init__(self, classifier = None, dataset = None, index = None):
        if classifier is not None:
            self.classifier = classifier
        if dataset is not None:
            self.dataset = dataset
            self.dataset.class_is_last()
        if index is not None:
            self.index = index

    def fit(self, X, y):
        # X and y are ignored; the Weka dataset stored on the object is used
        return self.fit2()

    def fit2(self):
        return self.classifier.build_classifier(self.dataset)

    def predict_instance(self,x):
        x.append(0.0)  # placeholder value for the class attribute
        inst = Instance.create_instance(x,classname='weka.core.DenseInstance', weight=1.0)
        inst.dataset = self.dataset
        return self.classifier.classify_instance(inst)

    def predict_proba_instance(self,x):
        x.append(0.0)  # placeholder value for the class attribute
        inst = Instance.create_instance(x,classname='weka.core.DenseInstance', weight=1.0)
        inst.dataset = self.dataset
        return self.classifier.distribution_for_instance(inst)

    def predict_proba(self,X):
        prediction = []
        for i in range(X.shape[0]):
            instance = []
            for j in range(X.shape[1]):
                instance.append(X[i][j])
            instance.append(0.0)  # placeholder value for the class attribute
            instance = Instance.create_instance(instance,classname='weka.core.DenseInstance', weight=1.0)
            instance.dataset=self.dataset
            prediction.append(self.classifier.distribution_for_instance(instance))
        return np.asarray(prediction)

    def predict(self,X):
        prediction = []
        for i in range(X.shape[0]):
            instance = []
            for j in range(X.shape[1]):
                instance.append(X[i][j])
            instance.append(0.0)  # placeholder value for the class attribute
            instance = Instance.create_instance(instance,classname='weka.core.DenseInstance', weight=1.0)
            instance.dataset=self.dataset
            prediction.append(self.classifier.classify_instance(instance))
        return np.asarray(prediction)

    def set_data(self,dataset):
        self.dataset = dataset
        self.dataset.class_is_last()
The dataset is an ARFF file converted to CSV and loaded as a DataFrame with these variables:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 260 entries, 0 to 259
Data columns (total 7 columns):
 #   Column                                     Non-Null Count  Dtype
---  ------                                     --------------  -----
 0   BMI                                        260 non-null    float64
 1   ROM-PADF-KE_D                              260 non-null    int64
 2   Asym-ROM-PHIR(≥8)_discr                    260 non-null    int64
 3   Asym_SLCMJLanding-pVGRF(10percent)_discr   260 non-null    int64
 4   Asym_TJ_Valgus_FPPA(10percent)_discr       260 non-null    int64
 5   DVJ_Valgus_KneeMedialDisplacement_D_discr  260 non-null    int64
 6   Soft-Tissue_injury_≥4days                  260 non-null    category
dtypes: category(1), float64(1), int64(5)
Comments (1)
Most likely your issue is that the 0 in your .data field is a string instead of a number. I can reproduce the same error with format_value('0', "%0.03f"). Looking at the current format_value, we can see that it removes all trailing zeros from a string; in particular, format_value('100', "%0.03f") gives 1. This is a bug, and the regex should be replaced (for example with this one: https://stackoverflow.com/a/26299205/4178189).
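The behaviour described here can be reproduced without shap installed. The following is a minimal sketch that re-creates the logic visible in the traceback; it is not the library source. The names format_value_sketch and strip_fraction_zeros are hypothetical, the string check on the first line of the function is an assumption, and the replacement shown is only one possible fix, not the regex from the linked answer.

```python
import re


def format_value_sketch(s, format_str):
    """Re-creation of the logic shown in the traceback (assumed, not the shap source)."""
    if not isinstance(s, str):
        s = format_str % s
    # Strips ANY run of trailing zeros, whether or not a decimal point precedes it.
    s = re.sub(r'\.?0+$', '', s)
    # If the whole string consisted of zeros, s is now empty and s[0] fails.
    if s[0] == "-":
        s = "\u2212" + s[1:]
    return s


def strip_fraction_zeros(s):
    """Hypothetical safer variant: only trim zeros that follow a decimal point."""
    if "." in s:
        s = s.rstrip("0").rstrip(".")
    return s


print(repr(format_value_sketch("100", "%0.03f")))  # the string loses its zeros

try:
    format_value_sketch("0", "%0.03f")
except IndexError as err:
    print("IndexError:", err)

print(repr(strip_fraction_zeros("100")), repr(strip_fraction_zeros("0")))
```

With the variant that only trims after a decimal point, strings such as '100' and '0' are left untouched, so the later s[0] lookup cannot hit an empty string for these inputs.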
Note that when you supply a number (e.g. 100 or 0), the number is first converted to a string (100.000 or 0.000), so the function does not show its bug when called with a number (int or float). Also, the development version of shap (not yet released) would not suffer from this issue, since when called with a non-numeric value the function waterfall_plot would not call format_value; see: https://github.com/slundberg/shap/blob/8926cd0122d0a1b3cca0768f2c386de706090668/shap/plots/_waterfall.py#L127
Note: this question is also a GitHub issue, see https://github.com/slundberg/shap/issues/2581#issuecomment-1155134604
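The two points in this note can be sketched in a few lines. This is an illustration under stated assumptions, not the code from the linked commit: format_number and make_label are hypothetical names, and the guard only imitates the idea of formatting numeric values and passing everything else through unchanged.

```python
import numbers
import re


def format_number(value, format_str="%0.03f"):
    """Numeric path: the number becomes e.g. '100.000' before zeros are trimmed."""
    s = format_str % value
    # The formatted string always contains a decimal point, so at least
    # the integer part survives the trailing-zero regex.
    s = re.sub(r'\.?0+$', '', s)
    if s[0] == "-":
        s = "\u2212" + s[1:]  # typographic minus sign, as in the traceback
    return s


def make_label(value, name):
    """Hypothetical guard: only real numbers go through the numeric formatter."""
    is_number = isinstance(value, numbers.Real) and not isinstance(value, bool)
    if is_number:
        return format_number(value) + " = " + name
    return str(value) + " = " + name


print(make_label(100, "ROM-PADF-KE_D"))
print(make_label(0, "DVJ_Valgus_KneeMedialDisplacement_D_discr"))
print(make_label("0", "DVJ_Valgus_KneeMedialDisplacement_D_discr"))  # no IndexError
print(make_label(18.716444, "BMI"))
```

On a released version without such a guard, one workaround that might help is to make sure the values passed as data are numeric before building the Explanation, for example with X.iloc[0].to_numpy(dtype=float), assuming every column can be converted to float.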