Using LIME to explain a deep neural network for fraud detection

Posted 2025-02-05 16:31:13

I have built a deep neural network which classifies fraudulent transactions. I am trying to use LIME for explanation, but am facing an error from the interpretor.explain_instance() function.

The complete code is as follows:

import lime
from lime import lime_tabular

interpretor = lime_tabular.LimeTabularExplainer(
    training_data=x_train_scaled,
    feature_names=X_train.columns,
    mode='classification'
)

exp = interpretor.explain_instance(
    data_row=x_test_scaled[:1], ##new data
    predict_fn=model.predict,num_features=11
)
exp.show_in_notebook(show_table=True)

This throws the error:


IndexError                                Traceback (most recent call last)
/tmp/ipykernel_33/1730959582.py in <module>
      1 exp = interpretor.explain_instance(
      2     data_row=x_test_scaled[1], ##new data
----> 3     predict_fn=model.predict
      4 )
      5 

/opt/conda/lib/python3.7/site-packages/lime/lime_tabular.py in explain_instance(self, data_row, predict_fn, labels, top_labels, num_features, num_samples, distance_metric, model_regressor)
    457                     num_features,
    458                     model_regressor=model_regressor,
--> 459                     feature_selection=self.feature_selection)
    460 
    461         if self.mode == "regression":

/opt/conda/lib/python3.7/site-packages/lime/lime_base.py in explain_instance_with_data(self, neighborhood_data, neighborhood_labels, distances, label, num_features, feature_selection, model_regressor)
    180 
    181         weights = self.kernel_fn(distances)
--> 182         labels_column = neighborhood_labels[:, label]
    183         used_features = self.feature_selection(neighborhood_data,
    184                                                labels_column,

IndexError: index 1 is out of bounds for axis 1 with size 1


Comments (2)

三人与歌 2025-02-12 16:31:13

Adding labels=(0,) to the interpretor.explain_instance() call might resolve your issue.

exp = interpretor.explain_instance(
    data_row=x_test_scaled[:1],  ## new data
    predict_fn=model.predict,
    num_features=11,
    labels=(0,)
)

I had a similar issue with a breast cancer dataset, trying to predict benign or malignant tumors. The target column is titled benign_0_malignant_1, with either 0 or 1 in each row.
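A minimal sketch of why labels=(0,) helps (assuming the model's prediction function returns a single column, as a one-output sigmoid network does): LIME's default labels=(1,) indexes column 1 of the neighborhood predictions, which fails when only one column exists. That is exactly the indexing the traceback shows failing:

```python
import numpy as np

# Simulated LIME neighborhood predictions from a model whose
# prediction function returns a single column per sample
neighborhood_labels = np.random.rand(5000, 1)

# LIME's default labels=(1,) tries to index column 1:
try:
    neighborhood_labels[:, 1]
except IndexError as e:
    print(e)  # index 1 is out of bounds for axis 1 with size 1

# labels=(0,) indexes the only column that exists:
print(neighborhood_labels[:, 0].shape)  # (5000,)
```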

机场等船 2025-02-12 16:31:13

I think the problem is that you are passing in a 2D array, but according to the docs, explain_instance() expects the instance as a 1D array.

Note that a single-row slice from a 2D array is itself 2D:

>>> import numpy as np
>>> arr = np.array([[1, 2], [3, 4], [5, 6]])
>>> arr[:1]
array([[1, 2]])
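By contrast, indexing with a plain integer drops the outer dimension and yields the 1D row that explain_instance() expects:

```python
import numpy as np

arr = np.array([[1, 2], [3, 4], [5, 6]])

print(arr[:1].shape)  # (1, 2) -- the slice keeps the outer dimension
print(arr[0].shape)   # (2,)   -- integer indexing drops it
print(arr[0])         # [1 2]
```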

The other issue is the prediction function, which is expected to produce an array of probabilities, not a single prediction. Again, the docs explain (my emphasis):

classifier_fn – classifier prediction probability function, which takes a numpy array and outputs prediction probabilities. For ScikitClassifiers, this is classifier.predict_proba.

To fix these things, use a simple index into x_test_scaled instead of a slice, and pass model.predict_proba as the prediction function:

exp = interpretor.explain_instance(data_row=x_test_scaled[0],
                                   predict_fn=model.predict_proba,
                                   num_features=11
                                   )
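One caveat, assuming the model is a Keras-style network (the asker's setup is not stated): such models expose no predict_proba, and model.predict returns a single sigmoid column. A minimal sketch of a wrapper that expands that column into the two-class probability array LIME expects — predict_proba_2col is a hypothetical helper name:

```python
import numpy as np

def predict_proba_2col(predict_fn):
    """Wrap a function returning one probability column so it returns
    [P(class 0), P(class 1)] per row, the shape LIME expects."""
    def wrapped(data):
        p1 = np.asarray(predict_fn(data)).reshape(-1, 1)  # P(positive class)
        return np.hstack([1.0 - p1, p1])
    return wrapped

# quick check with a stand-in for model.predict:
fake_predict = lambda data: np.full((len(data), 1), 0.7)
print(predict_proba_2col(fake_predict)(np.zeros((3, 11))))
# [[0.3 0.7]
#  [0.3 0.7]
#  [0.3 0.7]]
```

With this wrapper, passing predict_fn=predict_proba_2col(model.predict) to explain_instance gives LIME a proper two-column probability array even without a predict_proba method.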