Using LIME to explain a deep neural network for fraud detection

Posted 2025-02-05 16:31:13

I have built a deep neural network which classifies fraudulent transactions. I am trying to use LIME for explanation, but am facing an error from the interpretor.explain_instance() function.

The complete code is as follows:

import lime
from lime import lime_tabular

interpretor = lime_tabular.LimeTabularExplainer(
    training_data=x_train_scaled,
    feature_names=X_train.columns,
    mode='classification'
)

exp = interpretor.explain_instance(
    data_row=x_test_scaled[:1], ##new data
    predict_fn=model.predict,num_features=11
)
exp.show_in_notebook(show_table=True)

This throws the error:


IndexError                                Traceback (most recent call last)
/tmp/ipykernel_33/1730959582.py in <module>
      1 exp = interpretor.explain_instance(
      2     data_row=x_test_scaled[1], ##new data
----> 3     predict_fn=model.predict
      4 )
      5 

/opt/conda/lib/python3.7/site-packages/lime/lime_tabular.py in explain_instance(self, data_row, predict_fn, labels, top_labels, num_features, num_samples, distance_metric, model_regressor)
    457                     num_features,
    458                     model_regressor=model_regressor,
--> 459                     feature_selection=self.feature_selection)
    460 
    461         if self.mode == "regression":

/opt/conda/lib/python3.7/site-packages/lime/lime_base.py in explain_instance_with_data(self, neighborhood_data, neighborhood_labels, distances, label, num_features, feature_selection, model_regressor)
    180 
    181         weights = self.kernel_fn(distances)
--> 182         labels_column = neighborhood_labels[:, label]
    183         used_features = self.feature_selection(neighborhood_data,
    184                                                labels_column,

IndexError: index 1 is out of bounds for axis 1 with size 1


Comments (2)

三人与歌 2025-02-12 16:31:13

Adding labels=(0,) to the interpretor.explain_instance() call might resolve your issue.

exp = interpretor.explain_instance(
    data_row=x_test_scaled[:1],  ## new data
    predict_fn=model.predict,
    num_features=11,
    labels=(0,)
)

I had a similar issue with a breast cancer dataset, trying to predict benign or malignant tumors. The target column is titled benign_0_malignant_1, with either 0 or 1 in each row.
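A minimal sketch of why labels=(0,) helps (assuming the model's prediction function returns a single column, as a one-output sigmoid network does): LIME's default labels=(1,) indexes column 1 of the neighborhood predictions, which fails when only one column exists. That is exactly the indexing the traceback shows failing:

```python
import numpy as np

# Simulated LIME neighborhood predictions from a model whose
# prediction function returns a single column per sample
neighborhood_labels = np.random.rand(5000, 1)

# LIME's default labels=(1,) tries to index column 1:
try:
    neighborhood_labels[:, 1]
except IndexError as e:
    print(e)  # index 1 is out of bounds for axis 1 with size 1

# labels=(0,) indexes the only column that exists:
print(neighborhood_labels[:, 0].shape)  # (5000,)
```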

机场等船 2025-02-12 16:31:13

I think the problem is that you are passing in a 2D array, but according to the docs, explain_instance() expects the instance as a 1D array.

Note that a single-row slice from a 2D array is itself 2D:

>>> import numpy as np
>>> arr = np.array([[1, 2], [3, 4], [5, 6]])
>>> arr[:1]
array([[1, 2]])
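By contrast, indexing with a plain integer drops the outer dimension and yields the 1D row that explain_instance() expects:

```python
import numpy as np

arr = np.array([[1, 2], [3, 4], [5, 6]])

print(arr[:1].shape)  # (1, 2) -- the slice keeps the outer dimension
print(arr[0].shape)   # (2,)   -- integer indexing drops it
print(arr[0])         # [1 2]
```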

The other issue is the prediction function, which is expected to produce an array of probabilities, not a single prediction. Again, the docs explain (my emphasis):

classifier_fn – classifier prediction probability function, which takes a numpy array and outputs prediction probabilities. For ScikitClassifiers, this is classifier.predict_proba.

To fix these things, use a simple index into x_test_scaled instead of a slice, and pass model.predict_proba as the prediction function:

exp = interpretor.explain_instance(data_row=x_test_scaled[0],
                                   predict_fn=model.predict_proba,
                                   num_features=11
                                   )
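One caveat, assuming the model is a Keras-style network (the asker's setup is not stated): such models expose no predict_proba, and model.predict returns a single sigmoid column. A minimal sketch of a wrapper that expands that column into the two-class probability array LIME expects — predict_proba_2col is a hypothetical helper name:

```python
import numpy as np

def predict_proba_2col(predict_fn):
    """Wrap a function returning one probability column so it returns
    [P(class 0), P(class 1)] per row, the shape LIME expects."""
    def wrapped(data):
        p1 = np.asarray(predict_fn(data)).reshape(-1, 1)  # P(positive class)
        return np.hstack([1.0 - p1, p1])
    return wrapped

# quick check with a stand-in for model.predict:
fake_predict = lambda data: np.full((len(data), 1), 0.7)
print(predict_proba_2col(fake_predict)(np.zeros((3, 11))))
# [[0.3 0.7]
#  [0.3 0.7]
#  [0.3 0.7]]
```

With this wrapper, passing predict_fn=predict_proba_2col(model.predict) to explain_instance gives LIME a proper two-column probability array even without a predict_proba method.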