make_scorer(roc_auc_score) is not equal to the predefined scorer 'roc_auc'
I've been using GridSearchCV to optimize some parameters for a binary classifier. I want to operate the classifier at a point where it barely makes any false positives but still reaches a high true positive rate. In short: optimize TPR while restricting FPR to 0 (or close to it).
Therefore I wanted to use a slightly adapted roc_auc_score as the scoring argument in GridSearchCV.
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import make_scorer, roc_auc_score

clf1 = SVC()
# define grid space (obviously I would use a bigger grid for the actual optimization)
grid1 = {'C': [1, 1000], 'kernel': ['poly'], 'degree': [3], 'class_weight': ['balanced'], 'probability': [True]}
# define scoring function: since we want to keep FPR close to 0, we evaluate the ROC curve
# only over FPR = [0, 0.001] instead of [0, 1]
roc_spec = make_scorer(roc_auc_score, max_fpr=0.001)  # ROC score for the unsafe class
grid_clf_acc = GridSearchCV(clf1, param_grid=grid1, scoring=roc_spec, n_jobs=-1, cv=cross_validation_folds)
grid_clf_acc.fit(X_train, y_train)
As you can see, I've adapted sklearn's standard roc_auc_score by setting its max_fpr to 0.001.
If I now run the grid search, unfortunately the algorithm no longer uses multiple confidence thresholds to compute the ROC score, but only a single confidence threshold instead.
On the other hand, if I don't use the self-made scorer and run GridSearchCV with the pre-implemented roc_auc scorer, the algorithm does indeed use multiple thresholds to compute the ROC AUC score.
grid_clf_acc = GridSearchCV(clf1, param_grid=grid1, scoring='roc_auc', n_jobs=-1, cv=cross_validation_folds)
So somehow the slightly adapted roc_auc_score does not have the same capabilities as the original roc_auc_score. Is this a bug, or am I making a mistake when defining my own scorer?
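For comparison, if I read sklearn's source correctly, the built-in 'roc_auc' string scorer is defined roughly like this (paraphrased; the exact definition varies between versions), i.e. with needs_threshold=True so that decision_function / predict_proba output is passed to roc_auc_score instead of hard predictions:

roc_auc_scorer = make_scorer(roc_auc_score, greater_is_better=True, needs_threshold=True)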
(Remarks:
- In this example I've used max_fpr=0.001. Even if I set it to 1, the score is still computed from one threshold only.
- I also tried the two arguments of the make_scorer function (needs_threshold and needs_proba), but neither of them solved the problem.)
Finally, I share an image that shows two ROC curves I made to localize the problem. The left one shows the ROC curve of the classifier generated with multiple thresholds; the number on top is the calculated ROC score. This score did not match the score I got in the GridSearch when using the customized scorer, but it did match the score when I used the pre-implemented scorer. On the right I plotted the ROC curve of the classifier generated with one threshold only (i.e. I used predict instead of predict_proba). Its calculated score did match the "faulty" ROC_AUC score that GridSearchCV reported with the customized scorer.
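For reference, a rough sketch of how the two curves were produced (X_test / y_test stand in for a held-out split; this is not the exact code, just the idea):

from sklearn.metrics import roc_curve

clf = SVC(C=1, kernel='poly', degree=3, class_weight='balanced', probability=True).fit(X_train, y_train)
# left plot: curve swept over many thresholds, from probability estimates
fpr_p, tpr_p, _ = roc_curve(y_test, clf.predict_proba(X_test)[:, 1])
# right plot: degenerate single-threshold curve, from hard 0/1 predictions
fpr_h, tpr_h, _ = roc_curve(y_test, clf.predict(X_test))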
1 Answer:
I have found my mistake. What finally worked was initializing the scorer as follows:
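A minimal sketch of what that initialization looks like, assuming the decisive change is adding needs_proba=True so the scorer receives probability estimates rather than predict() labels:

roc_spec = make_scorer(roc_auc_score, max_fpr=0.001, needs_proba=True)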
Then I also had to set probability=True in the SVC:
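Sketch, again assuming the estimator is created directly with the flag enabled (SVC only exposes predict_proba when probability=True):

clf1 = SVC(probability=True)

With both pieces in place the scorer gets continuous probability scores, so roc_auc_score can again sweep multiple thresholds instead of scoring a single hard-label operating point.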
This made it work.