Manual ROC curve doesn't match sklearn.metrics

Posted 2025-01-24 10:33:26

import numpy as np
from sklearn.metrics import roc_curve
from sklearn.preprocessing import binarize
import matplotlib.pyplot as plt 
from sklearn.metrics import confusion_matrix
import warnings
warnings.filterwarnings('ignore')
import pandas as pd
from sklearn.metrics import accuracy_score, recall_score, precision_score

Data

y_pred = np.array([0.4, 0.2, 0.3, 0.6, 0.1, 0.3, 0.7, 0.2, 0.3, 0.8, 0.3, 0.9, 0.3, 0.2, 0.2, 
                   0.4, 0.9, 0.4, 0.3, 0.6, 0.7, 0.2, 0.8, 0.2, 0.6, 0.1, 0.1])

y_test = np.array(["No","No","No","Yes","No","No","Yes","No","No","Yes","No","Yes",
                  "No","No","No", "No","Yes","No","No","No","No","Yes",
                  "No","Yes","No","No","No"])

Main program

I adjust the threshold and save the recall and FPR in lists so I can plot them.
Additionally, I save the values returned by sklearn's metrics in separate lists, to make sure I am getting the right values.

def recall_fpr(confusion_matrix):
    """Given a confusion matrix, return the recall and the false positive rate."""
    cm = confusion_matrix
    Recall = round(cm[0, 0] / (cm[0, 0] + cm[0, 1]), 3)  # TP / (TP + FN)
    Precision = round(cm[0, 0] / (cm[0, 0] + cm[1, 0]), 3)  # TP / (TP + FP)
    False_Positive_rate = round((1 - Precision), 3)

    return Recall, False_Positive_rate

list_recall = []
list_fpr = []
list_recall_sk = []
list_fpr_sk = []
for i in range(1, 10):
    y_pred = y_pred.reshape(-1, 1)
    y_pred2 = binarize(y_pred, threshold=i / 10)
    y_pred2 = np.where(y_pred2 == 1, 'Yes', 'No')
    cm = confusion_matrix(y_test, y_pred2, labels=["Yes", "No"])

    Recall, fpr = recall_fpr(cm)
    list_recall.append(Recall)
    list_fpr.append(fpr)

    # Sanity check: also store the values sklearn's own metrics return
    recall_sk = round(recall_score(y_test, y_pred2, pos_label="Yes"), 3)
    list_recall_sk.append(recall_sk)

    fpr_sk = round(1 - round(precision_score(y_test, y_pred2, pos_label="Yes"), 3), 3)
    list_fpr_sk.append(fpr_sk)

Plot values

df_threshold = pd.DataFrame({"Recall":list_recall, "False_Positives_rate": list_fpr})
df_threshold.plot(x='False_Positives_rate', y='Recall', style='o')

[Figure: scatter plot of the manually computed Recall against False_Positives_rate]

Calculate the metrics with sklearn's methods.

fpr_2, tpr_2, thresholds_2 = roc_curve(y_test, y_pred, pos_label="Yes")
plt.plot(fpr_2, tpr_2, linewidth=2)
plt.plot([0, 1], [0, 1], 'k--' )

ax = plt.subplot(1, 1, 1)
ax.scatter(list_fpr, list_recall,  c='red')
plt.show()

[Figure: sklearn's ROC curve with the manually computed points overlaid in red; the red points do not lie on the curve]

Why don't the values I calculate match the sklearn metrics?


1 Comment

不知在何时 · 2025-01-31 10:33:26


FPR is not 1 - precision: the former is FP / (FP + TN), while the latter is FP / (FP + TP).

Correcting the recall_fpr function to have

    False_Positive_rate = round(cm[1, 0] / (cm[1, 0] + cm[1, 1]), 3)  # FP / (FP + TN)

gives the correct ROC curve:

[Figure: corrected ROC curve; the sklearn curve passes through the scatter points]
