NLTK & Python: plotting a ROC curve

Posted on 2024-12-16 18:57:21

I am using nltk with Python and I would like to plot the ROC curve of my classifier (Naive Bayes). Is there any function for plotting it, or should I track the true positive and false positive rates myself?

It would be great if someone could point me to some code that already does this...

Thanks.

Comments (1)

李白 2024-12-23 18:57:21

PyROC looks simple enough: see its tutorial and source code.

This is how it would work with the NLTK Naive Bayes classifier:

# class labels are 0 and 1
labeled_data = [
    (1, featureset_1),
    (0, featureset_2),
    (1, featureset_3),
    # ...
]

# naive_bayes is your already trained classifier,
# preferably not trained on the data you're testing on :)

from pyroc import ROCData

# ROCData takes (actual label, score) pairs; here the score is the
# probability that the classifier assigns to class 1
roc_data = ROCData(
    (label, naive_bayes.prob_classify(featureset).prob(1))
    for label, featureset in labeled_data
)
roc_data.plot()
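
For context, here is a minimal, hypothetical sketch of how the naive_bayes classifier and labeled_data assumed above could be produced with NLTK; the bag-of-words feature extractor and the tiny train/test sets are stand-ins, not part of the original answer:

import nltk

def bag_of_words(text):
    # hypothetical feature extractor: every word becomes a boolean feature
    return {word: True for word in text.split()}

# NLTK trains on (featureset, label) pairs
train_set = [
    (bag_of_words("good great fine"), 1),
    (bag_of_words("bad awful poor"), 0),
]
naive_bayes = nltk.NaiveBayesClassifier.train(train_set)

# held-out data, in the (label, featureset) order used above
labeled_data = [
    (1, bag_of_words("great fine")),
    (0, bag_of_words("awful bad")),
]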

Edits:

  • ROC is for binary classifiers only. If you have three classes, you can measure the performance of your positive and negative class separately (by counting the other two classes as 0, like you proposed).
  • The library expects the output of a decision function as the second value of each tuple. It then tries all possible thresholds, e.g. f(x) >= 0.8 => classify as 1, and plots a point for each threshold (that's why you get a curve in the end). So if your classifier guesses class 0, you actually want a value close to zero; that's why I proposed .prob(1). A rough sketch of this threshold sweep is shown below.
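
If you prefer to track the true/false positive rates yourself rather than use PyROC, the following is a rough sketch of the threshold sweep described above (not PyROC's actual implementation; tied scores are ignored for simplicity):

def roc_points(scored):
    # scored: iterable of (label, score) pairs, labels in {0, 1};
    # score is e.g. naive_bayes.prob_classify(featureset).prob(1)
    pairs = sorted(scored, key=lambda p: p[1], reverse=True)
    pos = sum(1 for label, _ in pairs if label == 1)
    neg = len(pairs) - pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for label, _ in pairs:
        # lower the threshold past one more example
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))  # (FPR, TPR) at this threshold
    return points

# curve = roc_points(
#     (label, naive_bayes.prob_classify(featureset).prob(1))
#     for label, featureset in labeled_data
# )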