Precision/recall for multiclass-multilabel classification

Posted 2024-12-28 19:10:25 · 61 words · 2 views · 0 comments

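I'm wondering how to calculate precision and recall measures for multiclass-multilabel classification, i.e. classification where there are more than two labels and where each instance can have multiple labels?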

夏末的微笑 2025-01-04 19:10:25

For multi-label classification there are two ways to go. First, consider the following notation.

$n$ is the number of examples.
$Y_{i}$ is the ground truth label assignment of the $i^{th}$ example.
$x_{i}$ is the $i^{th}$ example.
$h(x_{i})$ is the predicted labels for the $i^{th}$ example.

Example based

The metrics are computed per data point. For each example, a score is computed from its predicted labels alone, and these scores are then averaged over all the data points.

Precision = $\frac{1}{n}\sum_{i=1}^{n}\frac{|Y_{i}\cap h(x_{i})|}{|h(x_{i})|}$, the fraction of the predicted labels that are correct. The numerator counts how many labels in the predicted set are also in the ground truth, and dividing by the number of predicted labels gives the fraction of predicted labels that are actually in the ground truth.
Recall = $\frac{1}{n}\sum_{i=1}^{n}\frac{|Y_{i}\cap h(x_{i})|}{|Y_{i}|}$, the fraction of the actual labels that were predicted. The numerator is the same as above, and dividing by the number of actual labels gives the fraction of the actual labels that were predicted.
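As a rough illustration of the two example-based formulas, here is a minimal Python sketch. It assumes each example's labels are stored as a Python set; the function name, the toy data, and the choice to score an empty prediction or empty ground truth as 0 are assumptions made for this sketch, not something the formulas prescribe.

```python
def example_based_precision_recall(true_sets, pred_sets):
    """Average the per-example precision and recall over all examples."""
    n = len(true_sets)
    precision_sum = 0.0
    recall_sum = 0.0
    for y_true, y_pred in zip(true_sets, pred_sets):
        overlap = len(y_true & y_pred)  # |Y_i ∩ h(x_i)|
        # Assumption: an empty denominator contributes 0 instead of failing.
        precision_sum += overlap / len(y_pred) if y_pred else 0.0
        recall_sum += overlap / len(y_true) if y_true else 0.0
    return precision_sum / n, recall_sum / n


# Hypothetical toy data: Y holds the ground truth sets, H the predicted sets.
Y = [{"A", "B"}, {"C"}, {"A", "B", "C"}]
H = [{"A"}, {"B", "C"}, {"A", "B"}]

ex_precision, ex_recall = example_based_precision_recall(Y, H)
print(f"example-based precision = {ex_precision:.4f}")
print(f"example-based recall    = {ex_recall:.4f}")
```

Here the per-example precisions are 1, 1/2 and 1, and the per-example recalls are 1/2, 1 and 2/3, so the two averages differ.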

There are other metrics as well.

Label based

Here things are done label-wise. For each label the metrics (e.g. precision, recall) are computed, and then these label-wise metrics are aggregated. Hence, in this case you end up computing the precision/recall for each label over the entire dataset, as you do for binary classification (since each label has a binary assignment), and then aggregating the results.

The easy way is to present the general form.

This is just an extension of the standard multi-class equivalent.

Macro averaged $\frac{1}{q}\sum_{j=1}^{q}B(TP_{j},FP_{j},TN_{j},FN_{j})$
Micro averaged $B(\sum_{j=1}^{q}TP_{j},\sum_{j=1}^{q}FP_{j},\sum_{j=1}^{q}TN_{j},\sum_{j=1}^{q}FN_{j})$

Here the $TP_{j},FP_{j},TN_{j},FN_{j}$ are the true positive, false positive, true negative and false negative counts respectively for only the $j^{th}$ label.

Here $B$ stands for any confusion-matrix based metric. In your case you would plug in the standard precision and recall formulas. For the macro average you apply the metric to each label's counts and then average the per-label results; for the micro average you sum the counts over all labels first, then apply the metric function once.
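The difference between the two averages can be sketched in Python as follows. This is only an illustration under stated assumptions: labels are stored as sets per example, all helper names and the toy data are hypothetical, TN is left out because precision and recall do not use it, and a zero denominator is scored as 0.

```python
def label_counts(true_sets, pred_sets, labels):
    """Return {label: (TP, FP, FN)} counted over the whole dataset."""
    counts = {}
    for label in labels:
        tp = fp = fn = 0
        for y_true, y_pred in zip(true_sets, pred_sets):
            in_true, in_pred = label in y_true, label in y_pred
            tp += int(in_true and in_pred)
            fp += int(in_pred and not in_true)
            fn += int(in_true and not in_pred)
        counts[label] = (tp, fp, fn)
    return counts


def precision(tp, fp, fn):
    return tp / (tp + fp) if tp + fp else 0.0  # assumption: 0 when undefined


def recall(tp, fp, fn):
    return tp / (tp + fn) if tp + fn else 0.0  # assumption: 0 when undefined


def macro(metric, counts):
    """Apply the metric per label, then average the per-label results."""
    return sum(metric(*c) for c in counts.values()) / len(counts)


def micro(metric, counts):
    """Sum the counts over all labels, then apply the metric once."""
    totals = [sum(column) for column in zip(*counts.values())]
    return metric(*totals)


# Hypothetical toy data.
Y = [{"A", "B"}, {"C"}, {"A", "B", "C"}]
H = [{"A"}, {"B", "C"}, {"A", "B"}]
counts = label_counts(Y, H, labels=["A", "B", "C"])

print("macro precision:", macro(precision, counts))
print("micro precision:", micro(precision, counts))
print("macro recall:   ", macro(recall, counts))
print("micro recall:   ", micro(recall, counts))
```

On this toy data the per-label precisions are 1, 1/2 and 1, so the macro precision is 5/6, while the pooled counts (TP = 4, FP = 1) give a micro precision of 4/5. Micro averaging lets frequent labels dominate, whereas macro averaging weights every label equally.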

You might be interested in the code for the multi-label metrics here, which is part of the R package mldr. You might also want to look into the Java multi-label library MULAN.

This is a nice paper to get into the different metrics: A Review on Multi-Label Learning Algorithms


嘿看小鸭子会跑 2025-01-04 19:10:25

The answer is that you have to compute precision and recall for each class, then average them together. E.g. if you have classes A, B, and C, then your precision is:

(precision(A) + precision(B) + precision(C)) / 3

Same for recall.

I'm no expert, but this is what I have determined based on the following sources:

https://list.scms.waikato.ac.nz/pipermail/wekalist/2011-March/051575.html
http://stats.stackexchange.com/questions/21551/how-to-compute-precision-recall-for-multiclass-multilabel-classification


紫罗兰の梦幻 2025-01-04 19:10:25

Let us assume that we have a 3-class classification problem with labels A, B and C.
The first thing to do is to generate a confusion matrix. Note that the values in the diagonal are always the true positives (TP).
Now, to compute recall for label A you can read off the values from the confusion matrix and compute:
```
= TP_A/(TP_A+FN_A)
= TP_A/(Total gold labels for A)
```
Now, let us compute precision for label A. Again, you can read off the values from the confusion matrix and compute:
```
= TP_A/(TP_A+FP_A)
= TP_A/(Total predicted as A)
```
You just need to do the same for the remaining labels B and C. This applies to any multi-class classification problem.
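To make the read-off concrete, here is a small Python sketch that builds the confusion counts from two label lists and applies the two ratios above to each label. The helper names and the toy gold/predicted lists are hypothetical, and returning 0 for a label with no gold or no predicted instances is an assumption of this sketch.

```python
from collections import Counter


def confusion_counts(gold, predicted):
    """Count (gold label, predicted label) pairs; missing pairs count as 0."""
    return Counter(zip(gold, predicted))


def recall_for(label, cm, labels):
    tp = cm[(label, label)]
    gold_total = sum(cm[(label, p)] for p in labels)  # row sum = TP + FN
    return tp / gold_total if gold_total else 0.0


def precision_for(label, cm, labels):
    tp = cm[(label, label)]
    predicted_total = sum(cm[(g, label)] for g in labels)  # column sum = TP + FP
    return tp / predicted_total if predicted_total else 0.0


# Hypothetical toy data.
labels = ["A", "B", "C"]
gold = ["A", "A", "A", "A", "B", "B", "C", "C"]
predicted = ["A", "A", "B", "C", "B", "A", "C", "C"]
cm = confusion_counts(gold, predicted)

for label in labels:
    r = recall_for(label, cm, labels)
    p = precision_for(label, cm, labels)
    print(f"{label}: recall = {r:.3f}, precision = {p:.3f}")
```

For label A there are 4 gold instances, of which 2 are predicted as A, and 3 instances are predicted as A in total, so recall is 2/4 and precision is 2/3.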

Here is the full article that talks about how to compute precision and recall for any multi-class classification problem, including examples.


倦话 2025-01-04 19:10:25

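In Python, using sklearn and numpy:

from sklearn.metrics import confusion_matrix
import numpy as np

labels = ...
predictions = ...

# Rows of cm are the true labels, columns are the predicted labels.
cm = confusion_matrix(labels, predictions)
# Per-class recall: diagonal (TP) divided by the row sums (TP + FN).
recall = np.diag(cm) / np.sum(cm, axis=1)
# Per-class precision: diagonal (TP) divided by the column sums (TP + FP).
# A class that is never predicted has a zero column sum and yields nan here.
precision = np.diag(cm) / np.sum(cm, axis=0)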
