Precision/recall for multiclass-multilabel classification
I'm wondering how to calculate precision and recall for multiclass-multilabel classification, i.e. classification where there are more than two labels and where each instance can have multiple labels.
For multi-label classification you have two ways to go.
First, fix some notation: $n$ is the number of examples, $Y_i$ is the set of ground-truth labels of the $i$-th example $x_i$, and $h(x_i)$ is the set of labels predicted for that example.
Example based
The metrics are computed per data point. For each example, a score is computed from its own predicted labels alone, and these scores are then averaged over all data points.
Precision $= \frac{1}{n}\sum_{i=1}^{n}\frac{|Y_i \cap h(x_i)|}{|h(x_i)|}$, the fraction of the predicted labels that are correct. The numerator counts how many labels in the predicted vector are in common with the ground truth, and the ratio gives how many of the predicted labels are actually in the ground truth.
Recall $= \frac{1}{n}\sum_{i=1}^{n}\frac{|Y_i \cap h(x_i)|}{|Y_i|}$, the fraction of the actual labels that were predicted. The numerator again counts how many labels in the predicted vector are in common with the ground truth, and dividing by the number of actual labels gives the fraction of the actual labels that were predicted.
There are other metrics as well.
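The example-based definitions above can be sketched in a few lines of plain Python. This is only an illustration: the function name `example_based_precision_recall` and the toy label sets are made up, and scoring an example with no predicted (or no true) labels as 0 is an assumption, since the text does not say how to treat that case.

```python
from typing import Sequence, Set, Tuple


def example_based_precision_recall(
    y_true: Sequence[Set[str]], y_pred: Sequence[Set[str]]
) -> Tuple[float, float]:
    """Average per-example precision and recall over all examples."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    precision_sum = 0.0
    recall_sum = 0.0
    for truth, predicted in zip(y_true, y_pred):
        overlap = len(truth & predicted)  # |Y_i ∩ h(x_i)|
        # Assumption: an empty prediction (or empty truth) contributes 0.
        precision_sum += overlap / len(predicted) if predicted else 0.0
        recall_sum += overlap / len(truth) if truth else 0.0
    n = len(y_true)
    return precision_sum / n, recall_sum / n


# Toy data: three examples, labels drawn from {"a", "b", "c"}.
y_true = [{"a", "b"}, {"c"}, {"a", "c"}]
y_pred = [{"a"}, {"b", "c"}, {"a", "c"}]
precision, recall = example_based_precision_recall(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

Representing each example's labels as a set keeps the code close to the set notation in the formulas; a binary indicator matrix would work equally well.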
Label based
Here things are done label-wise. For each label the metrics (e.g. precision, recall) are computed, and then these label-wise metrics are aggregated. So you end up computing the precision/recall for each label over the entire dataset, as you would for binary classification (since each label has a binary assignment), and then aggregate the results.
The easy way is to present the general form.
This is just an extension of the standard multi-class equivalent.
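Macro averaged: $\frac{1}{q}\sum_{j=1}^{q} B(TP_j, FP_j, TN_j, FN_j)$
Micro averaged: $B\left(\sum_{j=1}^{q} TP_j, \sum_{j=1}^{q} FP_j, \sum_{j=1}^{q} TN_j, \sum_{j=1}^{q} FN_j\right)$
Here $q$ is the number of labels, and $TP_j$, $FP_j$, $TN_j$, $FN_j$ are the true positive, false positive, true negative and false negative counts respectively for the $j$-th label only.
Here $B$ stands for any confusion-matrix-based metric. In your case you would plug in the standard precision and recall formulas. For the macro average you apply the metric to each label's counts and then average the results; for the micro average you sum the counts over all labels first, then apply the metric once.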
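To make the difference between the two averaging schemes concrete, here is a small sketch in plain Python. The `Counts` tuple, the helper names and the per-label counts are all invented for illustration; `metric` plays the role of $B$, and returning 0 for an empty denominator is an assumption.

```python
from typing import Callable, List, NamedTuple


class Counts(NamedTuple):
    """Confusion counts for one label (one-vs-rest)."""
    tp: int
    fp: int
    tn: int
    fn: int


Metric = Callable[[Counts], float]  # plays the role of B


def precision(c: Counts) -> float:
    return c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0


def recall(c: Counts) -> float:
    return c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0


def macro_average(metric: Metric, per_label: List[Counts]) -> float:
    # Apply the metric to each label, then average the results.
    return sum(metric(c) for c in per_label) / len(per_label)


def micro_average(metric: Metric, per_label: List[Counts]) -> float:
    # Sum the counts over all labels, then apply the metric once.
    pooled = Counts(*(sum(column) for column in zip(*per_label)))
    return metric(pooled)


# Illustrative counts: one frequent label and one rare label.
per_label = [Counts(tp=8, fp=2, tn=85, fn=5), Counts(tp=1, fp=3, tn=95, fn=1)]
print("macro precision:", macro_average(precision, per_label))
print("micro precision:", micro_average(precision, per_label))
print("macro recall:   ", macro_average(recall, per_label))
print("micro recall:   ", micro_average(recall, per_label))
```

With these counts the macro precision is pulled down by the rare label, while the micro precision is dominated by the frequent one, which is the usual reason to report both.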
You might want to look at the code for the multi-label metrics in the R package mldr. The Java multi-label library MULAN may also be of interest.
This is a nice paper to get into the different metrics: A Review on Multi-Label Learning Algorithms
The answer is that you have to compute precision and recall for each class, then average them together. E.g. if you have classes A, B, and C, then your precision is:
(precision(A) + precision(B) + precision(C)) / 3
Same for recall.
I'm no expert, but this is what I have determined based on the following sources:
https://list.scms.waikato.ac.nz/pipermail/wekalist/2011-March/051575.html
http://stats.stackexchange.com/questions/21551/how-to-compute-precision-recall-for-multiclass-multilabel-classification
Now, to compute recall for label A, you can read off the values from the confusion matrix and compute:
recall(A) = TP_A / (TP_A + FN_A)
          = TP_A / (total gold labels for A)
Now, let us compute precision for label A. Again, read off the values from the confusion matrix and compute:
precision(A) = TP_A / (TP_A + FP_A)
             = TP_A / (total predicted as A)
You just need to do the same for the remaining labels B and C. This applies to any multi-class classification problem.
Here is the full article that talks about how to compute precision and recall for any multi-class classification problem, including examples.
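The per-label computation can be sketched as follows. The confusion matrix here is made up, and the layout (rows are gold labels, columns are predicted labels) is an assumption; if your matrix is transposed, swap the row and column sums.

```python
from typing import Dict, List, Tuple


def per_class_precision_recall(
    cm: List[List[int]], labels: List[str]
) -> Dict[str, Tuple[float, float]]:
    """Return {label: (precision, recall)} from a square confusion matrix.

    Assumed layout: cm[i][j] counts items with gold label i predicted as j.
    """
    scores = {}
    for k, label in enumerate(labels):
        tp = cm[k][k]
        gold_total = sum(cm[k])                      # TP + FN (row sum)
        predicted_total = sum(row[k] for row in cm)  # TP + FP (column sum)
        precision = tp / predicted_total if predicted_total else 0.0
        recall = tp / gold_total if gold_total else 0.0
        scores[label] = (precision, recall)
    return scores


labels = ["A", "B", "C"]
cm = [
    [30, 5, 5],   # gold A
    [10, 40, 0],  # gold B
    [0, 5, 5],    # gold C
]
scores = per_class_precision_recall(cm, labels)
macro_precision = sum(p for p, _ in scores.values()) / len(labels)
macro_recall = sum(r for _, r in scores.values()) / len(labels)
print(scores)
print("macro precision:", macro_precision, "macro recall:", macro_recall)
```

Averaging the per-label values at the end gives the simple (macro) average over A, B and C.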
This can be done in Python using sklearn and numpy.
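As a rough illustration of the sklearn route, and not necessarily what this answer originally had in mind, the sketch below uses `precision_score` and `recall_score` on binary indicator matrices. The data is made up; the `average` argument selects per-label (`None`), macro, micro, or example-based (`"samples"`) averaging, and `zero_division=0` is a choice made here for labels that are never predicted.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# Hypothetical data: rows are examples, columns are labels (1 = label assigned).
y_true = np.array([[1, 1, 0],
                   [0, 0, 1],
                   [1, 0, 1]])
y_pred = np.array([[1, 0, 0],
                   [0, 1, 1],
                   [1, 0, 1]])

# One value per label (column).
per_label_precision = precision_score(y_true, y_pred, average=None, zero_division=0)
per_label_recall = recall_score(y_true, y_pred, average=None, zero_division=0)

# Label-based aggregates.
macro_precision = precision_score(y_true, y_pred, average="macro", zero_division=0)
micro_precision = precision_score(y_true, y_pred, average="micro", zero_division=0)

# Example-based aggregate: precision per row, averaged over rows.
samples_precision = precision_score(y_true, y_pred, average="samples", zero_division=0)

print("per-label precision:", per_label_precision)
print("per-label recall:   ", per_label_recall)
print("macro:", macro_precision, "micro:", micro_precision, "samples:", samples_precision)
```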
Simple averaging will do if the classes are balanced.
Otherwise, recall for each real class needs to be weighted by the prevalence of that class, and precision for each predicted label needs to be weighted by the bias (the probability of predicting that label). Either way you get Rand Accuracy.
A more direct way is to make a normalized contingency table (divide every cell by N, so that the cells, one per combination of predicted label and real class, sum to 1) and add up the diagonal to get Rand Accuracy.
But if the classes aren't balanced, the bias remains, and a chance-corrected method such as kappa is more appropriate, or better still ROC analysis or a chance-corrected measure such as informedness (height above the chance line in ROC).
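A small sketch of these quantities, in plain Python on a made-up 3x3 table. The layout (rows are real classes, columns are predicted labels) is an assumption, kappa is computed as Cohen's kappa, and informedness is shown only per class in its one-vs-rest form (true positive rate minus false positive rate); the function name `summarize` is hypothetical.

```python
from typing import Dict, List


def summarize(cm: List[List[int]]) -> Dict[str, object]:
    """Accuracy, weighted recall/precision, kappa and per-class informedness.

    Assumed layout: cm[i][j] counts items of real class i predicted as label j.
    Assumes every class occurs and no class covers all of the data.
    """
    total = sum(sum(row) for row in cm)
    p = [[cell / total for cell in row] for row in cm]  # normalized table
    k = len(p)
    prevalence = [sum(p[i]) for i in range(k)]                 # row sums
    bias = [sum(p[i][j] for i in range(k)) for j in range(k)]  # column sums
    accuracy = sum(p[i][i] for i in range(k))                  # diagonal
    recall = [p[i][i] / prevalence[i] for i in range(k)]
    precision = [p[i][i] / bias[i] if bias[i] else 0.0 for i in range(k)]
    expected = sum(prevalence[i] * bias[i] for i in range(k))  # chance agreement
    return {
        "accuracy": accuracy,
        "weighted_recall": sum(prevalence[i] * recall[i] for i in range(k)),
        "weighted_precision": sum(bias[i] * precision[i] for i in range(k)),
        "kappa": (accuracy - expected) / (1 - expected),
        # One-vs-rest informedness: TPR minus FPR for each class.
        "informedness": [
            recall[i] - (bias[i] - p[i][i]) / (1 - prevalence[i]) for i in range(k)
        ],
    }


cm = [
    [30, 5, 5],
    [10, 40, 0],
    [0, 5, 5],
]
summary = summarize(cm)
for name, value in summary.items():
    print(name, value)
```

On this table the prevalence-weighted recall, the bias-weighted precision and the diagonal sum all come out to the same accuracy, while kappa is noticeably lower because part of that agreement is expected by chance.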