
Easy way to compute precision, recall and F1 score in R

Posted on 2024-12-21 08:33:17

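I am using an rpart classifier in R. I want to test the trained classifier on test data, which is fine: I can use the predict.rpart function.

But I also want to compute precision, recall and F1 score.

My question is: do I have to write functions for these myself, or is there a function for this in R or in any of the CRAN libraries?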
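For reference, these are the standard definitions, in terms of the true positives (TP), false positives (FP) and false negatives (FN) counted for the positive class:

```latex
\text{precision} = \frac{TP}{TP + FP}, \qquad
\text{recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}
    = \frac{2\,TP}{2\,TP + FP + FN}
```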


安穩 2024-12-28 08:33:17

Using the caret package:

library(caret)

y <- ... # factor of positive / negative cases
predictions <- ... # factor of predictions

precision <- posPredValue(predictions, y, positive="1")
recall <- sensitivity(predictions, y, positive="1")

F1 <- (2 * precision * recall) / (precision + recall)
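To show what posPredValue() and sensitivity() compute for the positive class, here is a minimal base-R sketch. The label vectors are hypothetical toy data (not from the question), and the counts are read straight off a confusion table instead of calling caret:

```r
# Hypothetical toy data: true labels and predicted labels, positive class "1"
y           <- factor(c("1", "1", "1", "0", "0", "0", "0", "1"), levels = c("0", "1"))
predictions <- factor(c("1", "0", "1", "0", "1", "0", "0", "1"), levels = c("0", "1"))

# Confusion table: rows = predicted, columns = reference
tab <- table(predicted = predictions, reference = y)

tp <- tab["1", "1"]  # predicted 1, actually 1
fp <- tab["1", "0"]  # predicted 1, actually 0
fn <- tab["0", "1"]  # predicted 0, actually 1

precision <- tp / (tp + fp)  # matches posPredValue() when prevalence is left at its default
recall    <- tp / (tp + fn)  # matches sensitivity()
F1        <- (2 * precision * recall) / (precision + recall)
```

With these vectors there are 3 true positives, 1 false positive and 1 false negative, so precision, recall and F1 all come out as 0.75.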

A generic function that works for binary and multi-class classification without using any package is:

f1_score <- function(predicted, expected, positive.class="1") {
    expected  <- as.factor(expected)
    # Give predicted the same levels, in the same order, as expected, so that
    # the diagonal of the confusion table pairs each class with itself
    predicted <- factor(as.character(predicted), levels=levels(expected))
    cm <- as.matrix(table(expected, predicted))

    precision <- diag(cm) / colSums(cm)
    recall <- diag(cm) / rowSums(cm)
    f1 <- ifelse(precision + recall == 0, 0, 2 * precision * recall / (precision + recall))

    # Assume that F1 is zero when it is not possible to compute it
    f1[is.na(f1)] <- 0

    # Binary F1 or multi-class macro-averaged F1
    if (nlevels(expected) == 2) unname(f1[positive.class]) else mean(f1)
}

Some comments about the function:

F1 is assumed to be zero when it is NA (i.e. when it cannot be computed)
positive.class is used only for binary F1
For multi-class problems, the macro-averaged F1 is computed
If predicted and expected have different levels, predicted is given the levels of expected
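The function above macro-averages: it takes the unweighted mean of the per-class F1 scores. For comparison, here is a minimal base-R sketch of micro-averaging, which pools the counts over all classes before computing precision and recall. The helper name `micro_f1` and the toy vectors are hypothetical. For single-label multi-class data every wrong prediction is one FP (for the predicted class) and one FN (for the true class), so micro-averaged F1 reduces to plain accuracy.

```r
# Hypothetical helper: micro-averaged F1 for single-label multi-class data
micro_f1 <- function(predicted, expected) {
  classes   <- union(levels(factor(expected)), levels(factor(predicted)))
  expected  <- factor(expected, levels = classes)
  predicted <- factor(predicted, levels = classes)
  cm <- table(expected, predicted)

  tp <- sum(diag(cm))          # correct predictions, pooled over classes
  fp <- sum(colSums(cm)) - tp  # each wrong prediction is an FP for its predicted class
  fn <- sum(rowSums(cm)) - tp  # ...and an FN for its true class
  precision <- tp / (tp + fp)
  recall    <- tp / (tp + fn)
  2 * precision * recall / (precision + recall)
}

expected  <- c("a", "a", "b", "b", "c", "c")
predicted <- c("a", "b", "b", "b", "c", "a")
micro_f1(predicted, expected)  # 4 of 6 correct, same as mean(predicted == expected)
```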


眉黛浅 2024-12-28 08:33:17

The ROCR library calculates all these and more (see also http://rocr.bioinf.mpi-sb.mpg.de):

library(ROCR)
...

y <- ... # logical array of positive / negative cases
predictions <- ... # array of predictions (typically continuous scores)

pred <- prediction(predictions, y)

# Recall-Precision curve
RP.perf <- performance(pred, "prec", "rec")
plot(RP.perf)

# ROC curve
ROC.perf <- performance(pred, "tpr", "fpr")
plot(ROC.perf)

# ROC area under the curve
auc.tmp <- performance(pred, "auc")
auc <- as.numeric(auc.tmp@y.values)

...
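performance(pred, "prec", "rec") traces precision and recall as the score cutoff varies. To make that idea concrete, here is a rough base-R sketch, not ROCR's own implementation: the helper `pr_at_cutoffs`, the scores and the labels are hypothetical, and it treats "score >= cutoff" as a positive prediction at each distinct score.

```r
# Hypothetical helper: precision and recall at each distinct score cutoff,
# treating "score >= cutoff" as a positive prediction
pr_at_cutoffs <- function(scores, labels) {
  cutoffs <- sort(unique(scores), decreasing = TRUE)
  t(sapply(cutoffs, function(k) {
    pred_pos <- scores >= k
    tp <- sum(pred_pos & labels)
    fp <- sum(pred_pos & !labels)
    fn <- sum(!pred_pos & labels)
    c(cutoff = k, precision = tp / (tp + fp), recall = tp / (tp + fn))
  }))
}

scores <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.3)
labels <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)
pr <- pr_at_cutoffs(scores, labels)
print(pr)
```

Plotting pr[, "precision"] against pr[, "recall"] gives a crude version of the Recall-Precision curve; ROCR additionally handles ties, multiple runs and interpolation for you.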


蹲墙角沉默 2024-12-28 08:33:17

Just to update this, since I only came across this thread now: the confusionMatrix function in caret computes all of these for you automatically.

cm <- confusionMatrix(prediction, reference = test_set$label)

# extract F1 score for all classes
cm[["byClass"]][ , "F1"] #for multiclass classification problems

You can substitute any of the following for "F1" to extract the corresponding values:

"Sensitivity", "Specificity", "Pos Pred Value", "Neg Pred Value", "Precision", "Recall", "F1", "Prevalence", "Detection Rate", "Detection Prevalence", "Balanced Accuracy"

For a binary classification problem byClass is a named vector rather than a matrix, so you index it as cm[["byClass"]]["F1"] instead. In both cases, all of these values are computed for you and stored in the confusionMatrix object under $byClass.
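For multi-class problems these per-class values are one-vs-rest statistics. A minimal base-R sketch of the Precision, Recall and F1 columns, assuming hypothetical toy labels and a hypothetical helper name `by_class_prf` (caret's own byClass has more columns and its own NA handling):

```r
# Hypothetical helper: one-vs-rest precision, recall and F1 for each class
by_class_prf <- function(predicted, reference) {
  classes   <- levels(factor(reference))
  # predicted labels outside the reference levels become NA and are dropped
  predicted <- factor(predicted, levels = classes)
  reference <- factor(reference, levels = classes)
  cm <- table(predicted, reference)  # rows = predicted, columns = reference

  tp <- diag(cm)
  precision <- tp / rowSums(cm)  # TP / (TP + FP)
  recall    <- tp / colSums(cm)  # TP / (TP + FN)
  f1 <- 2 * precision * recall / (precision + recall)
  cbind(Precision = precision, Recall = recall, F1 = f1)
}

reference <- c("a", "a", "b", "b", "c", "c")
predicted <- c("a", "b", "b", "b", "c", "a")
res <- by_class_prf(predicted, reference)
res
```

Here class "b" is over-predicted (precision 2/3, recall 1, F1 0.8) while class "c" is under-predicted (precision 1, recall 0.5, F1 2/3).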


尐籹人 2024-12-28 08:33:17

confusionMatrix() from the caret package can be used with its optional positive argument set appropriately, to specify which factor level should be treated as the positive class.

confusionMatrix(predicted, Funded, mode = "prec_recall", positive="1")

This code will also give additional values such as F1, Accuracy, etc.
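The choice of positive class matters because precision, recall and F1 are all defined relative to it, so swapping it changes the numbers. A small base-R sketch with made-up labels (the helper `binary_f1` is hypothetical) shows the effect:

```r
# Hypothetical helper: F1 for a binary problem with an explicit positive class
binary_f1 <- function(predicted, reference, positive) {
  tp <- sum(predicted == positive & reference == positive)
  fp <- sum(predicted == positive & reference != positive)
  fn <- sum(predicted != positive & reference == positive)
  2 * tp / (2 * tp + fp + fn)  # equals 2PR / (P + R) whenever that is defined
}

reference <- c("1", "1", "0", "0", "0", "0")
predicted <- c("1", "0", "1", "0", "0", "0")

binary_f1(predicted, reference, positive = "1")  # 2*1 / (2 + 1 + 1) = 0.5
binary_f1(predicted, reference, positive = "0")  # 2*3 / (6 + 1 + 1) = 0.75
```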


人事已非 2024-12-28 08:33:17

I noticed the comment about the F1 score being needed for binary classes, and I suspect it usually is. But a while ago I wrote this while classifying into several groups denoted by numbers. It may be of use to you...

calcF1Scores <- function(act, prd) {
  # treats each distinct value as a class
  # act and prd must be whole numbers
  df <- data.frame(act = act, prd = prd)
  scores <- list()
  for (i in seq(min(act), max(act))) {
    tp <- nrow(df[df$prd == i & df$act == i, ])
    fp <- nrow(df[df$prd == i & df$act != i, ])
    fn <- nrow(df[df$prd != i & df$act == i, ])
    f1 <- (2 * tp) / (2 * tp + fp + fn)
    # index by name so that classes 0 or below do not break the list
    scores[[as.character(i)]] <- f1
  }
  print(scores)
  return(scores)
}

print(mean(unlist(calcF1Scores(c(1, 1, 3, 4, 5), c(1, 2, 3, 4, 5)))))
print(mean(unlist(calcF1Scores(c(1, 2, 3, 4, 5), c(1, 2, 3, 4, 5)))))
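The same per-class formula, F1 = 2TP / (2TP + FP + FN), can also be computed without the loop from a confusion table, over the labels that actually occur rather than every integer between the minimum and maximum. A hedged base-R sketch (the name `f1_per_class` is hypothetical), applied to the first example above:

```r
# Hypothetical vectorised variant: per-class F1 = 2TP / (2TP + FP + FN)
f1_per_class <- function(act, prd) {
  classes <- sort(union(act, prd))
  cm <- table(factor(act, levels = classes), factor(prd, levels = classes))
  tp <- diag(cm)
  fp <- colSums(cm) - tp  # predicted as the class but actually another class
  fn <- rowSums(cm) - tp  # actually the class but predicted as another class
  2 * tp / (2 * tp + fp + fn)
}

scores <- f1_per_class(c(1, 1, 3, 4, 5), c(1, 2, 3, 4, 5))
scores        # class 1: 2/3, class 2: 0, classes 3 to 5: 1
mean(scores)  # 11/15, about 0.733
```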


我三岁 2024-12-28 08:33:17

We can simply get the F1 value from caret's confusionMatrix function:

result <- confusionMatrix(Prediction, Lable)

# View the full confusion matrix and statistics
result

# F1 value (for a binary problem byClass is a named vector; F1 is its 7th element)
result$byClass["F1"]


老旧海报 2024-12-28 08:33:17

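You can also use confusionMatrix() from the caret package. Its output includes the sensitivity (also known as recall) and the positive predictive value (also known as precision). F1 can then be computed easily, as described above:
F1 <- (2 * precision * recall) / (precision + recall)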
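One practical detail with that formula: if precision and recall are both 0 (no true positives), it evaluates to 0/0, which is NaN in R. A small hedged helper (the name `f1_from_pr` is hypothetical) that follows the same convention as the f1_score function above, reporting 0 when F1 cannot be computed; it is vectorised, so it also works on a whole byClass column:

```r
# Hypothetical helper: F1 from precision and recall, returning 0 instead of NaN
# when both are 0 (same convention as f1_score above)
f1_from_pr <- function(precision, recall) {
  ifelse(precision + recall == 0, 0, 2 * precision * recall / (precision + recall))
}

f1_from_pr(0.75, 0.6)  # 2 * 0.45 / 1.35, about 0.667
f1_from_pr(0, 0)       # 0 rather than NaN
```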


鸵鸟症 2024-12-28 08:33:17

library(caret)

result <- confusionMatrix(prediction, label)

# This shows all the measures you need, including precision, recall and F1

result$byClass



About the author

煞人兵器
