Easy way of computing precision, recall and F1 score in R

Posted on 2024-12-21 08:33:17

I am using an rpart classifier in R. I want to evaluate the trained classifier on test data. That part is fine: I can use the predict.rpart function.

But I also want to calculate precision, recall and F1 score.

My question is: do I have to write functions for those myself, or is there a function for that in R or in any of the CRAN libraries?

Comments (8)

安穩 2024-12-28 08:33:17

Using the caret package:

library(caret)

y <- ... # factor of positive / negative cases
predictions <- ... # factor of predictions

precision <- posPredValue(predictions, y, positive="1")
recall <- sensitivity(predictions, y, positive="1")

F1 <- (2 * precision * recall) / (precision + recall)
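
For reference, the same three numbers can be computed by hand from the counts of true positives, false positives and false negatives. The sketch below uses base R only; the toy vectors y and predictions are made up for illustration, and "1" is assumed to be the positive label.

```r
# Toy data (hypothetical): "1" is the positive class
y           <- factor(c("1", "1", "1", "0", "0", "0", "1", "0"), levels = c("0", "1"))
predictions <- factor(c("1", "0", "1", "0", "1", "0", "1", "0"), levels = c("0", "1"))

tp <- sum(predictions == "1" & y == "1")  # true positives
fp <- sum(predictions == "1" & y == "0")  # false positives
fn <- sum(predictions == "0" & y == "1")  # false negatives

precision <- tp / (tp + fp)  # share of predicted positives that are correct
recall    <- tp / (tp + fn)  # share of actual positives that were found
F1 <- (2 * precision * recall) / (precision + recall)
```

Precision is the quantity caret calls the positive predictive value and recall is its sensitivity, which is why the two caret functions above are paired this way.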

A generic function that works for binary and multi-class classification without using any package is:

f1_score <- function(predicted, expected, positive.class="1") {
    # Use the same level order for both factors so that the diagonal of the
    # confusion matrix really holds the correctly classified counts
    lvls <- sort(unique(as.character(expected)))
    predicted <- factor(as.character(predicted), levels=lvls)
    expected  <- factor(as.character(expected), levels=lvls)
    cm <- as.matrix(table(expected, predicted))

    precision <- diag(cm) / colSums(cm)
    recall <- diag(cm) / rowSums(cm)
    f1 <- ifelse(precision + recall == 0, 0, 2 * precision * recall / (precision + recall))

    # Assuming that F1 is zero when it's not possible to compute it
    f1[is.na(f1)] <- 0

    # Binary F1 or multi-class macro-averaged F1
    if (nlevels(expected) == 2) unname(f1[positive.class]) else mean(f1)
}

Some comments about the function:

  • An F1 that evaluates to NA is assumed to be zero
  • positive.class is used only for the binary F1
  • For multi-class problems, the macro-averaged F1 is computed
  • If predicted and expected have different levels, predicted will receive the levels of expected
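
To illustrate the last two points, here is a small base-R sketch with made-up three-class labels in which the classifier never predicts class "c". Giving both factors the same levels keeps the confusion matrix square; the empty column for "c" yields an undefined precision, which is then counted as F1 = 0 as described above.

```r
# Hypothetical three-class labels; "c" is never predicted
expected  <- c("a", "a", "b", "b", "c", "c")
predicted <- c("a", "b", "b", "b", "a", "b")

lvls <- sort(unique(expected))
cm <- table(expected  = factor(expected,  levels = lvls),
            predicted = factor(predicted, levels = lvls))

precision <- diag(cm) / colSums(cm)   # column "c" sums to 0, so this is NaN for "c"
recall    <- diag(cm) / rowSums(cm)
f1 <- 2 * precision * recall / (precision + recall)
f1[is.na(f1)] <- 0                    # convention: undefined F1 counts as 0

macro_f1 <- mean(f1)                  # unweighted mean over the classes
```

Because the macro average weights every class equally, a class that is never predicted pulls the score down regardless of how rare it is.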
眉黛浅 2024-12-28 08:33:17

The ROCR library calculates all of these and more (see also http://rocr.bioinf.mpi-sb.mpg.de):

library(ROCR)
...

y <- ... # logical array of positive / negative cases
predictions <- ... # array of predictions (numeric scores)

pred <- prediction(predictions, y)

# Precision-recall curve (recall on the x-axis, precision on the y-axis)
RP.perf <- performance(pred, "prec", "rec")
plot(RP.perf)

# ROC curve
ROC.perf <- performance(pred, "tpr", "fpr")
plot(ROC.perf)

# Area under the ROC curve
auc.tmp <- performance(pred, "auc")
auc <- as.numeric(auc.tmp@y.values)

...
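
ROCR works on numeric scores rather than hard class labels, so each point on the precision-recall curve corresponds to one cutoff. As a rough base-R sketch of that idea (not ROCR's implementation; the scores, the labels and the helper name pr_at are made up for illustration), precision, recall and F1 can be evaluated at every candidate cutoff:

```r
# Hypothetical scores and true labels
scores <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1)
y      <- c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE)

# Precision, recall and F1 when everything scoring >= cutoff is called positive
pr_at <- function(cutoff) {
  pred <- scores >= cutoff
  tp <- sum(pred & y)
  precision <- tp / sum(pred)
  recall    <- tp / sum(y)
  # 2*TP / (2*TP + FP + FN), since sum(pred) = TP + FP and sum(y) = TP + FN
  c(cutoff = cutoff, precision = precision, recall = recall,
    f1 = 2 * tp / (sum(pred) + sum(y)))
}

# One row per cutoff, from the strictest to the most permissive
curve <- as.data.frame(t(sapply(sort(unique(scores), decreasing = TRUE), pr_at)))
best  <- curve[which.max(curve$f1), ]   # cutoff with the highest F1
```

Lowering the cutoff can only raise recall, while precision may move either way, which is the trade-off the plotted curve makes visible.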
蹲墙角沉默 2024-12-28 08:33:17

Just to update this, as I came across this thread now: the confusionMatrix function in caret computes all of these things for you automatically.

cm <- confusionMatrix(prediction, reference = test_set$label)

# extract F1 score for all classes
cm[["byClass"]][ , "F1"] #for multiclass classification problems

You can substitute any of the following for "F1" to extract the relevant values as well:

"Sensitivity", "Specificity", "Pos Pred Value", "Neg Pred Value", "Precision", "Recall", "F1", "Prevalence", "Detection Rate", "Detection Prevalence", "Balanced Accuracy"

This behaves slightly differently for a binary classification problem: there $byClass is a named vector rather than a matrix with one row per class, so the value is extracted with cm[["byClass"]]["F1"]. In both cases, all of these values are computed for you and stored in the confusionMatrix object under $byClass.

尐籹人 2024-12-28 08:33:17

confusionMatrix() from the caret package can be used with the optional argument positive, which specifies which factor level should be treated as the positive class.

confusionMatrix(predicted, Funded, mode = "prec_recall", positive="1")

This call also reports additional values such as the F1 score, accuracy, etc.

人事已非 2024-12-28 08:33:17

I noticed the comment about the F1 score being needed for binary classes. I suspect that it usually is. But a while ago I wrote the function below for a classification into several groups denoted by numbers. It may be of use to you:

calcF1Scores <- function(act, prd) {
  # Treats the vectors as class labels.
  # act and prd must be positive whole numbers (they are used as list indices).
  scores <- list()
  for (i in seq(min(act), max(act))) {
    tp <- sum(prd == i & act == i)  # true positives for class i
    fp <- sum(prd == i & act != i)  # false positives for class i
    fn <- sum(prd != i & act == i)  # false negatives for class i
    # NaN when class i appears in neither act nor prd
    scores[[i]] <- (2 * tp) / (2 * tp + fp + fn)
  }
  print(scores)
  return(scores)
}

print(mean(unlist(calcF1Scores(c(1,1,3,4,5), c(1,2,3,4,5)))))
print(mean(unlist(calcF1Scores(c(1,2,3,4,5), c(1,2,3,4,5)))))
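
The count-based expression used in the loop is the same F1 as the harmonic-mean form used in the other answers. Writing P for precision and R for recall (notation introduced here only for the derivation), the algebra is:

```latex
\begin{aligned}
P &= \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN} \\
F_1 &= \frac{2PR}{P + R}
     = \frac{2\,TP^2}{TP\,(TP + FN) + TP\,(TP + FP)}
     = \frac{2\,TP}{2\,TP + FP + FN}
\end{aligned}
```

The count form has the practical advantage of staying defined when a class is never predicted (TP + FP = 0) as long as the class occurs in the actual labels.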
我三岁 2024-12-28 08:33:17

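We can simply get the F1 value from caret's confusionMatrix function:

result <- confusionMatrix(Prediction, Label)

# View the overall confusion matrix and statistics
result

# F1 value (for a two-class problem byClass is a named vector; "F1" is its 7th element)
result$byClass["F1"]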
老旧海报 2024-12-28 08:33:17

You can also use the confusionMatrix() function provided by the caret package. The output includes, among others, Sensitivity (also known as recall) and Pos Pred Value (also known as precision). F1 can then be easily computed, as stated above, as:
F1 <- (2 * precision * recall) / (precision + recall)

鸵鸟症 2024-12-28 08:33:17

library(caret)

result <- confusionMatrix(Prediction, label)

#This shows all the measures you need including precision, recall and F1

result$byClass
