How sensitivity and specificity are calculated in repeated cross-validation

Posted on 2025-01-09 02:50:29

I would like to understand how sensitivity and specificity are calculated in caret::train when using summaryFunction=twoClassSummary, and whether (and how) the method can be changed.

Example:

library(caret)
library(mlbench)

data(Sonar)

trControl = trainControl("repeatedcv",
                         number = 3,
                         repeats =10,
                         classProbs = TRUE,
                         savePredictions = TRUE,
                         summaryFunction = twoClassSummary)

fit <- train(Class ~ ., data=Sonar, 
               method = "glm",
               family=binomial(),
               metric="ROC",
               trControl=trControl)

fit$results
#>   parameter       ROC      Sens      Spec      ROCSD     SensSD     SpecSD
#> 1      none 0.7350712 0.7225225 0.6744949 0.04365911 0.06805355 0.06238508

In the twoClassSummary function I see how Sens and Spec are calculated (via caret::sensitivity and caret::specificity).

print(twoClassSummary)
#> function (data, lev = NULL, model = NULL) 
#> {
#>     if (length(lev) > 2) {
#>         stop(paste("Your outcome has", length(lev), "levels. The twoClassSummary() function isn't appropriate."))
#>     }
#>     requireNamespaceQuietStop("pROC")
#>     if (!all(levels(data[, "pred"]) == lev)) {
#>         stop("levels of observed and predicted data do not match")
#>     }
#>     rocObject <- try(pROC::roc(data$obs, data[, lev[1]], direction = ">", 
#>         quiet = TRUE), silent = TRUE)
#>     rocAUC <- if (inherits(rocObject, "try-error")) 
#>         NA
#>     else rocObject$auc
#>     out <- c(rocAUC, sensitivity(data[, "pred"], data[, "obs"], 
#>         lev[1]), specificity(data[, "pred"], data[, "obs"], lev[2]))
#>     names(out) <- c("ROC", "Sens", "Spec")
#>     out
#> }
#> <bytecode: 0x000000002f048120>
#> <environment: namespace:caret>

Created on 2022-02-22 by the reprex package (v2.0.1)

As far as I know, there are different methods for finding the optimal threshold at which Sens and Spec are reported (e.g., Youden's J statistic). Which method does twoClassSummary use to calculate Sens and Spec?

How could I change twoClassSummary to get Sens and Spec according to the Youden or "closest topleft" method?

Update:

According to the question "How does caret train determine the probability threshold to maximise Specificity", caret uses a threshold of 0.5 to calculate Sens and Spec. I think this also holds for metric="ROC", as in my case.
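To make the 0.5 rule concrete, here is a small base-R sketch on made-up numbers (not Sonar output). The vectors obs and prob_M are hypothetical stand-ins for one held-out fold. The sketch assumes the positive class is the first factor level ("M"), which is how twoClassSummary passes lev[1] to caret::sensitivity.

```r
# Hypothetical held-out fold: observed class and predicted P(class == "M").
lev    <- c("M", "R")
obs    <- factor(c("M", "M", "M", "M", "R", "R", "R", "R"), levels = lev)
prob_M <- c(0.90, 0.70, 0.55, 0.40, 0.60, 0.45, 0.20, 0.10)

# Hard class labels at a fixed 0.5 cutoff.
pred <- factor(ifelse(prob_M >= 0.5, "M", "R"), levels = lev)

# Sens: share of observed "M" predicted as "M" (3 of 4 here).
sens_05 <- mean(pred[obs == "M"] == "M")
# Spec: share of observed "R" predicted as "R" (3 of 4 here).
spec_05 <- mean(pred[obs == "R"] == "R")

print(c(Sens = sens_05, Spec = spec_05))
```

With repeatedcv, number = 3 and repeats = 10, such per-fold values are computed on each of the 30 held-out sets, and fit$results reports their mean and standard deviation.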

I am still stuck on the changes needed to get Sens/Spec according to Youden using a modified summaryFunction.
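A sketch of what such a modified summaryFunction might look like, offered as a starting point rather than a verified answer. It assumes a recent pROC version in which coords() accepts best.method ("youden" or "closest.topleft") and transpose = FALSE. The names bestCutSummary, youdenSummary and topleftSummary are hypothetical, not part of caret. The ROC curve is built with levels = rev(lev) and direction = "<" so that pROC's "case" class is lev[1], matching the class caret treats as positive for Sens.

```r
# Hypothetical factory: returns a caret-style summary function that reports
# Sens/Spec at the "best" ROC cutoff instead of the fixed 0.5 cutoff.
# Assumes pROC is installed and classProbs = TRUE in trainControl().
bestCutSummary <- function(best.method = "youden") {
  function(data, lev = NULL, model = NULL) {
    if (length(lev) != 2) stop("This summary needs a two-class outcome.")
    na_out <- c(ROC = NA_real_, Sens = NA_real_, Spec = NA_real_,
                Threshold = NA_real_)

    # Cases = lev[1], controls = lev[2]; predictor = P(lev[1]).
    roc_obj <- try(pROC::roc(response = data$obs, predictor = data[, lev[1]],
                             levels = rev(lev), direction = "<", quiet = TRUE),
                   silent = TRUE)
    if (inherits(roc_obj, "try-error")) return(na_out)

    best <- pROC::coords(roc_obj, x = "best", best.method = best.method,
                         ret = c("threshold", "sensitivity", "specificity"),
                         transpose = FALSE)

    # Ties can yield several "best" rows; keep the first one.
    c(ROC       = as.numeric(roc_obj$auc),
      Sens      = as.numeric(best[1, "sensitivity"]),
      Spec      = as.numeric(best[1, "specificity"]),
      Threshold = as.numeric(best[1, "threshold"]))
  }
}

youdenSummary  <- bestCutSummary("youden")
topleftSummary <- bestCutSummary("closest.topleft")
```

Passing summaryFunction = youdenSummary to trainControl() in place of twoClassSummary, with classProbs = TRUE and metric = "ROC" unchanged, should then add a Threshold column to fit$results. Note that the cutoff is chosen separately on each held-out fold and evaluated on that same fold, so the resulting Sens and Spec are likely to be optimistic compared with a cutoff fixed in advance.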
