How sensitivity and specificity are calculated with repeated cross-validation

Posted on 2025-01-09 02:50:29

I would like to understand how sensitivity and specificity are calculated in caret::train when using summaryFunction = twoClassSummary, and whether (and how) the method can be changed.

Example:

library(caret)
library(mlbench)

data(Sonar)

trControl <- trainControl(method = "repeatedcv",
                          number = 3,
                          repeats = 10,
                          classProbs = TRUE,
                          savePredictions = TRUE,
                          summaryFunction = twoClassSummary)

fit <- train(Class ~ ., data = Sonar,
             method = "glm",
             family = binomial(),
             metric = "ROC",
             trControl = trControl)

fit$results
#>   parameter       ROC      Sens      Spec      ROCSD     SensSD     SpecSD
#> 1      none 0.7350712 0.7225225 0.6744949 0.04365911 0.06805355 0.06238508

In the twoClassSummary function I see how Sens and Spec are calculated (via caret::sensitivity and caret::specificity).

print(twoClassSummary)
#> function (data, lev = NULL, model = NULL) 
#> {
#>     if (length(lev) > 2) {
#>         stop(paste("Your outcome has", length(lev), "levels. The twoClassSummary() function isn't appropriate."))
#>     }
#>     requireNamespaceQuietStop("pROC")
#>     if (!all(levels(data[, "pred"]) == lev)) {
#>         stop("levels of observed and predicted data do not match")
#>     }
#>     rocObject <- try(pROC::roc(data$obs, data[, lev[1]], direction = ">", 
#>         quiet = TRUE), silent = TRUE)
#>     rocAUC <- if (inherits(rocObject, "try-error")) 
#>         NA
#>     else rocObject$auc
#>     out <- c(rocAUC, sensitivity(data[, "pred"], data[, "obs"], 
#>         lev[1]), specificity(data[, "pred"], data[, "obs"], lev[2]))
#>     names(out) <- c("ROC", "Sens", "Spec")
#>     out
#> }
#> <bytecode: 0x000000002f048120>
#> <environment: namespace:caret>

Created on 2022-02-22 by the reprex package (v2.0.1)
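
To make the last lines of twoClassSummary concrete, here is a minimal base-R sketch of what the sensitivity and specificity calls amount to when they are given hard class labels. The toy vectors are made up for illustration; only the level names M and R come from the Sonar data.

```r
# Toy held-out labels, shaped like the obs and pred columns a summaryFunction
# receives (values are invented for illustration).
lev  <- c("M", "R")
obs  <- factor(c("M", "M", "M", "R", "R", "R", "R", "M"), levels = lev)
pred <- factor(c("M", "R", "M", "R", "M", "R", "R", "M"), levels = lev)

# Sensitivity: share of observed lev[1] cases that were predicted as lev[1].
sens <- sum(pred == lev[1] & obs == lev[1]) / sum(obs == lev[1])
# Specificity: share of observed lev[2] cases that were predicted as lev[2].
spec <- sum(pred == lev[2] & obs == lev[2]) / sum(obs == lev[2])

c(Sens = sens, Spec = spec)
```

No probabilities enter this calculation, so whatever cutoff produced pred is the cutoff that Sens and Spec reflect.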

As far as I know, there are different methods for finding the optimal threshold at which to report Sens and Spec (e.g., Youden's index). Which method does twoClassSummary use to calculate Sens and Spec?

How could I change twoClassSummary to get Sens and Spec according to the Youden or "closest topleft" method?

Update:

According to "How does caret train determine the probability threshold to maximise Specificity", caret uses a threshold of 0.5 to calculate Sens and Spec. I think this also holds for metric = "ROC", as in my case.
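
A small sketch of why the cutoff matters, using base R only. The probabilities are invented, and the assumption (taken from the linked question, not verified here) is that the hard predictions correspond to a 0.5 cutoff on the probability of the first level.

```r
# Invented class probabilities for the first level ("M") and the true labels.
lev    <- c("M", "R")
obs    <- factor(c("M", "M", "M", "R", "R", "R"), levels = lev)
prob_M <- c(0.9, 0.6, 0.4, 0.55, 0.3, 0.1)

# Turn probabilities into hard labels at a given cutoff, then compute
# Sens and Spec from those labels.
sens_spec_at <- function(cutoff) {
  pred <- factor(ifelse(prob_M >= cutoff, lev[1], lev[2]), levels = lev)
  c(cutoff = cutoff,
    Sens   = mean(pred[obs == lev[1]] == lev[1]),
    Spec   = mean(pred[obs == lev[2]] == lev[2]))
}

# One row per cutoff; the 0.5 row is the case assumed to match twoClassSummary.
res <- t(sapply(c(0.3, 0.5, 0.7), sens_spec_at))
res
```

Lowering the cutoff trades specificity for sensitivity and raising it does the opposite, which is why a fixed 0.5 need not coincide with the Youden-optimal point.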

I am still stuck on the changes needed to get Sens/Spec according to Youden using a modified summaryFunction.
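
One possible direction, sketched under assumptions rather than offered as a tested solution: a summary function with the same (data, lev, model) signature as twoClassSummary that searches the predicted probabilities for the cutoff maximising Youden's J (or minimising the distance to the top-left ROC corner). The name thresholdSummary and its criterion argument are hypothetical. The sketch assumes data contains a factor column obs and a probability column named after lev[1], as is the case when classProbs = TRUE. It uses base R only; pROC::coords with x = "best" would be an alternative way to find the cutoff.

```r
# Hypothetical replacement for twoClassSummary (not part of caret).
thresholdSummary <- function(data, lev = NULL, model = NULL,
                             criterion = c("youden", "closest.topleft")) {
  criterion <- match.arg(criterion)
  if (is.null(lev)) lev <- levels(data$obs)
  is_pos <- data$obs == lev[1]
  prob   <- data[, lev[1]]

  # Candidate cutoffs: every distinct predicted probability.
  cutoffs <- sort(unique(prob))
  sens <- sapply(cutoffs, function(ct) mean(prob[is_pos] >= ct))
  spec <- sapply(cutoffs, function(ct) mean(prob[!is_pos] < ct))

  score <- if (criterion == "youden") {
    sens + spec - 1                  # Youden's J, larger is better
  } else {
    -((1 - sens)^2 + (1 - spec)^2)   # negative squared distance to (0, 1)
  }
  best <- which.max(score)           # first maximum wins on ties

  # AUC via the rank (Mann-Whitney) formula, to avoid extra dependencies.
  r   <- rank(prob)
  n1  <- sum(is_pos)
  n0  <- sum(!is_pos)
  auc <- (sum(r[is_pos]) - n1 * (n1 + 1) / 2) / (n1 * n0)

  c(ROC = auc, Sens = sens[best], Spec = spec[best], Threshold = cutoffs[best])
}

# Toy check on invented data shaped like one held-out resample.
toy <- data.frame(
  obs = factor(c("M", "M", "M", "M", "M", "R", "R", "R"), levels = c("M", "R")),
  M   = c(0.9, 0.8, 0.7, 0.6, 0.35, 0.55, 0.3, 0.1)
)
toy$R <- 1 - toy$M
out <- thresholdSummary(toy, lev = c("M", "R"))
out
```

Passing summaryFunction = thresholdSummary to trainControl should then report Sens and Spec at the per-resample optimum while metric = "ROC" keeps working, since the ROC name is preserved. Note that the cutoff is chosen and evaluated on the same held-out fold, so these values are likely to be optimistic compared with a cutoff fixed in advance.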
