How sensitivity and specificity are calculated with repeated cross-validation
I would like to understand how sensitivity and specificity are calculated in caret::train when using summaryFunction=twoClassSummary, and if/how it is possible to change the method.
Example:
library(caret)
library(mlbench)
data(Sonar)
trControl = trainControl("repeatedcv",
                         number = 3,
                         repeats = 10,
                         classProbs = TRUE,
                         savePredictions = TRUE,
                         summaryFunction = twoClassSummary)
fit <- train(Class ~ ., data = Sonar,
             method = "glm",
             family = binomial(),
             metric = "ROC",
             trControl = trControl)
fit$results
#>   parameter       ROC      Sens      Spec      ROCSD     SensSD     SpecSD
#> 1      none 0.7350712 0.7225225 0.6744949 0.04365911 0.06805355 0.06238508
In the twoClassSummary function I see how Sens and Spec are calculated (via caret::sensitivity and caret::specificity).
print(twoClassSummary)
#> function (data, lev = NULL, model = NULL)
#> {
#>     if (length(lev) > 2) {
#>         stop(paste("Your outcome has", length(lev), "levels. The twoClassSummary() function isn't appropriate."))
#>     }
#>     requireNamespaceQuietStop("pROC")
#>     if (!all(levels(data[, "pred"]) == lev)) {
#>         stop("levels of observed and predicted data do not match")
#>     }
#>     rocObject <- try(pROC::roc(data$obs, data[, lev[1]], direction = ">",
#>         quiet = TRUE), silent = TRUE)
#>     rocAUC <- if (inherits(rocObject, "try-error"))
#>         NA
#>     else rocObject$auc
#>     out <- c(rocAUC, sensitivity(data[, "pred"], data[, "obs"],
#>         lev[1]), specificity(data[, "pred"], data[, "obs"], lev[2]))
#>     names(out) <- c("ROC", "Sens", "Spec")
#>     out
#> }
#> <bytecode: 0x000000002f048120>
#> <environment: namespace:caret>
Created on 2022-02-22 by the reprex package (v2.0.1)
As far as I know, there are different methods for finding the optimal threshold at which Sens and Spec are reported (e.g., Youden). Which method does twoClassSummary use to calculate Sens and Spec?
How could I change twoClassSummary to get Sens and Spec according to the Youden or "closest topleft" method?
Update:
According to "How does caret train determine the probability threshold to maximise Specificity", caret uses a threshold of 0.5 to calculate Sens and Spec. I think this is also true for metric="ROC", as in my case.
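To make the 0.5 case concrete, here is a small base-R sketch of what I understand Sens and Spec to mean at a fixed cutoff. The data are made up for illustration, and I am assuming that the first factor level ("M") is treated as the event, as in the printed twoClassSummary source (lev[1] for Sens, lev[2] for Spec).

```r
# Toy held-out predictions; the values are invented for illustration only.
obs    <- factor(c("M", "M", "M", "M", "R", "R", "R", "R"), levels = c("M", "R"))
prob_M <- c(0.9, 0.7, 0.55, 0.3, 0.6, 0.4, 0.2, 0.1)

# Hard class prediction at a fixed 0.5 cutoff on the probability of the first level.
pred <- factor(ifelse(prob_M >= 0.5, "M", "R"), levels = levels(obs))

# Sensitivity: share of observed "M" that are predicted "M".
sens_05 <- mean(pred[obs == "M"] == "M")
# Specificity: share of observed "R" that are predicted "R".
spec_05 <- mean(pred[obs == "R"] == "R")

print(c(Sens = sens_05, Spec = spec_05))
```

If the linked answer is right, caret::sensitivity(pred, obs, "M") and caret::specificity(pred, obs, "R") should return the same two numbers for this toy input, since they only see the hard class predictions and never the probabilities.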
I am still stuck on the changes needed to get Sens/Spec according to Youden using a modified summaryFunction.
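For reference, this is the direction I have been trying: a minimal sketch, not a verified solution. The function name youdenSummary is my own (it is not part of caret). I am assuming a recent pROC version in which coords() accepts best.method and transpose = FALSE, and I set levels = rev(lev) with direction = "<" so that pROC's "case" is lev[1], which should make pROC's sensitivity line up with caret's Sens.

```r
library(pROC)

# Hypothetical summary function; the name youdenSummary is not part of caret.
youdenSummary <- function(data, lev = NULL, model = NULL) {
  if (length(lev) != 2) {
    stop("youdenSummary() expects exactly two outcome levels")
  }
  # Treat lev[1] as the event (pROC "case"); a higher probability of lev[1]
  # means "more likely a case", hence direction = "<".
  rocObject <- pROC::roc(data$obs, data[, lev[1]],
                         levels = rev(lev), direction = "<", quiet = TRUE)
  # Cutoff that maximises Youden's J = sensitivity + specificity - 1.
  best <- pROC::coords(rocObject, x = "best", best.method = "youden",
                       ret = c("threshold", "sensitivity", "specificity"),
                       transpose = FALSE)
  best <- as.data.frame(best)[1, , drop = FALSE]  # first row if cutoffs tie
  c(ROC       = as.numeric(pROC::auc(rocObject)),
    Sens      = best[["sensitivity"]],
    Spec      = best[["specificity"]],
    Threshold = best[["threshold"]])
}

# Toy check on invented data shaped like what caret passes to a summary function.
toy <- data.frame(
  obs  = factor(rep(c("M", "R"), each = 5), levels = c("M", "R")),
  M    = c(0.9, 0.8, 0.7, 0.65, 0.2, 0.6, 0.3, 0.25, 0.1, 0.05)
)
toy$R    <- 1 - toy$M
toy$pred <- factor(ifelse(toy$M >= 0.5, "M", "R"), levels = c("M", "R"))

res <- youdenSummary(toy, lev = c("M", "R"))
print(res)

# Intended use with caret (not run here):
# trControl <- trainControl("repeatedcv", number = 3, repeats = 10,
#                           classProbs = TRUE, summaryFunction = youdenSummary)
```

Swapping best.method = "youden" for "closest.topleft" should give the other criterion. One caveat I am aware of: the cutoff is chosen separately on each held-out resample, so the resulting Sens and Spec are probably optimistic compared with a cutoff fixed in advance.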