I have a problem with the predict() function and $votes in R when I try to calculate the AUC and draw the ROC curve. My model is a randomForest, and the code looks like this:
rf_optimal <- randomForest(Revenue ~ ., data = train, ntree = 250, mtry = 4, do.trace = TRUE)
The ROC curve I get on the training set using the code below looks like this:
roc(train$Revenue, rf_optimal$votes[, 2],
    plot = TRUE, legacy.axes = TRUE, percent = TRUE,
    xlab = "False Positive Percentage", ylab = "True Positive Percentage",
    col = "blue", lwd = 3, print.auc = TRUE, print.auc.y = 45)
[Figure: ROC curve using $votes]
When I use the predict() function instead, the AUC equals 100%:
roc(train$Revenue, predict(rf_optimal, newdata = train, type = "prob")[, 2],
    plot = TRUE, legacy.axes = TRUE, percent = TRUE,
    xlab = "False Positive Percentage", ylab = "True Positive Percentage",
    col = "blue", lwd = 3, print.auc = TRUE, print.auc.y = 45)
[Figure: ROC curve using predict()]
What is the difference between these two approaches? Is it even possible to obtain such an AUC value, or does it mean that the model is overfitted? The AUC on the test set is about 90%.
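
To make the comparison easier to reproduce, here is a minimal sketch on synthetic data. The data frame `toy` and its columns are hypothetical stand-ins for `train` and `Revenue`, not the real data. It relies on the documented behaviour of randomForest that `$votes` holds out-of-bag (OOB) vote fractions, and that predict() called without `newdata` returns those same OOB values. Passing the training set as `newdata` instead runs every row through all trees, including the ones that were grown on that row. Under those assumptions the two AUCs are not expected to agree:

```r
library(randomForest)
library(pROC)

set.seed(42)

# Synthetic two-class data (hypothetical stand-in for `train`)
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)
y  <- factor(ifelse(x1 + 0.5 * x2 + rnorm(n) > 0, "yes", "no"))
toy <- data.frame(x1, x2, x3, y)

rf <- randomForest(y ~ ., data = toy, ntree = 250)

# OOB vote fractions: each row is scored only by trees that did not see it
oob_prob <- rf$votes[, 2]

# predict() without newdata also returns the OOB vote fractions
oob_prob_predict <- predict(rf, type = "prob")[, 2]

# predict() with newdata = training set uses all trees (resubstitution)
resub_prob <- predict(rf, newdata = toy, type = "prob")[, 2]

auc_oob   <- as.numeric(auc(roc(toy$y, oob_prob,   quiet = TRUE)))
auc_resub <- as.numeric(auc(roc(toy$y, resub_prob, quiet = TRUE)))

print(all.equal(as.numeric(oob_prob), as.numeric(oob_prob_predict)))
print(c(oob = auc_oob, resubstitution = auc_resub))
```

If this sketch carries over to the real data, the `$votes` curve is the one comparable to the test-set AUC, while the predict(newdata = train) curve mostly reflects how well the trees memorise their own training rows.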