Training a random forest classification model and tuning it on a validation set, without cross-validation

Posted on 2025-01-26 15:40:44

I split my dataset into three sets: a training set, a validation set, and a test set. I want to train a random forest on the data. To find the best ntree, mtry, and nnodes, I want to use the validation set to see which parameter values work best, and then use those values when training on my training set.
I do not want to use the caret package, since it uses cross-validation.
This is a classification problem.
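The three-way split described above can be sketched in base R as follows. The helper name split_three_way, the 60/20/20 proportions, the seed, and the use of the built-in iris data are assumptions for illustration; every row is assigned at random to exactly one of the three sets.

```r
# Hypothetical helper: randomly assign every row of df to exactly one of
# three sets. The default 60/20/20 proportions and the seed are assumptions.
split_three_way <- function(df, props = c(train = 0.6, valid = 0.2, test = 0.2),
                            seed = 42) {
  stopifnot(abs(sum(props) - 1) < 1e-8)
  set.seed(seed)
  n <- nrow(df)
  sizes <- floor(props * n)
  sizes[1] <- n - sum(sizes[-1])  # rounding remainder goes to the training set
  # One label per row, shuffled so the assignment is random
  labels <- sample(rep(names(props), times = sizes))
  split(df, factor(labels, levels = names(props)))
}

sets <- split_three_way(iris)
sapply(sets, nrow)  # train 90, valid 30, test 30
```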

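    library(randomForest)

    # Accuracy for mtry = 2..15 (one preallocated slot per value)
    a <- numeric(length(2:15))
    for (i in 2:15) {
      # NOTE: this fits on vset (the validation set) and scores on test
      model2 <- randomForest(as.factor(V2) ~ ., data = vset, ntree = 500, mtry = i, importance = TRUE)
      predValid2 <- predict(model2, newdata = test, type = "class")
      a[i - 1] <- mean(predValid2 == test$V2)
    }

    n.tree <- seq(from = 100, to = 5000, by = 100)
    n.mtry <- seq(from = 1, to = 15, by = 1)

    # NOTE: randomForest() expects a single ntree and a single mtry value,
    # so passing the vectors above does not search over them
    model3 <- randomForest(as.factor(V2) ~ ., data = vset, ntree = n.tree, mtry = n.mtry,
                           importance = TRUE)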

I used the code above to write a loop, but I believe it is not correct.
I would appreciate help finding the best parameters based on the validation set rather than cross-validation.
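A minimal sketch of a validation-set search, assuming the randomForest package is installed. Because randomForest() takes a single ntree and a single mtry, the model3 call cannot try several values at once; this sketch instead lists every combination with expand.grid() and fits one forest per row. It also swaps the data roles of the original loop, which fits on vset and scores on test: here each forest is fit on the training set, scored on the validation set, and the test set is used only once at the end. randomForest() has no nnodes argument, so the sketch tunes maxnodes (the maximum number of terminal nodes per tree); nodesize (the minimum size of terminal nodes) could be added to the grid the same way. The iris data (with Species standing in for V2), the hypothetical training-set name tset, the split, the seeds, and the candidate values are all assumptions.

```r
library(randomForest)

# Assumed stand-in data: built-in iris, with Species playing the role of the
# class column V2. tset is a hypothetical name for the training set.
set.seed(1)
idx  <- sample(nrow(iris))
tset <- iris[idx[1:90], ]     # training set   (60%)
vset <- iris[idx[91:120], ]   # validation set (20%)
test <- iris[idx[121:150], ]  # test set       (20%)

# randomForest() takes one value per argument, so enumerate every
# combination of candidate values and fit one forest per row.
p <- ncol(tset) - 1           # number of predictors
grid <- expand.grid(ntree = c(100, 500), mtry = 1:p, maxnodes = c(4, 8, 16))
grid$val_acc <- NA_real_

for (k in seq_len(nrow(grid))) {
  set.seed(k)                 # make each forest reproducible
  fit <- randomForest(Species ~ ., data = tset,
                      ntree = grid$ntree[k], mtry = grid$mtry[k],
                      maxnodes = grid$maxnodes[k])
  # Fit on the training set, score on the validation set
  pred <- predict(fit, newdata = vset, type = "response")
  grid$val_acc[k] <- mean(pred == vset$Species)
}

# First row with the highest validation accuracy
best <- grid[which.max(grid$val_acc), ]
print(best)

# Refit once with the chosen values; the test set is used only here
set.seed(2025)
final <- randomForest(Species ~ ., data = tset, ntree = best$ntree,
                      mtry = best$mtry, maxnodes = best$maxnodes)
test_acc <- mean(predict(final, newdata = test) == test$Species)
print(test_acc)
```

Once the values are fixed, refitting on the training and validation sets combined before the single test evaluation is a common variant; either way, the test accuracy should not be used to choose the parameters.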
