Using R and Weka: how can I use a meta-algorithm together with the n-fold evaluation method?

Posted on 2024-09-26 11:14:14


Here is an example of my problem

library(RWeka)
iris <- read.arff("iris.arff")

Perform n-fold cross-validation to obtain the proper accuracy of the classifier.

m <- J48(class ~ ., data = iris)               # build a J48 decision tree
e <- evaluate_Weka_classifier(m, numFolds = 5) # 5-fold cross-validation
summary(e)

The results provided here are obtained by building the model on one part of the dataset and testing it on another part, so they give an accurate estimate of the classifier's real precision.

Now I perform AdaBoost to optimize the parameters of the classifier:

# boost J48 with AdaBoostM1; M = 30 is passed on to the J48 base learner
m2 <- AdaBoostM1(class ~ ., data = iris,
                 control = Weka_control(W = list(J48, M = 30)))
summary(m2)

The results provided here are obtained by evaluating the model on the same dataset that was used to build it, so the accuracy is not representative of real-life precision, where the model is evaluated on unseen instances. Nevertheless, this procedure is helpful for optimizing the model that is built.

The main problem is that I cannot optimize the built model and, at the same time, test it with data that was not used to build the model, or simply use an n-fold validation method to obtain the proper accuracy.


Comments (1)

自由如风 2024-10-03 11:14:14


I guess you misinterpret the function of evaluate_Weka_classifier. In both cases, evaluate_Weka_classifier only performs cross-validation based on the training data; it does not change the model itself. Compare the confusion matrices of the following code:

# J48 on its own: summary(m) reports resubstitution results,
# e holds the 5-fold cross-validation
m <- J48(Species ~ ., data = iris)
e <- evaluate_Weka_classifier(m, numFolds = 5)
summary(m)
e


# boosted J48: summary(m2) is again resubstitution,
# e2 is the 5-fold cross-validation
m2 <- AdaBoostM1(Species ~ ., data = iris,
                 control = Weka_control(W = list(J48, M = 30)))
e2 <- evaluate_Weka_classifier(m2, numFolds = 5)
summary(m2)
e2

In both cases, the summary gives you the evaluation based on the training data, while the function evaluate_Weka_classifier() gives you the correct cross-validation. Neither for J48 nor for AdaBoostM1 does the model itself get updated based on the cross-validation.
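
To see this concretely, you can put the two confusion matrices side by side (a small sketch assuming the objects from the code above; confusionMatrix is a component of the list returned by evaluate_Weka_classifier):

# resubstitution: predictions on the data the model was trained on
table(predicted = predict(m2, newdata = iris), actual = iris$Species)

# cross-validated confusion matrix from the evaluation object
e2$confusionMatrix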

Now regarding the AdaBoost algorithm itself: in fact, it does use a kind of "weighted cross-validation" to arrive at the final classifier. Wrongly classified items are given more weight in the next building step, but the evaluation is done using equal weights for all observations. So using cross-validation to optimize the result doesn't really fit the general idea behind the adaptive boosting algorithm.
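
For intuition, here is a minimal sketch of that reweighting step (illustrative toy code only, not RWeka's internals; the correct vector is invented for the example):

# toy AdaBoost-style weight update for 10 training instances
w <- rep(1 / 10, 10)                          # start with equal weights
correct <- c(TRUE, TRUE, FALSE, TRUE, FALSE,  # which instances the current
             TRUE, TRUE, TRUE, FALSE, TRUE)   # weak learner got right (toy)
err <- sum(w[!correct])                       # weighted error of the learner
alpha <- 0.5 * log((1 - err) / err)           # vote weight of this learner
w <- w * exp(ifelse(correct, -alpha, alpha))  # up-weight the mistakes
w <- w / sum(w)                               # renormalize to sum to 1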

If you want a true cross-validation using a training set and an evaluation set, you could do the following:

# train on a random half of the data
id <- sample(1:length(iris$Species), length(iris$Species) * 0.5)
m3 <- AdaBoostM1(Species ~ ., data = iris[id, ],
                 control = Weka_control(W = list(J48, M = 5)))

# cross-validation within the training half
e3 <- evaluate_Weka_classifier(m3, numFolds = 5)
# true validation on the held-out half
e4 <- evaluate_Weka_classifier(m3, newdata = iris[-id, ])

summary(m3)
e3
e4
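
One practical note on the code above: sample() makes the split random, so fixing the seed beforehand keeps the comparison reproducible:

set.seed(42)  # any fixed seed; put this before the sample() call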

If you want a model whose construction itself relies on out-of-sample estimates, you'll have to go to a different algorithm, e.g. randomForest() from the randomForest package. It grows a set of trees on bootstrap samples and reports an out-of-bag error estimate, which plays a role similar to cross-validation. It can be used in combination with the RWeka package as well.
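
A minimal sketch of that alternative (assumes the randomForest package is installed; ntree = 100 is an arbitrary choice):

library(randomForest)

# grow an ensemble of trees on bootstrap samples of the data
rf <- randomForest(Species ~ ., data = iris, ntree = 100)
print(rf)  # reports the out-of-bag (OOB) error estimate and confusion matrix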

Edit: corrected the code for a true cross-validation. Using the subset argument has an effect in evaluate_Weka_classifier() as well.
