Does a filter need to be applied before running cross-validation in WEKA?
I want to run n-fold cross-validation on some classifiers I am using. I found example code on the WEKA Wiki (WekaDemo.java, http://weka.wikispaces.com/file/view/WekaDemo.java), but it applies a filter before running the validation. Is this always required, or can it be skipped?
Here is the relevant section of code:
/**
 * runs 10fold CV over the training file
 */
public void execute() throws Exception {
  // run filter
  m_Filter.setInputFormat(m_Training);
  Instances filtered = Filter.useFilter(m_Training, m_Filter);

  // train classifier on complete file for tree
  m_Classifier.buildClassifier(filtered);

  // 10fold CV with seed=1
  m_Evaluation = new Evaluation(filtered);
  m_Evaluation.crossValidateModel(
      m_Classifier, filtered, 10, m_Training.getRandomNumberGenerator(1));
}
Also, is this an acceptable way of evaluating the performance of a classifier?
Comments (1)
I'd consider this bad practice. If the filter depends on or uses class information, it has already seen the class labels of the instances that later serve as test folds, so the cross-validation estimate will be (potentially very) optimistically biased, and therefore probably useless. For an extreme example, think about a filter that adds a copy of the class attribute to the data. In almost all cases you will be better off and safer if you use weka.classifiers.meta.FilteredClassifier, which builds the filter only on the training data it is given in each fold; there is an example of how to use it on the same Wiki page you cite.
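As a rough sketch of that suggestion (this is not the Wiki's own example), the filter and the classifier can be wrapped in a FilteredClassifier and cross-validated on the raw, unfiltered data. The choice of Discretize and J48, the class name FilteredCvSketch, and the assumption that the class attribute is nominal and stored last are all illustrative, and the input path is whatever file is passed on the command line.

```java
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.meta.FilteredClassifier;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.supervised.attribute.Discretize;

class FilteredCvSketch {

  /** Wraps filter and base classifier so the filter is rebuilt on every training fold. */
  public static FilteredClassifier buildFilteredClassifier() {
    FilteredClassifier fc = new FilteredClassifier();
    fc.setFilter(new Discretize()); // assumed filter; it uses class information
    fc.setClassifier(new J48());    // assumed base classifier
    return fc;
  }

  /** Runs 10-fold CV with seed 1 on the unfiltered data. */
  public static Evaluation crossValidate(Instances data) throws Exception {
    Evaluation eval = new Evaluation(data);
    eval.crossValidateModel(buildFilteredClassifier(), data, 10, new Random(1));
    return eval;
  }

  public static void main(String[] args) throws Exception {
    Instances data = DataSource.read(args[0]); // dataset path supplied by the caller
    if (data.classIndex() < 0) {
      data.setClassIndex(data.numAttributes() - 1); // assumption: class is the last attribute
    }
    System.out.println(crossValidate(data).toSummaryString());
  }
}
```

Because crossValidateModel receives the unfiltered Instances, the filter never sees the instances of the fold currently held out for testing.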
cheers, Bernhard
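The optimistic bias described in the answer can be reproduced without WEKA. The sketch below is a self-contained simulation under stated assumptions, not WEKA code: the features are pure noise and independent of the labels, the "filter" is a hypothetical class-dependent feature selection (keep the features whose class means differ most), and the classifier is a simple nearest-centroid rule. All names and constants (LeakyFilterDemo, N, P, K, FOLDS) are made up for illustration.

```java
import java.util.Arrays;
import java.util.Random;

class LeakyFilterDemo {
  static final int N = 40, P = 500, K = 10, FOLDS = 5;

  /** Class-dependent "filter": the K features whose class means differ most on the flagged rows. */
  static int[] selectFeatures(double[][] x, int[] y, boolean[] use) {
    double[] score = new double[P];
    for (int j = 0; j < P; j++) {
      double[] sum = new double[2];
      int[] cnt = new int[2];
      for (int i = 0; i < N; i++) {
        if (use[i]) { sum[y[i]] += x[i][j]; cnt[y[i]]++; }
      }
      score[j] = Math.abs(sum[0] / cnt[0] - sum[1] / cnt[1]);
    }
    Integer[] order = new Integer[P];
    for (int j = 0; j < P; j++) order[j] = j;
    Arrays.sort(order, (a, b) -> Double.compare(score[b], score[a])); // descending by score
    int[] top = new int[K];
    for (int k = 0; k < K; k++) top[k] = order[k];
    return top;
  }

  /** Nearest-centroid prediction for row `test`, with centroids computed on the training rows. */
  static int predict(double[][] x, int[] y, boolean[] train, int[] feats, int test) {
    double[] dist = new double[2];
    for (int j : feats) {
      double[] sum = new double[2];
      int[] cnt = new int[2];
      for (int i = 0; i < N; i++) {
        if (train[i]) { sum[y[i]] += x[i][j]; cnt[y[i]]++; }
      }
      for (int c = 0; c < 2; c++) {
        double d = x[test][j] - sum[c] / cnt[c];
        dist[c] += d * d;
      }
    }
    return dist[0] <= dist[1] ? 0 : 1;
  }

  /** CV accuracy; if `leaky`, features are selected once on all rows before splitting. */
  static double cvAccuracy(double[][] x, int[] y, boolean leaky) {
    boolean[] all = new boolean[N];
    Arrays.fill(all, true);
    int[] global = leaky ? selectFeatures(x, y, all) : null;
    int correct = 0;
    for (int f = 0; f < FOLDS; f++) {
      boolean[] train = new boolean[N];
      for (int i = 0; i < N; i++) train[i] = (i % FOLDS != f);
      int[] feats = leaky ? global : selectFeatures(x, y, train);
      for (int i = 0; i < N; i++) {
        if (!train[i] && predict(x, y, train, feats, i) == y[i]) correct++;
      }
    }
    return (double) correct / N;
  }

  /** Gaussian noise features, generated independently of the labels. */
  static double[][] noise(long seed) {
    Random rnd = new Random(seed);
    double[][] x = new double[N][P];
    for (double[] row : x) for (int j = 0; j < P; j++) row[j] = rnd.nextGaussian();
    return x;
  }

  /** Balanced labels 0,1,0,1,... unrelated to the features. */
  static int[] labels() {
    int[] y = new int[N];
    for (int i = 0; i < N; i++) y[i] = i % 2;
    return y;
  }

  public static void main(String[] args) {
    double[][] x = noise(1L);
    int[] y = labels();
    System.out.printf("filter before CV: %.2f%n", cvAccuracy(x, y, true));
    System.out.printf("filter inside CV: %.2f%n", cvAccuracy(x, y, false));
  }
}
```

Since the labels carry no information about the features, an honest estimate should sit near 50%. Selecting features on the full data before splitting tends to report a much higher accuracy, because the selection has already used the labels of the held-out rows; repeating the selection inside each training fold removes that advantage.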