Do I need to apply a filter before running cross-validation in WEKA?

Posted on 2024-12-21 01:39:47

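I want to run n-fold cross-validation on some classifiers I am using. I found example code on the WEKA Wiki (WekaDemo.java, http://weka.wikispaces.com/file/view/WekaDemo.java), but it applies a filter before running the validation. Does this always need to be done, or is it optional?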

Here is the relevant section of the code:

  /**
   * runs 10fold CV over the training file
   */
  public void execute() throws Exception {
    // run filter
    m_Filter.setInputFormat(m_Training);
    Instances filtered = Filter.useFilter(m_Training, m_Filter);

    // train classifier on complete file for tree
    m_Classifier.buildClassifier(filtered);

    // 10fold CV with seed=1
    m_Evaluation = new Evaluation(filtered);
    m_Evaluation.crossValidateModel(
        m_Classifier, filtered, 10, m_Training.getRandomNumberGenerator(1));
  }

Also, is this an acceptable way of evaluating a classifier's performance?



过度放纵 2024-12-28 01:39:47

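I think this is bad practice. If the filter depends on or uses class information (i.e. it is a supervised filter), the cross-validation estimate will be optimistically biased, possibly very much so, and therefore probably useless: the filter has been fit on the complete dataset, so information about the class labels of every test fold has already leaked into the data the classifier is trained and tested on. As an extreme example, consider a filter that adds a copy of the class attribute to the data. In almost all cases you are better and safer off using weka.classifiers.meta.FilteredClassifier, which applies the filter inside each training fold; there is an example of how to use it on the same Wiki page you cite.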

Cheers, Bernhard
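The bias described above can be reproduced without WEKA. The following is a minimal plain-Java sketch, not WEKA code: `CvLeakDemo`, `addClassCopy` and the 1-nearest-neighbour learner are hypothetical names made up for this illustration. It uses the extreme example from the answer, a filter that copies the class attribute into a new feature, on data whose only real feature is random noise. Applied to the whole dataset before the split, as `execute()` does, the filter writes each test instance's class into its features, so cross-validation reports perfect accuracy. Applied per fold, it may only use the training fold's classes and the test instances' copy stays missing, which is the arrangement the answer recommends getting from `FilteredClassifier`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Toy illustration (hypothetical, not WEKA code): a class-aware filter must be applied per CV fold. */
final class CvLeakDemo {

    /** One instance: a noise feature, the feature the filter adds, and a binary class. */
    static final class Row {
        final double noise;     // unrelated to the class
        final double classCopy; // added by the filter; NaN means "missing"
        final int label;        // 0 or 1

        Row(double noise, double classCopy, int label) {
            this.noise = noise; this.classCopy = classCopy; this.label = label;
        }
    }

    /** n instances with alternating classes and uniformly random noise in [0, 1). */
    static List<Row> makeData(int n, long seed) {
        Random rnd = new Random(seed);
        List<Row> rows = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            rows.add(new Row(rnd.nextDouble(), Double.NaN, i % 2));
        }
        return rows;
    }

    /** Hypothetical supervised filter: copies the class into a feature when it is allowed to see it. */
    static Row addClassCopy(Row r, boolean classVisible) {
        return new Row(r.noise, classVisible ? r.label : Double.NaN, r.label);
    }

    /** 1-nearest-neighbour; a feature missing in the query is left out of the distance. */
    static int predict1nn(List<Row> train, Row query) {
        double best = Double.POSITIVE_INFINITY;
        int bestLabel = -1;
        for (Row t : train) {
            double dn = t.noise - query.noise;
            double d = dn * dn;
            if (!Double.isNaN(query.classCopy)) {
                double dc = t.classCopy - query.classCopy;
                d += dc * dc;
            }
            if (d < best) {
                best = d;
                bestLabel = t.label;
            }
        }
        return bestLabel;
    }

    /**
     * k-fold CV accuracy. filterFirst = true filters everything before splitting
     * (the pattern in the question); false filters each fold separately, hiding
     * the test fold's classes from the filter.
     */
    static double crossValidate(List<Row> data, int k, boolean filterFirst) {
        int correct = 0;
        for (int fold = 0; fold < k; fold++) {
            List<Row> train = new ArrayList<>();
            List<Row> test = new ArrayList<>();
            for (int i = 0; i < data.size(); i++) {
                Row r = data.get(i);
                // Consecutive pairs share a fold, so every test fold holds both classes.
                boolean inTest = (i / 2) % k == fold;
                if (filterFirst) {
                    // Leaky: the filter sees every class, including the test fold's.
                    (inTest ? test : train).add(addClassCopy(r, true));
                } else if (inTest) {
                    test.add(addClassCopy(r, false)); // class unknown at prediction time
                } else {
                    train.add(addClassCopy(r, true)); // filter uses training classes only
                }
            }
            for (Row r : test) {
                if (predict1nn(train, r) == r.label) {
                    correct++;
                }
            }
        }
        return (double) correct / data.size();
    }

    public static void main(String[] args) {
        List<Row> data = makeData(100, 1L);
        System.out.println("filter before CV: " + crossValidate(data, 10, true));
        System.out.println("filter per fold:  " + crossValidate(data, 10, false));
    }
}
```

With the sizes and seed in `main`, the filter-first run prints exactly 1.0, while the per-fold run should land somewhere around chance (roughly 0.5), which is all that pure noise can honestly support. In WEKA itself the corresponding change would be to wrap `m_Filter` and `m_Classifier` in a `FilteredClassifier` (via `setFilter` and `setClassifier`) and pass it, together with the unfiltered `m_Training`, to `crossValidateModel`.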
