A question about decision trees

Posted on 2024-10-04 10:50:27

After studying decision trees for a while, I noticed there is a technique called boosting. In typical cases it seems to improve the accuracy of a decision tree.

So I am wondering: why don't we simply incorporate boosting into every decision tree we build? Since boosting is currently kept as a separate technique, are there any disadvantages to using boosting compared with using a single decision tree?

Thanks for helping me out here!

Comments (2)

没有心的人 2024-10-11 10:50:27

Boosting is a technique that can be layered on top of any learning algorithm. It is most effective when the base classifier performs only slightly better than random guessing. If your decision tree is already pretty good, boosting may not make much difference, but it does carry a performance penalty: if you run boosting for 100 iterations, you have to train and store 100 decision trees.
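To make the "barely above random" point concrete, here is a sketch using AdaBoost, one common boosting algorithm. The answer does not say which variant it has in mind, so that choice is an assumption. With labels y_i in {-1, +1}, example weights w_i that sum to 1, and h_t the weak classifier trained in round t:

```latex
\epsilon_t = \sum_{i} w_i \,\mathbb{1}\left[h_t(x_i) \neq y_i\right],
\qquad
\alpha_t = \frac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t},
\qquad
H(x) = \operatorname{sign}\left(\sum_{t=1}^{T} \alpha_t\, h_t(x)\right)
```

The vote weight \alpha_t is positive only when the weighted error \epsilon_t is below 1/2, so each weak classifier only has to beat random guessing. The final classifier H sums over all T rounds, which is where the cost comes from: T = 100 means 100 trained and stored trees.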

People usually boost decision stumps (decision trees with a single split, i.e. one decision node and two leaves) and often get results as good as boosting full decision trees.
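A minimal sketch of boosting with decision stumps, written in plain Python and assuming the AdaBoost variant with labels in {-1, +1}. The names `train_stump`, `stump_predict`, `adaboost`, `predict`, `X_toy`, and `y_toy` are illustrative and do not come from the answer or from any library; a real project would more likely use an existing implementation.

```python
import math

def stump_predict(stump, row):
    """A stump tests one feature against one threshold."""
    feature, threshold, polarity, _ = stump
    return polarity if row[feature] >= threshold else -polarity

def train_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best = None
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            for polarity in (1, -1):
                candidate = (feature, threshold, polarity, 0.0)
                # Weighted error: total weight of misclassified examples
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(candidate, row) != yi)
                if best is None or err < best[3]:
                    best = (feature, threshold, polarity, err)
    return best

def adaboost(X, y, rounds):
    """Train `rounds` stumps; returns a list of (alpha, stump) pairs."""
    n = len(X)
    w = [1.0 / n] * n          # start with uniform example weights
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        # Clip the error so the logarithm below stays finite
        err = min(max(stump[3], 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Increase weights of mistakes, decrease weights of correct examples
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, row):
    """Weighted vote of all stored stumps."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

# Toy 1-D data: the labels flip twice, so no single threshold separates them
X_toy = [[0], [1], [2], [3], [4], [5]]
y_toy = [1, 1, -1, -1, 1, 1]

single = adaboost(X_toy, y_toy, rounds=1)
boosted = adaboost(X_toy, y_toy, rounds=3)
```

On this toy data a single stump misclassifies at least two of the six points, while three boosted stumps classify all six correctly. The price is visible in the data structure: `boosted` holds one trained stump per round, and every prediction evaluates all of them.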

I've done some experiments with boosting and found it to be fairly robust and better than a single-tree classifier, but also slower (I used 10 iterations), and not as good as some of the simpler learners (to be fair, it was an extremely noisy dataset).

虫児飞 2024-10-11 10:50:27

There are several disadvantages to boosting:
1- It is hard to implement.
2- It needs more extensive training on the training set than a single decision tree does.
3- Worst of all, all boosting algorithms require a threshold value, which in most cases is not easy to figure out: finding it takes extensive trial and error, and the overall performance of the boosting algorithm depends on it.
