The effect of pruning a decision tree

Posted on 2024-09-28 16:28:50

Suppose I build a decision tree A with ID3 from a training and validation set, but A is unpruned.
At the same time, I have another ID3 decision tree B, generated from the same training and validation set, but B is pruned.
If I now test both A and B on a future unlabeled test set, will the pruned tree always perform better?
Any ideas are welcome, thanks.

Comments (4)

甜心小果奶 2024-10-05 16:28:50

I think we need to make the distinction clearer: a pruned tree always performs better on the validation set, but not necessarily on the test set (in fact, it may perform the same or worse on the training set). I am assuming that the pruning is done after the tree is built (i.e., post-pruning).

Remember that the whole reason for using a validation set is to avoid overfitting the training dataset, and the key point here is generalization: we want a model (decision tree) that generalizes beyond the instances provided at "training time" to new, unseen examples.
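To make this concrete, here is a minimal sketch of the idea. It assumes scikit-learn's CART implementation with cost-complexity pruning as a stand-in for ID3 (which scikit-learn does not provide): the pruning strength is chosen on a held-out validation set, so the pruned tree B cannot do worse than the unpruned tree A on validation, while on the test set the comparison can still go either way.

```python
# Sketch: validation-driven post-pruning vs. an unpruned tree.
# Assumption: scikit-learn's CART + cost-complexity pruning stands in for ID3.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# Tree A: fully grown, unpruned.
tree_a = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Tree B: pick the pruning strength (ccp_alpha) that maximizes validation
# accuracy; alpha = 0 (the unpruned tree) is among the candidates, so B is
# at least as good as A on the validation set by construction.
path = tree_a.cost_complexity_pruning_path(X_train, y_train)
best_alpha, best_score = 0.0, -1.0
for alpha in path.ccp_alphas:
    model = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    score = model.score(X_val, y_val)
    if score > best_score:
        best_alpha, best_score = alpha, score
tree_b = DecisionTreeClassifier(random_state=0, ccp_alpha=best_alpha).fit(X_train, y_train)

# Nothing guarantees the same ordering on the unseen test set.
print("A test accuracy:", tree_a.score(X_test, y_test))
print("B test accuracy:", tree_b.score(X_test, y_test))
```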

像极了他 2024-10-05 16:28:50

Pruning is supposed to improve classification by preventing overfitting. Since pruning occurs only if it improves classification rates on the validation set, a pruned tree will perform as well as or better than an unpruned tree during validation.
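The rule described here is essentially reduced-error pruning. Below is a minimal, hand-rolled sketch of it; the `Node` structure and the `(x, y)` validation pairs (with `x` a dict of attribute values) are assumptions for illustration, not any particular library's API:

```python
# Sketch of reduced-error (post-)pruning on an ID3-style categorical tree.
from collections import Counter

class Node:
    """One node of a categorical decision tree (assumed structure)."""
    def __init__(self, feature=None, children=None, label=None, train_labels=()):
        self.feature = feature                  # attribute tested here; None for a leaf
        self.children = children or {}          # attribute value -> child Node
        self.label = label                      # predicted class if this is a leaf
        self.train_labels = list(train_labels)  # training labels that reached this node

def predict(node, x):
    # Descend until we reach a leaf or an attribute value unseen in training.
    while node.feature is not None and x.get(node.feature) in node.children:
        node = node.children[x.get(node.feature)]
    if node.label is not None:
        return node.label
    return Counter(node.train_labels).most_common(1)[0][0]

def accuracy(root, data):
    return sum(predict(root, x) == y for x, y in data) / len(data)

def reduced_error_prune(node, root, val_data):
    """Collapse subtrees, bottom-up, whenever that does not hurt validation accuracy."""
    if node.feature is None:
        return
    for child in node.children.values():
        reduced_error_prune(child, root, val_data)
    before = accuracy(root, val_data)
    saved = (node.feature, node.children, node.label)
    # Tentatively turn this node into a leaf predicting its majority class.
    node.feature, node.children = None, {}
    node.label = Counter(node.train_labels).most_common(1)[0][0]
    if accuracy(root, val_data) < before:
        node.feature, node.children, node.label = saved  # pruning hurt: undo it
```

Calling `reduced_error_prune(root, root, val_data)` implements exactly the condition in the answer: a subtree is replaced by a leaf only when validation accuracy does not drop, and pruning children before parents lets larger subtrees collapse once their pruned children no longer help.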

小瓶盖 2024-10-05 16:28:50

Bad pruning can lead to wrong results. Although a smaller decision tree is often desirable in itself, the usual aim of pruning is better predictive performance, so how you prune is the crux of the matter.

心的位置 2024-10-05 16:28:50

I agree with the first answer by @AMRO. Post-pruning is the most common approach to decision tree pruning, and it is done after the tree is built. However, pre-pruning is also possible: the tree is pruned by halting its construction early, using a specified threshold value, for example by deciding not to split the subset of training tuples at a given node.

That node then becomes a leaf. The leaf may hold the most frequent class among the subset of tuples, or the class probabilities of those tuples.
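For illustration, here is a minimal pre-pruning sketch using scikit-learn's `DecisionTreeClassifier`, whose hyperparameters play the role of the "specified threshold value" mentioned above (the particular thresholds shown are arbitrary, not recommendations):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Each hyperparameter below is a pre-pruning threshold: construction halts
# early instead of growing the tree fully and cutting it back afterwards.
pre_pruned = DecisionTreeClassifier(
    max_depth=5,                 # never grow deeper than this
    min_samples_split=20,        # do not split a node holding fewer tuples
    min_impurity_decrease=0.01,  # require each split to reduce impurity this much
)

X, y = load_iris(return_X_y=True)
pre_pruned.fit(X, y)
# Halted nodes become leaves predicting the most frequent class among the
# training tuples that reached them, as described above.
print("number of leaves:", pre_pruned.get_n_leaves())
```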
