How do I choose the best split criterion in a decision tree when there are multiple optimal splits?
I wrote a decision tree regressor from scratch in Python. It is outperformed by the sklearn algorithm. Both trees build exactly the same splits with the same leaf nodes. BUT when looking for the best split, there are multiple splits with optimal variance reduction that differ only by the feature index. The feature that my algorithm selects as the splitting criterion in the grown tree leads to major outliers in the test set predictions, whereas the feature selected by sklearn does not.
So what is the right thing to do if there are multiple best splits in the same branch while building the tree? Which feature is the best one to choose?
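To illustrate what I mean, here is a minimal, hypothetical sketch of the kind of greedy split search where the tie arises (the function names and helpers are illustrative, not my actual code):

```python
# Hypothetical sketch of the situation described above: a greedy split search
# that scores every (feature, threshold) candidate by variance reduction and
# collects all candidates that are tied for the best score.
import numpy as np

def variance_reduction(y, left_mask):
    """Reduction in variance achieved by splitting y into left/right parts."""
    left, right = y[left_mask], y[~left_mask]
    if len(left) == 0 or len(right) == 0:
        return -np.inf
    weighted = (len(left) * left.var() + len(right) * right.var()) / len(y)
    return y.var() - weighted

def find_tied_best_splits(X, y):
    """Return every (feature_index, threshold) tied for the best variance reduction."""
    best_score, tied = -np.inf, []
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            score = variance_reduction(y, X[:, j] <= t)
            if score > best_score + 1e-12:          # strictly better: new best
                best_score, tied = score, [(j, t)]
            elif abs(score - best_score) <= 1e-12:  # tie: remember this split too
                tied.append((j, t))
    return best_score, tied

# When `tied` contains several candidates, the tree that gets grown depends
# entirely on which element the implementation picks (e.g. the lowest feature
# index, a random choice, or simply the scan order of the features).
```

My question is about that last step: which of the tied candidates should be chosen, and on what basis?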