Why does a single-estimator AdaBoost or Gradient Boosting "ensemble" produce different values than a single estimator?

Posted 2025-01-29 18:53:20


I'm curious why a single-estimator Adaboost "ensemble", a single-estimator Gradient Boosted "ensemble" and a single decision tree give different values.

The code below compares three models, all using the same base estimator (regression tree with max_depth = 4 and loss based on mse.)

  1. The base estimate as a bare tree model
  2. A single-estimator Adaboost using the base estimator as a prototype
  3. A single-estimator GBR using the base estimator as a prototype

Extracting and inspecting the trees shows that they are very different, even though each should have been trained in the same fashion.

from sklearn.datasets import load_diabetes
from sklearn.ensemble import AdaBoostRegressor, GradientBoostingRegressor
from sklearn.tree import DecisionTreeRegressor, export_text

data = load_diabetes()
X = data['data']
y = data['target']

simple_model = DecisionTreeRegressor(max_depth=4)
prototype = DecisionTreeRegressor(max_depth=4)
simple_ada = AdaBoostRegressor(prototype, n_estimators=1)
simple_gbr = GradientBoostingRegressor(max_depth=4, n_estimators=1, criterion='mse')  # note: 'mse' is spelled 'squared_error' in sklearn >= 1.0

simple_model.fit(X, y)
simple_ada.fit(X, y)
simple_gbr.fit(X, y)

ada_one = simple_ada.estimators_[0]
gbr_one = simple_gbr.estimators_[0][0]

print(export_text(simple_model))
print(export_text(ada_one))
print(export_text(gbr_one))


Answered by 梦开始←不甜 on 2025-02-05 18:53:20


AdaBoostRegressor performs weighted bootstrap sampling for each of its trees (unlike AdaBoostClassifier, which IIRC just fits the base classifier using sample weights); see the sklearn source. So there's no way to force a single-tree AdaBoost regressor to match a single decision tree (short of, I suppose, doing the bootstrap sampling manually and fitting the single decision tree on that sample).
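To see this concretely, here is a minimal sketch (the parameter choices are illustrative, not from the question) showing that even with a shared random_state, AdaBoost's single tree is fit on a weighted bootstrap resample rather than on the full dataset, so its predictions disagree with a plain tree's:

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import AdaBoostRegressor
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)

# A plain tree fit on the full dataset.
plain_tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y)

# A single-estimator AdaBoost; its one tree is fit on a weighted
# bootstrap resample of (X, y), not on the full dataset.
ada = AdaBoostRegressor(
    DecisionTreeRegressor(max_depth=4, random_state=0),
    n_estimators=1,
    random_state=0,
).fit(X, y)

ada_tree = ada.estimators_[0]
# A bootstrap resample omits roughly 37% of the rows, so the two trees
# almost surely differ even though every other hyperparameter matches.
trees_match = np.allclose(plain_tree.predict(X), ada_tree.predict(X))
print(trees_match)
```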


GradientBoostingRegressor has an initial value for each sample to boost from:

init : estimator or ‘zero’, default=None
An estimator object that is used to compute the initial predictions. init has to provide fit and predict. If ‘zero’, the initial raw predictions are set to zero. By default a DummyEstimator is used, predicting either the average target value (for loss=’squared_error’), or a quantile for the other losses.

So the main difference between your tree and single-estimator-gbm is that the latter's leaf values are shifted by the average target value. Setting init='zero' gets us much closer, but I do see some differences in chosen splits further down the tree. That is due to ties in optimal split values, and can be fixed by setting a common random_state throughout.
