Providing an initial prior for each target class with base_score and XGBClassifier

Posted on 2025-01-24 04:06:23

When using XGBRegressor, it's possible to use the base_score setting to set the initial prediction value for all data points. Typically that value would be set to the mean of the observed values in the training set.

Is it possible to achieve a similar thing using XGBClassifier, by specifying a value for every target class, when the objective parameter is set to multi:softprob?

E.g. counting the occurrences of each target class in the training set and normalizing by the total would give us:

class      pct_total
--------------------
blue       0.57
red        0.22
green      0.16
black      0.05

So that when beginning its first iteration, XGBClassifier would start with these per-class values for every data point, instead of simply starting with 1 / num_classes for all classes.

Is it possible to achieve this?
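As a quick illustration of the prior computation described above, here is a minimal sketch (the y_train array and its class labels are made-up stand-ins, not from the original post):

import numpy as np

# Toy labels standing in for the training set described above.
y_train = np.array(["blue"] * 57 + ["red"] * 22 + ["green"] * 16 + ["black"] * 5)

# Count each class's occurrences and normalize by the total; this
# reproduces the pct_total column from the table (np.unique returns
# the classes in sorted order).
classes, counts = np.unique(y_train, return_counts=True)
priors = counts / counts.sum()
for cls, pct in zip(classes, priors):
    print(f"{cls:6s} {pct:.2f}")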

Comments (1)

夜清冷一曲。 2025-01-31 04:06:23

You can accomplish this using the parameter base_margin. Read about it in the docs; the demo referenced there uses the native API and DMatrix, but as the docs say, you can also set base_margin in the XGBClassifier.fit method (with a new enough xgboost).

The shape of base_margin is expected to be (n_samples, n_classes); since xgboost fits multiclass models in a one-vs-rest fashion, you're providing, for each sample, its base score for each of the separate per-class GBMs. Note also that these values are in the log-odds space, so transform accordingly. Also, don't forget to pass base_margin to every prediction call as well (it would be nicer if this were a builtin saved on the class; see the links earlier in this paragraph).
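A minimal end-to-end sketch of this approach (the data, feature shapes, and integer class encoding are illustrative assumptions; fit and predict_proba accept base_margin in sufficiently recent xgboost releases):

import numpy as np
from xgboost import XGBClassifier

rng = np.random.default_rng(0)

# Toy four-class data; integer labels 0..3 stand in for
# blue/red/green/black from the question.
X_train = rng.normal(size=(200, 5))
y_train = rng.choice(4, size=200, p=[0.57, 0.22, 0.16, 0.05])
X_test = rng.normal(size=(10, 5))

# Per-class priors from the training labels (the pct_total column).
priors = np.bincount(y_train, minlength=4) / len(y_train)

# base_margin is read in raw-margin (log) space; with multi:softprob,
# softmax(log(priors)) recovers the priors, so log-priors yield the
# desired initial per-class probabilities. Shape: (n_samples, n_classes).
margin_train = np.tile(np.log(priors), (len(X_train), 1))
margin_test = np.tile(np.log(priors), (len(X_test), 1))

clf = XGBClassifier(objective="multi:softprob", n_estimators=50)
clf.fit(X_train, y_train, base_margin=margin_train)

# The margin is not stored on the fitted model, so it has to be
# supplied again with every prediction call.
proba = clf.predict_proba(X_test, base_margin=margin_test)

Note that if any class were absent from the training labels, np.log(priors) would produce -inf; in practice you would smooth or clip the priors first.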
