When using XGBRegressor, it's possible to use the base_score setting to set the initial prediction value for all data points. Typically that value would be set to the mean of the observed values in the training set.
Is it possible to achieve a similar thing using XGBClassifier, by specifying a value for every target class, when the objective parameter is set to multi:softprob?
E.g. counting the occurrences of each target class in the training set and normalizing by the total would give us:
class pct_total
--------------------
blue 0.57
red 0.22
green 0.16
black 0.05
So that when beginning its first iteration, XGBClassifier would start with these per-class values for every data point, instead of simply starting with 1 / num_classes for all classes.
Is it possible to achieve this?
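For concreteness, the percentages in the table above could be computed like this (a minimal sketch; the label list is made up to match the example counts):

```python
from collections import Counter

# Hypothetical training labels whose counts match the example table.
labels = ["blue"] * 57 + ["red"] * 22 + ["green"] * 16 + ["black"] * 5

counts = Counter(labels)
total = len(labels)

# Normalize each class count by the total number of observations.
pct_total = {cls: count / total for cls, count in counts.items()}
# pct_total == {"blue": 0.57, "red": 0.22, "green": 0.16, "black": 0.05}
```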
You can accomplish this using the parameter base_margin. Read about it in the docs; the referenced demo uses the native API and DMatrix, but as the docs say, you can also set base_margin in the XGBClassifier.fit method (with a new enough xgboost).

The shape of base_margin is expected to be (n_samples, n_classes); since xgboost fits multiclass models in a one-vs-rest fashion, you're providing, for each sample, its base score in each of the separate per-class GBMs. Note also that these values are in the log-odds space, so transform accordingly. Also don't forget to add base_margin to every prediction call (that would be nicer as a builtin saved to the class... see again the linked question earlier in this paragraph).