XGBRegressor with weights and base_margin: is out-of-sample validation not possible?

Posted 2025-02-04 20:11:16

I have an old linear model which I wish to improve using XGBoost. I have the predictions from the old model, which I wish to use as a base margin. Also, due to the nature of what I'm modeling, I need to use weights. My old GLM is a Poisson regression with the formula number_of_defaults/exposure ~ param_1 + param_2 and weights set to exposure (the same as the denominator of the response variable; see the toy sketch further below). When training the new XGBoost model on the data, I do this:

import xgboost as xgb

# The number of boosting rounds is set via n_estimators in the
# sklearn API (nrounds is the R interface's name for the same thing).
xgb_model = xgb.XGBRegressor(n_estimators=25,
                             max_depth=100,
                             max_leaves=100,
                             learning_rate=0.01,
                             n_jobs=4,
                             eval_metric="poisson-nloglik")

model = xgb_model.fit(X=X_train, y=y_train, sample_weight=_WEIGHT, base_margin=_BASE_MARGIN)

where _WEIGHT and _BASE_MARGIN are the weights and the old model's predictions (popped out of X_train).
But how do I do cross-validation or out-of-sample analysis when I need to specify weights and a base margin?
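
To make the setup above concrete, here is a minimal toy sketch (statsmodels is used purely for illustration and the data and column names are hypothetical; the original GLM may well live in R). One detail worth noting: for a log-link objective such as count:poisson, XGBoost's base_margin is on the raw log scale, so the old model's predicted rates would typically be log-transformed first.

import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical toy data standing in for the real portfolio;
# all column names here are placeholders.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "number_of_defaults": rng.poisson(2.0, size=200),
    "exposure": rng.uniform(0.5, 2.0, size=200),
    "param_1": rng.normal(size=200),
    "param_2": rng.normal(size=200),
})

# The old rate-Poisson GLM: the response is defaults per unit of
# exposure, with variance weights equal to the exposure (the
# analogue of R's glm(..., family = poisson, weights = exposure)).
glm_old = smf.glm(
    "I(number_of_defaults / exposure) ~ param_1 + param_2",
    data=df,
    family=sm.families.Poisson(),
    var_weights=np.asarray(df["exposure"]),
).fit()

# Inputs for the XGBoost fit() call: the weights are the exposure,
# and the base margin is the old model's predicted rate transformed
# to the raw (log) scale, assuming a log-link objective.
_WEIGHT = df["exposure"].to_numpy()
_BASE_MARGIN = np.log(glm_old.predict(df))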

As far as I can see, I could use sklearn's GridSearchCV, but then I would need to specify the weights and base margin in XGBRegressor() (instead of in fit() as above). The closest equivalent of base_margin in XGBRegressor() is the base_score argument, but there is no argument for the weights.

Also, I could potentially forget about cross-validation and just use a training and a test dataset. I would then use the eval_set argument of fit(), but then there would be no way of specifying the weights and base margins for the different sets.
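
As an aside, recent xgboost releases do expose per-evaluation-set counterparts of these fit() arguments, sample_weight_eval_set and base_margin_eval_set, which would cover the train/test route. A sketch under that assumption, with hypothetical held-out arrays:

# Sketch assuming a recent xgboost release; X_test, y_test,
# _WEIGHT_TEST and _BASE_MARGIN_TEST are hypothetical test-set
# analogues of the training arrays above.
model = xgb_model.fit(
    X_train, y_train,
    sample_weight=_WEIGHT,
    base_margin=_BASE_MARGIN,
    eval_set=[(X_test, y_test)],              # held-out split
    sample_weight_eval_set=[_WEIGHT_TEST],    # weights for that split
    base_margin_eval_set=[_BASE_MARGIN_TEST], # margins for that split
)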

Any guidance in the right direction is much appreciated!

Comments (1)

骷髅 2025-02-11 20:11:16

You can use cross_val_predict with the fit_params argument, or GridSearchCV.fit with **fit_params. (Note that in newer scikit-learn releases, 1.4 and later, fit_params is deprecated in favor of a params argument.)

Here is a working proof of concept:

import xgboost as xgb
from sklearn import datasets
from sklearn.model_selection import cross_val_predict, GridSearchCV
import numpy as np

# Toy dataset standing in for the real data
diabetes = datasets.load_diabetes()
X = diabetes.data[:150]
y = diabetes.target[:150]

xgb_model = xgb.XGBRegressor(n_estimators=5)
# Arbitrary demo weights and margins derived from two feature columns
fit_params = dict(sample_weight=np.abs(X[:, 0]), base_margin=np.abs(X[:, 1]))

# Simple fit
xgb_model.fit(X, y, **fit_params)

# cross_val_predict: array-like fit params are sliced to match each
# training split automatically
y_pred = cross_val_predict(xgb_model, X, y, cv=3, fit_params=fit_params)
print(y_pred.shape, y.shape)

# Grid search: fit params are forwarded (and split) the same way
grid = GridSearchCV(xgb_model, param_grid={"n_estimators": [5, 10, 15]})
grid.fit(X, y, **fit_params)

You can see what happens in the source code: here, here and here. The last link is where fit_params gets indexed to follow the cross-validation splits.
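
Roughly, the slicing that the last link points at behaves like the following (illustrative NumPy only, not scikit-learn's actual code): array-like fit params with the same length as X are indexed down to each fold's training rows.

import numpy as np

# Illustrative only: how an array-like fit param is sliced per split.
sample_weight = np.arange(10, dtype=float)
train_idx = np.array([0, 2, 4, 6, 8])   # training indices of one CV fold
fold_weight = sample_weight[train_idx]  # what that fold's fit() receives
print(fold_weight)                      # [0. 2. 4. 6. 8.]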
