XGBoost regressor with Dask RandomizedSearchCV error: 'sample_weight' is not supported
I am trying to tune the hyperparameters of an XGBoost regressor model using Dask and RandomizedSearchCV, but I am getting this error: Exception: 'ValueError("\'sample_weight\' is not supported.")'. I am not even using sample_weight anywhere, so I don't understand why I am getting this error. Please see the full error below, along with some sample code demonstrating my tuning process. Why am I getting this error?
2022-06-25 10:12:06,128 - distributed.worker - WARNING - Compute Failed
Key: ('xgbregressor-fit-score-68a8811442e8db26da73c496969ce684', 0, 0)
Function: fit_and_score
args: (XGBRegressor(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=-999.0, monotone_constraints=None,
n_estimators=100, n_jobs=1, num_parallel_tree=None, predictor=None,
random_state=0, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method='gpu_hist',
validate_parameters=None, verbosity=None),
Exception: 'ValueError("\'sample_weight\' is not supported.")'
# Imports and cluster setup
from dask_cuda import LocalCUDACluster
from dask.distributed import Client
from sklearn.metrics import mean_absolute_error
import dask_ml.model_selection as dcv
import xgboost as xgb

cluster = LocalCUDACluster(dashboard_address="127.0.0.1:8005")
client = Client(cluster)

# Define our model
params_fixed = {'objective': 'reg:squarederror',
                'random_state': 0,
                'n_jobs': 1,
                'tree_method': 'gpu_hist',
                'missing': -999.0}

params_hyp = {'n_estimators': [500, 800, 1000],
              'max_depth': [5, 7, 10, 12],
              'min_child_weight': [0.9, 1.0],
              'subsample': [0.9, 1.0],
              'colsample_bylevel': [0.9, 1.0],
              'colsample_bynode': [0.9, 1.0],
              'colsample_bytree': [0.9, 1.0]}

regressor = xgb.XGBRegressor(**params_fixed)

def do_HPO(model, gridsearch_params, scorer, X, y, mode='gpu-grid', n_iter=10):
    """
    Perform HPO based on the mode specified.
    mode: default 'gpu-grid'. The possible options are:
        1. gpu-grid: perform GPU-based GridSearchCV
        2. gpu-random: perform GPU-based RandomizedSearchCV
    n_iter: used with the random option; number of parameter settings sampled
    Returns the best estimator and the results of the search.
    """
    if mode == 'gpu-grid':
        print("gpu-grid selected")
        clf = dcv.GridSearchCV(model,
                               gridsearch_params,
                               cv=N_FOLDS,
                               scoring=scorer)
    elif mode == 'gpu-random':
        print("gpu-random selected")
        clf = dcv.RandomizedSearchCV(model,
                                     gridsearch_params,
                                     cv=N_FOLDS,
                                     scoring=scorer,
                                     n_iter=n_iter)
    else:
        print("Unknown option, please choose one of [gpu-grid, gpu-random]")
        return None, None
    res = clf.fit(X, y)
    print("Best clf and score {} {}\n---\n".format(res.best_estimator_, res.best_score_))
    return res.best_estimator_, res

# N_FOLDS, N_ITER, X, and y are defined elsewhere
mode = "gpu-random"
res, results = do_HPO(regressor,
                      params_hyp,
                      mean_absolute_error,
                      X,
                      y,
                      mode=mode,
                      n_iter=N_ITER)
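One thing worth checking (an observation about the code above, not a confirmed diagnosis of the error): the `scoring` argument of `dcv.RandomizedSearchCV` expects a scorer, i.e. a string name or a callable with signature `(estimator, X, y)`, whereas `mean_absolute_error` is a raw metric with signature `(y_true, y_pred)` and is being passed directly. A minimal sketch of wrapping the metric into a proper scorer with scikit-learn's `make_scorer`:

```python
from sklearn.metrics import make_scorer, mean_absolute_error

# A scorer built from the MAE metric. greater_is_better=False negates the
# score so that search utilities, which maximize, prefer lower MAE.
mae_scorer = make_scorer(mean_absolute_error, greater_is_better=False)

# mae_scorer now has the (estimator, X, y) signature that `scoring` expects;
# pass it (or the string "neg_mean_absolute_error") to do_HPO in place of
# the bare mean_absolute_error function.
```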