XGBoost regressor with Dask RandomizedSearchCV error ('sample_weight' is not supported)

Posted on 2025-02-10 21:49:08

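I am trying to tune the hyperparameters of an XGBoost regressor model using Dask and RandomizedSearchCV, but I get this error: Exception: 'ValueError("\'sample_weight\' is not supported.")'. I am not using sample_weight anywhere, so I don't understand why I am getting this error. The full error is below, followed by sample code showing my tuning process. Why am I getting this error?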

2022-06-25 10:12:06,128 - distributed.worker - WARNING - Compute Failed
Key:       ('xgbregressor-fit-score-68a8811442e8db26da73c496969ce684', 0, 0)
Function:  fit_and_score
args:      (XGBRegressor(base_score=None, booster=None, colsample_bylevel=None,
             colsample_bynode=None, colsample_bytree=None,
             enable_categorical=False, gamma=None, gpu_id=None,
             importance_type=None, interaction_constraints=None,
             learning_rate=None, max_delta_step=None, max_depth=None,
             min_child_weight=None, missing=-999.0, monotone_constraints=None,
             n_estimators=100, n_jobs=1, num_parallel_tree=None, predictor=None,
             random_state=0, reg_alpha=None, reg_lambda=None,
             scale_pos_weight=None, subsample=None, tree_method='gpu_hist',
             validate_parameters=None, verbosity=None),
Exception: 'ValueError("\'sample_weight\' is not supported.")'

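# Imports inferred from the aliases used below (xgb, dcv)
import dask_ml.model_selection as dcv
import xgboost as xgb
from dask.distributed import Client
from dask_cuda import LocalCUDACluster

# X, y, N_FOLDS, N_ITER and mean_absolute_error are defined elsewhere (not shown in the post)

# Start a local GPU cluster and connect a client to it
cluster = LocalCUDACluster(dashboard_address="127.0.0.1:8005")
client = Client(cluster)

# Define our model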
params_fixed = {
    'objective': 'reg:squarederror',
    'random_state': 0,
    'n_jobs': 1,
    'tree_method': 'gpu_hist',
    'missing': -999.0,
}
params_hyp = {
    'n_estimators': [500, 800, 1000],
    'max_depth': [5, 7, 10, 12],
    'min_child_weight': [0.9, 1.0],
    'subsample': [0.9, 1.0],
    'colsample_bylevel': [0.9, 1.0],
    'colsample_bynode': [0.9, 1.0],
    'colsample_bytree': [0.9, 1.0],
}


regressor = xgb.XGBRegressor(**params_fixed)

def do_HPO(model, gridsearch_params, scorer, X, y, mode='gpu-grid', n_iter=10):
    """
    Perform HPO based on the mode specified.

    mode: default gpu-grid. The possible options are:
        1. gpu-grid: perform GPU-based GridSearchCV
        2. gpu-random: perform GPU-based RandomizedSearchCV

    n_iter: number of parameter settings sampled (used with gpu-random only)

    Returns the best estimator and the results of the search.
    """
    if mode == 'gpu-grid':
        print("gpu-grid selected")
        clf = dcv.GridSearchCV(model,
                               gridsearch_params,
                               cv=N_FOLDS,
                               scoring=scorer)
    elif mode == 'gpu-random':
        print("gpu-random selected")
        clf = dcv.RandomizedSearchCV(model,
                               gridsearch_params,
                               cv=N_FOLDS,
                               scoring=scorer,
                               n_iter=n_iter)

    else:
        print("Unknown Option, please choose one of [gpu-grid, gpu-random]")
        return None, None
    res = clf.fit(X, y)
    print("Best clf and score {} {}\n---\n".format(res.best_estimator_, res.best_score_))
    return res.best_estimator_, res


# Select the search mode
mode = "gpu-random"

res, results = do_HPO(regressor,
                      params_hyp,
                      mean_absolute_error,
                      X,
                      y,
                      mode=mode,
                      n_iter=N_ITER)
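
One possible reading of the error, offered as an assumption rather than a confirmed diagnosis: the `scoring` argument of a scikit-learn-style search expects a scorer callable with the signature `scorer(estimator, X, y)`, whereas a bare metric function usually takes `(y_true, y_pred, sample_weight=None)`. If the post's `mean_absolute_error` is such a function (the post does not show where it is imported from), passing it directly would bind `y` to `sample_weight`. The sketch below illustrates that mechanism on the CPU with scikit-learn only. `toy_mae` is a hypothetical stand-in written for this example, not any library's implementation, and the data is made up.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import make_scorer


def toy_mae(y_true, y_pred, sample_weight=None):
    """Hypothetical metric that, by assumption, rejects sample_weight."""
    if sample_weight is not None:
        raise ValueError("'sample_weight' is not supported.")
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


# Toy data, purely illustrative
X = np.arange(8, dtype=float).reshape(-1, 1)
y = np.arange(1.0, 9.0)
model = DummyRegressor(strategy="mean").fit(X, y)

# Called the way a search calls a scorer: scorer(estimator, X, y).
# The positional arguments bind as y_true=model, y_pred=X, sample_weight=y.
try:
    toy_mae(model, X, y)
    direct_call_error = None
except ValueError as exc:
    direct_call_error = str(exc)

# make_scorer adapts the metric to the (estimator, X, y) signature.
# greater_is_better=False negates the value so that larger scores are better.
mae_scorer = make_scorer(toy_mae, greater_is_better=False)
score = mae_scorer(model, X, y)

print(direct_call_error)
print(score)
```

If this reading applies, the equivalent change in the code above would be to pass a wrapped scorer, or a scoring string such as `'neg_mean_absolute_error'`, instead of the bare metric. Whether that resolves the error in this GPU setup is not confirmed by the post.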
