How do I pass group information to XGBRanker through sklearn's randomized grid search?

Posted on 2025-01-21 09:47:31

When I try to perform a randomized grid search on an XGBRanker model, I keep getting the following error:

/workspace/src/objective/rank_obj.cc:52: Check failed: gptr.size() != 0 && gptr.back() == info.labels_.Size(): group structure not consistent with #rows

The error seems to relate to the structure of the group information I pass. I'm passing the size of each group: if there are N rows and 2 groups, the array passed is [g1_size, g2_size].
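To make the group format concrete, here is a minimal pure-Python sketch of the consistency condition named in the error message. It assumes that `gptr` is the cumulative boundary array built from the group sizes; `group_pointer` and `is_consistent` are hypothetical helper names used only for illustration, not XGBoost functions.

```python
from itertools import accumulate


def group_pointer(group_sizes):
    """Cumulative boundaries [0, g1, g1 + g2, ...] built from per-group sizes."""
    return [0] + list(accumulate(group_sizes))


def is_consistent(group_sizes, n_rows):
    """Mirror of the failed check: the last boundary must equal the row count."""
    gptr = group_pointer(group_sizes)
    return len(gptr) != 0 and gptr[-1] == n_rows


group_sizes = [3, 2]          # hypothetical [g1_size, g2_size]
labels = [2, 1, 0, 1, 0]      # one label per row, N = 5

print(group_pointer(group_sizes))                # [0, 3, 5]
print(is_consistent(group_sizes, len(labels)))   # True
print(is_consistent(group_sizes, 4))             # False
```

Under this assumption, the check passes only when the group sizes sum to exactly the number of rows handed to the model.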

I'm not sure where I'm going wrong, since I can fit the model directly without any issues. The error appears only when I run RandomizedSearchCV. The code snippet is as follows:

import xgboost as xgb
from sklearn.metrics import make_scorer, ndcg_score
from sklearn.model_selection import RandomizedSearchCV

# X_train, Y_train and groups (the per-group sizes) are defined elsewhere.
model = xgb.XGBRanker(
    objective="rank:ndcg",
    max_depth=10,
    n_estimators=100,
    verbosity=1)
param_dist = {'n_estimators': [100, 200, 300],
              'learning_rate': [1e-3, 1e-4, 1e-5],
              'subsample': [0.8, 0.9, 1],
              'max_depth': [5, 6, 7]
              }

fit_params = {"group": groups}
scoring = make_scorer(ndcg_score, greater_is_better=True)

clf = RandomizedSearchCV(model,
                         param_distributions=param_dist,
                         cv=5,
                         n_iter=5,
                         scoring=scoring,
                         error_score=0,
                         verbose=3,
                         n_jobs=-1)

clf.fit(X_train, Y_train, **fit_params)
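One possible reading of the difference between the direct fit and the search is sketched below in plain Python. It assumes that cross-validation subsets the rows for each fold while the size array is forwarded unchanged, because its length is the number of groups rather than the number of rows. The helper `kfold_train_indices`, the group sizes, and the per-row `row_group_ids` are hypothetical and only illustrate the arithmetic.

```python
from itertools import groupby


def kfold_train_indices(n_rows, n_splits):
    """Yield training-row indices for contiguous, unshuffled K folds."""
    base, extra = divmod(n_rows, n_splits)
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)
        held_out = set(range(start, start + size))
        yield [i for i in range(n_rows) if i not in held_out]
        start += size


group_sizes = [6, 4]                     # hypothetical: 2 groups, 10 rows
n_rows = sum(group_sizes)
# Expand sizes into one group id per row so ids can be subset with the rows.
row_group_ids = [g for g, size in enumerate(group_sizes) for _ in range(size)]

mismatches, rebuilt = [], []
for train_idx in kfold_train_indices(n_rows, n_splits=5):
    # Unchanged size array compared with the smaller training fold.
    mismatches.append(sum(group_sizes) != len(train_idx))
    # Recount sizes from the ids of the rows that remain in this fold.
    fold_ids = [row_group_ids[i] for i in train_idx]
    rebuilt.append([len(list(rows)) for _, rows in groupby(fold_ids)])

print(mismatches)   # [True, True, True, True, True]
print(rebuilt)      # [[4, 4], [4, 4], [4, 4], [6, 2], [6, 2]]
```

In this sketch every fold trains on 8 rows while the original sizes still sum to 10, which is the kind of mismatch the error message describes. Sizes recounted from per-row ids do match each fold, although a plain K-fold split still cuts through groups; a group-aware splitter such as sklearn's GroupKFold would keep each group whole.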
