How does refitting the best estimator work in RandomizedSearchCV?

Posted 2025-02-13 20:34:55

I used RandomizedSearchCV (RSCV) with the default 5-fold CV for LGBMClassifier with an evaluation set.

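from lightgbm import LGBMClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# X, y: feature matrix and binary labels, defined elsewhere
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model_LGBM = LGBMClassifier(objective='binary', metric='auc', random_state=0,
                            early_stopping_round=100)

distributions = dict(max_depth=range(1, 10),
                     num_leaves=[50, 100, 150],
                     learning_rate=[0.1, 0.2, 0.3])

clf = RandomizedSearchCV(model_LGBM, distributions, random_state=0, n_iter=100, verbose=10)
# Extra keyword arguments to fit() are passed on to the estimator's own fit()
clf.fit(X_train, y_train, eval_set=(X_test, y_test))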

So the output of the RSCV looks like:

First iter: CV 1/5 "valid0's", CV 2/5 "valid0's", ..., CV 5/5 "valid0's";
Second iter: CV 1/5 "valid0's", CV 2/5 "valid0's", ..., CV 5/5 "valid0's";
...
Last iter: CV 1/5 "valid0's", CV 2/5 "valid0's", ..., CV 5/5 "valid0's";
+1 fit with "valid0's"

I suppose the last fit is the refitted best estimator. Does it use the whole training set? Where does it use the evaluation set?
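
One way to check this empirically is to swap in an estimator that records what each fit call receives. The sketch below is only an illustration under stated assumptions: `RecordingClassifier` and `FIT_LOG` are hypothetical names, not part of scikit-learn or LightGBM, and the data comes from `make_classification` rather than the original `X`, `y`. It shows how scikit-learn routes the data and the `eval_set` keyword. It does not show how LGBMClassifier uses them internally.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# One (training rows, eval rows) entry per fit call. A module-level list is used
# because the search clones the estimator before every fit (default n_jobs only).
FIT_LOG = []


class RecordingClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical stand-in for LGBMClassifier: majority-class predictor that logs fits."""

    def __init__(self, max_depth=1):
        self.max_depth = max_depth  # dummy hyperparameter so there is something to search

    def fit(self, X, y, eval_set=None):
        n_eval = None if eval_set is None else len(eval_set[0])
        FIT_LOG.append((len(X), n_eval))
        self.classes_, counts = np.unique(y, return_counts=True)
        self.majority_ = self.classes_[np.argmax(counts)]
        return self

    def predict(self, X):
        return np.full(len(X), self.majority_)


X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

search = RandomizedSearchCV(RecordingClassifier(), {"max_depth": [1, 2, 3]},
                            n_iter=2, random_state=0)
search.fit(X_train, y_train, eval_set=(X_test, y_test))

cv_fits, refit = FIT_LOG[:-1], FIT_LOG[-1]
print("CV fits:", len(cv_fits), "| training rows seen:", sorted({n for n, _ in cv_fits}))
print("final refit (training rows, eval rows):", refit)
```

In this sketch, every cross-validation fit sees only a subset of `X_train`, while the final refit sees all of `X_train`. The `eval_set` tuple reaches every fit unchanged, including the refit. If LGBMClassifier is handled the same way, `X_test` would drive early stopping in every fit and would therefore no longer be an untouched hold-out set.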
