How to retrieve the best model from xgboost.train
I'm learning how to use XGBClassifier to generate predictions, and I found out that xgboost.train is what XGBClassifier calls under the hood. I guess the first question is: is there any reason to favor one way over another, or are they not equivalent at all?
I had this code set up that gave me the best model at iteration 12:
m1 = xgb.XGBClassifier(max_depth=5,
                       n_estimators=20,
                       objective='binary:logistic',
                       use_label_encoder=False,
                       eval_metric='auc',
                       random_state=1234)
m1.fit(x_train, y_train,
       eval_set=[(x_test, y_test)],
       eval_metric='auc',
       early_stopping_rounds=5)
pred1 = m1.predict_proba(x_test)[:, 1]
roc_auc_score(y_test, pred1)
I haven't tuned the parameters yet as I just wanted to make sure the code runs. Then I had the code below set up, hoping to get the same behavior as the one above:
train_params = {'objective': 'binary:logistic',
                'max_depth': 5,
                'eval_metric': 'auc',
                'seed': 1234}  # the native API takes 'seed', not 'random_state'
mat_train = xgb.DMatrix(data=x_train, label=y_train)
mat_test = xgb.DMatrix(data=x_test, label=y_test)
evals_result = {}
m2 = xgb.train(params=train_params,
               dtrain=mat_train,
               num_boost_round=20,
               early_stopping_rounds=5,
               evals=[(mat_test, 'eval')],
               evals_result=evals_result)
pred2 = m2.predict(mat_test)
roc_auc_score(y_test, pred2)
This also reports the same best iteration (12), but the predictions turn out different from the XGBClassifier method because pred2 actually used the model through the 17th iteration, where early stopping halted. I dug through the docs and found this about the early_stopping_rounds argument:
The method returns the model from the last iteration (not the best one). Use custom callback or model slicing if the best model is desired.
I haven't been able to find a lot of resources on this topic, so I'm here to ask for some help so that I can generate predictions using the model iteration with the highest AUC value. Appreciate it!