How to retrieve the best model from xgboost.train

I'm learning how to use XGBClassifier to generate predictions, and I found out that xgboost.train is what XGBClassifier calls under the hood. I guess the first question is: is there any reason to favor one way over another, or are they not equivalent at all?

I had this code set up that gave me the best model at iteration 12:

import xgboost as xgb
from sklearn.metrics import roc_auc_score

m1 = xgb.XGBClassifier(max_depth = 5,
                       n_estimators = 20,
                       objective = 'binary:logistic',
                       use_label_encoder = False,
                       eval_metric = 'auc',
                       random_state = 1234)

m1.fit(x_train, y_train,
       eval_set = [(x_test, y_test)],
       eval_metric = 'auc',
       early_stopping_rounds = 5)

pred1 = m1.predict_proba(x_test)[:,1]
roc_auc_score(y_test, pred1)
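
As a quick sanity check on the "under the hood" relationship: the fitted sklearn wrapper exposes its native Booster through the documented get_booster() method, so both routes should end up with the same kind of object:

# The sklearn estimator wraps a native Booster -- the same type of
# object that xgb.train returns further down.
booster_from_sklearn = m1.get_booster()
print(type(booster_from_sklearn))  # <class 'xgboost.core.Booster'>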

I haven't tuned the parameters yet; I just wanted to make sure the code runs. Then I set up the code below, hoping to get the same behavior as above:

train_params = {'objective': 'binary:logistic',
                'max_depth': 5,
                'eval_metric': 'auc',
                'seed': 1234}  # the native API expects 'seed'; 'random_state' is the sklearn-wrapper name

mat_train = xgb.DMatrix(data = x_train, label = y_train)
mat_test = xgb.DMatrix(data = x_test, label = y_test)

evals_result = {}
m2 = xgb.train(params = train_params,
               dtrain = mat_train,
               num_boost_round = 20,
               early_stopping_rounds = 5,
               evals = [(mat_test, 'eval')],
               evals_result = evals_result)

pred2 = m2.predict(mat_test)
roc_auc_score(y_test, pred2)
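
For what it's worth, this is how I checked where early stopping landed (best_iteration is documented on the Booster after early stopping; num_boosted_rounds() is, I believe, the accessor for the total number of rounds actually trained):

# Where training stopped vs. where the best score was
print(m2.best_iteration)        # 12 -- the round with the best eval AUC
print(m2.num_boosted_rounds())  # 18 -- rounds 0 through 17, i.e. best round + 5 stopping rounds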

This also reports the same best model at iteration 12, but the prediction turns out different from the XGBClassifier method, because pred2 actually used the full model from iteration 17, where training stopped. I dug through the docs and found this about the early_stopping_rounds argument:

The method returns the model from the last iteration (not the best one). Use custom callback or model slicing if the best model is desired.
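
Based on that note, my best guess at what "model slicing" means is the sketch below, though I haven't confirmed it's the intended approach; it assumes XGBoost >= 1.4, where Booster slicing and predict's iteration_range argument are both available:

# best_iteration is 0-based, so +1 makes the slice/range include the best round
best_n = m2.best_iteration + 1

# Option 1: model slicing -- keep only the trees up to and including the best round
m2_best = m2[:best_n]
pred_best = m2_best.predict(mat_test)

# Option 2: keep the full booster, but cap prediction at the best round
pred_best_alt = m2.predict(mat_test, iteration_range=(0, best_n))  # end is exclusive

roc_auc_score(y_test, pred_best)

The other route the quote mentions, a custom callback, would instead save the best model during training.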

I haven't been able to find many resources on this topic, so I'm asking here for help with generating predictions from the model iteration that has the highest AUC. Appreciate it!
