Can I take the best parameters and best model from an Optuna function and apply that model directly in my notebook?

Posted 2025-01-17 11:17:32

I established an Optuna objective function to find the best model between LightGBM and XGBoost for my data, but I was wondering whether I can take the best model and apply it directly in my notebook (extracting the best model as an object so I can reuse it later).
Here is my objective function:

import lightgbm as lgb
import optuna
import sklearn.metrics
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

best_booster = None
gbm = None

def objective(trial, random_state=22, n_jobs=1):
    global gbm  # keep a reference to the model trained in this trial for the callback

    regressor_name = trial.suggest_categorical("regressor", ["XGBoost", "lightgbm"])
    # X_train and y_train must already be defined in the notebook
    train_x, valid_x, train_y, valid_y = train_test_split(X_train, y_train, test_size=0.25)

    # Step 2. Set up values for the hyperparameters:
    if regressor_name == "XGBoost":
        params = {
            "verbosity": 0,  # 0 (silent) - 3 (debug)
            "objective": "reg:squarederror",
            "n_estimators": 10000,
            "max_depth": trial.suggest_int("max_depth", 4, 12),
            "learning_rate": trial.suggest_float("learning_rate", 0.005, 0.05, log=True),
            "colsample_bytree": trial.suggest_float("colsample_bytree", 0.2, 0.6, log=True),
            "subsample": trial.suggest_float("subsample", 0.4, 0.8, log=True),
            "alpha": trial.suggest_float("alpha", 0.01, 10.0, log=True),
            "lambda": trial.suggest_float("lambda", 1e-8, 10.0, log=True),
            "gamma": trial.suggest_float("gamma", 1e-8, 10.0, log=True),  # was mistakenly sampling "lambda" twice
            "min_child_weight": trial.suggest_float("min_child_weight", 10, 1000, log=True),
            "seed": random_state,
            "n_jobs": n_jobs,
        }
        model = XGBRegressor(**params)
        model.fit(train_x, train_y)
        y_pred = model.predict(valid_x)  # predict on the validation split, not on external data
        gbm = model
        mae_xgb = sklearn.metrics.mean_absolute_error(valid_y, y_pred)
        return mae_xgb
    else:
        params = {
            "objective": "regression",  # regression task, not binary classification
            "metric": "l1",             # L1 loss matches the MAE used for evaluation
            "verbosity": -1,
            "boosting_type": "gbdt",
            "lambda_l1": trial.suggest_float("lambda_l1", 1e-8, 10.0, log=True),
            "lambda_l2": trial.suggest_float("lambda_l2", 1e-8, 10.0, log=True),
            "num_leaves": trial.suggest_int("num_leaves", 2, 256),
            "feature_fraction": trial.suggest_float("feature_fraction", 0.4, 1.0),
            "bagging_fraction": trial.suggest_float("bagging_fraction", 0.4, 1.0),
            "bagging_freq": trial.suggest_int("bagging_freq", 1, 7),
            "min_child_samples": trial.suggest_int("min_child_samples", 5, 100),
        }
        dtrain = lgb.Dataset(train_x, label=train_y)
        gbm = lgb.train(params, dtrain)
        preds_gbm = gbm.predict(valid_x)  # raw regression outputs; no rounding before MAE
        mae_gbm = sklearn.metrics.mean_absolute_error(valid_y, preds_gbm)
        return mae_gbm

And here is how I tried to solve this issue:

def callback(study, trial):
    # Remember the model from the best trial so far; this relies on the global
    # `gbm` set inside objective(), so it only works with single-process optimization.
    global best_booster
    if study.best_trial.number == trial.number:
        best_booster = gbm

if __name__ == "__main__":
    # MAE is an error metric, so it has to be minimized, not maximized.
    study = optuna.create_study(direction="minimize")
    study.optimize(objective, n_trials=100, callbacks=[callback])
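
Ideally, once the study finishes, I want to be able to reuse the captured object directly, something like this (X_val here stands for my held-out data):

preds = best_booster.predict(X_val)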

I think it's about importing something, and if there are any tips on my Optuna function, please share them.

Comments (1)

童话里做英雄 2025-01-24 11:17:32

If I understood your question correctly, then yes, that's what models are for.

Bring your saved model into your notebook, feed it data with the same structure as the data you trained it on, and it should serve its purpose. Or use it in a pipeline.

Even a single row, as a NumPy array with the same structure, can be used. For example, my model predicts whether a loan should be approved or not.

For example, a bank customer wants a loan and submits his information. The bank officer enters this information into the system. The system transforms it into a single NumPy array with the same structure as the dataset used to train the model.

The model is then used by the system to predict whether the loan should be approved or not.
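
As a minimal sketch of that flow (the feature names and values below are hypothetical, and model stands for the trained or loaded estimator):

import numpy as np

# Hypothetical applicant features, in the same column order as the training data:
# [age, income, loan_amount, credit_score]
applicant = np.array([[35, 52000.0, 15000.0, 690.0]])  # shape (1, n_features)

decision = model.predict(applicant)  # model: the trained/loaded estimator
print(decision[0])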

I save my Optuna-tuned XGBoost models as JSON, e.g.

my_model.get_booster().save_model(f'{savepath}my_model.json')
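
And to reuse it later, the saved file can be loaded back into a fresh wrapper (a sketch; X_new is a placeholder for new data with the same feature layout as the training set):

import xgboost as xgb

reloaded = xgb.XGBRegressor()
reloaded.load_model(f'{savepath}my_model.json')  # same path as in the save call above
preds = reloaded.predict(X_new)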
