xgboost does not produce reproducible predictions after saving in bin format
I am having an issue where xgboost does not produce reproducible results after the model is saved to a binary file.
R version: 4.1.2
XGBoost version: 1.5.2.1
The methodology is as follows (logistic regression, gbtree):
bst <- xgboost(
  params = best.params,
  data = dtrain,
  nrounds = nrounds,
  early_stopping_rounds = early_stopping_rounds,
  nthread = nthread,
  num_parallel_tree = num_parallel_tree,
  eval_metric = eval_metric,
  verbose = 2,
  print_every_n = 1
)
min(predict(bst, dtest))
max(predict(bst, dtest))
xgb.save(bst, savefilemodelloc)
This produces:
min = 0.17932555079
max = 0.78802382946
Now I read the bin file back in:
remove(bst)
bst <- xgb.load(savefilemodelloc)
min(predict(bst, dtest))
max(predict(bst, dtest))
This produces:
min = 0.49377295375
max = 0.50564271212
This is run on the exact same data set, yet the results are nowhere near the same. I have tried rebuilding the model several times, with nearly identical results each time.
The model size is about 17GB.
My OS is RHEL 7
Does anyone know what is going on here?
Update 3.8.2022
I have discovered that if I manually load my parameters back into the model after loading it, it works.
For example:
remove(bst)
bst <- xgb.load(savefilemodelloc)
xgb.parameters(bst) <- best.params
min(predict(bst, dtest))
max(predict(bst, dtest))
This now produces:
min = 0.17932555079
max = 0.78802382946
I am not sure whether this is expected behavior.
Comments (2)
I've faced this issue before and 'solved' it by saving the model in .rds format. My predictions weren't as far off (less than 1% difference before and after save/import), but I think it had to do with 'losing' specific parameters from the xgboost model object (something to do with 'early stopping'?) when the model was saved using xgb.save().
I saved the model as:
Text
Binary
R object (.rds)
And I also saved the feature names to a file (super useful down the track, highly recommend):
To load in the model to 'repeat' the predictions I used:
There was more to it than that, but hopefully this will solve your immediate problem.
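The code blocks from this answer did not survive in the text above, so the following is only a minimal sketch of the .rds approach, not the answerer's original code. It assumes the agaricus example data bundled with the xgboost package, a tiny stand-in model, and temporary file paths; `saveRDS()`, `readRDS()`, `writeLines()` and `readLines()` are base R.

```r
library(xgboost)

# Stand-in data and model (assumptions; substitute your own dtrain, dtest and params).
data(agaricus.train, package = "xgboost")
data(agaricus.test, package = "xgboost")
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
dtest <- xgb.DMatrix(agaricus.test$data, label = agaricus.test$label)

params <- list(objective = "binary:logistic", max_depth = 2, eta = 1, nthread = 1)
bst <- xgb.train(params = params, data = dtrain, nrounds = 2)
pred_before <- predict(bst, dtest)

# Hypothetical file locations for the model object and the feature names.
rds_path <- tempfile(fileext = ".rds")
names_path <- tempfile(fileext = ".txt")

# Save the whole R model object and, separately, the feature names.
saveRDS(bst, rds_path)
writeLines(colnames(agaricus.train$data), names_path)

# Later: restore both and repeat the predictions.
rm(bst)
bst <- readRDS(rds_path)
feature_names <- readLines(names_path)
pred_after <- predict(bst, dtest)

# Compare predictions before and after the round trip.
all.equal(pred_before, pred_after)
```

Saving the R object keeps whatever R-side fields the booster carries, which may be why this route avoided the discrepancy described above; it is worth checking on your own model rather than assuming.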
Based on the hint from Jared above, I was able to resolve my issue. The problem seems to be that a saved xgboost bin file does not keep the parameters that were used. The solution is to load the parameters back in after loading the model. I tried saving the model to a JSON file, but it crashed my R session on each attempt, so it would appear that bin is my only option.
The methodology is as follows (logistic-regression, gbtree):
This produces:
min = 0.17932555079
max = 0.78802382946
Now read the bin file back in and restore the parameters.
This now produces:
min = 0.17932555079
max = 0.78802382946
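The code for this answer is also missing from the text above. As a rough sketch of the workaround it describes, the example below saves a model with `xgb.save()`, stores the parameter list next to it, and reapplies the parameters after `xgb.load()`. It assumes the xgboost 1.5.x R API used in the question (where `xgb.parameters<-` is available; newer releases of the R package may differ), the bundled agaricus data, a tiny stand-in model, and hypothetical temporary file paths.

```r
library(xgboost)

# Stand-in data and model (assumptions; substitute your own dtrain, dtest and best.params).
data(agaricus.train, package = "xgboost")
data(agaricus.test, package = "xgboost")
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
dtest <- xgb.DMatrix(agaricus.test$data, label = agaricus.test$label)

params <- list(objective = "binary:logistic", max_depth = 2, eta = 1, nthread = 1)
bst <- xgb.train(params = params, data = dtrain, nrounds = 2)
pred_before <- predict(bst, dtest)

# Hypothetical file locations.
bin_path <- tempfile(fileext = ".bin")
params_path <- tempfile(fileext = ".rds")

# Save the model and, separately, the parameters it was trained with.
xgb.save(bst, bin_path)
saveRDS(params, params_path)

# Later: load the model, then load the parameters back in.
rm(bst)
bst <- xgb.load(bin_path)
xgb.parameters(bst) <- readRDS(params_path)
pred_after <- predict(bst, dtest)

# Compare the prediction range and the full vectors before and after reload.
range(pred_before)
range(pred_after)
all.equal(pred_before, pred_after)
```

Keeping the parameter list in its own file means the reload step does not depend on `best.params` still being present in the session. A model this small is unlikely to reproduce the discrepancy from the question; the sketch only shows the order of the calls.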