Why is xgboost noticeably slower on my more powerful computer?
xgboost is performing very slowly on my more powerful computer, and the slowdown is specific to xgboost.
Weaker computer specs:
- Windows 10, python 3.9.7 (jupyter), pandas 1.3.5, sklearn 1.0.2, xgboost 1.5.1
- 16GB RAM, Intel i7-10870H
Powerful computer specs:
- Ubuntu, python 3.9.5, pandas 1.4.0, sklearn 1.0.2, xgboost 1.5.2
- 32GB RAM, AMD Ryzen 5900
The following code took 2.7 minutes on my powerful computer vs. 1.1 minutes on my weaker computer. The performance difference is even worse (~30x slower) when using sklearn's cross-validation classes that utilize multiprocessing (a sketch of that variant follows the code below).
import pandas as pd
from sklearn.datasets import load_iris
import xgboost as xgb
import datetime

iris = load_iris()
x = iris.data
y = iris.target

start = datetime.datetime.now()
print(start)
# time 1000 consecutive fits of the same small classifier on iris
for i in range(1000):
    mdl2 = xgb.XGBClassifier(learning_rate=.15
                             , max_depth=9
                             , min_child_weight=1
                             , min_split_loss=.1
                             , colsample_bytree=.5
                             , scale_pos_weight=.46)
    mdl2.fit(x, y)
finished = ((datetime.datetime.now() - start).seconds / 60)
print(finished)
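A minimal sketch of the cross-validation variant mentioned above (not shown in the post; it assumes sklearn's cross_val_score with n_jobs=-1, and cv=5 is my own choice) would look like:

import datetime
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
import xgboost as xgb

x, y = load_iris(return_X_y=True)
mdl = xgb.XGBClassifier(learning_rate=.15, max_depth=9, min_child_weight=1,
                        min_split_loss=.1, colsample_bytree=.5, scale_pos_weight=.46)

start = datetime.datetime.now()
# n_jobs=-1 fans the folds out to worker processes via joblib,
# which is where the ~30x slowdown reportedly shows up
scores = cross_val_score(mdl, x, y, cv=5, n_jobs=-1)
print(scores.mean(), (datetime.datetime.now() - start).total_seconds())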
Update #1: I suspect the AMD CPU is not so compatible with xgboost compared to Intel. I am buying Intel's i9 12900 and will update in ~1 week
Update #2: While performance improved a lot switching from the AMD Ryzen 9 5900 to Intel's i9 12900 (2.7 minutes down to 1.8 minutes), I am still seeing the weaker computer outperform it (1.1 minutes). Maybe xgboost is not so good on Linux and/or modern high-end CPUs. I'll note again this is only xgboost (not sklearn models). I also noticed that the CPU temperature is unexpectedly low during training (~40°C), so the system is clearly not pushing the CPU to its potential.
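One way to probe that "CPU is not being pushed" observation (my own suggestion, not from the original post) is to pin xgboost's thread count explicitly and compare timings; XGBClassifier accepts n_jobs, and on a dataset as small as iris, spawning many OpenMP threads per tree can cost more than it saves:

import datetime
from sklearn.datasets import load_iris
import xgboost as xgb

x, y = load_iris(return_X_y=True)

# Time a batch of fits at a few explicit thread counts; the loop size is arbitrary.
for n_jobs in (1, 4, -1):
    start = datetime.datetime.now()
    for _ in range(200):
        xgb.XGBClassifier(max_depth=9, n_jobs=n_jobs).fit(x, y)
    elapsed = (datetime.datetime.now() - start).total_seconds()
    print(f"n_jobs={n_jobs}: {elapsed:.1f} s")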
Update #3: I installed Windows 10 and Anaconda on the powerful computer, ran the above code, and it completed in 0.86 minutes. So apparently xgboost is better optimized for Windows 10 than for Ubuntu.
1 Answer
First, make sure you are using the same libraries on both systems; I suggest using pip freeze. Just do pip freeze -l > requirements.txt on the PC with the lower execution time to capture its packages. Then, on the target computer, create a virtual environment and do pip install -r requirements.txt. In this way, you will install all the same packages at once.
However, one guess to explain the performance difference: the algorithm might need high-frequency single cores rather than the much lower-frequency AMD cores, so that might be the reason for the lower performance on the AMD device.
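As a quick sanity check on top of the requirements.txt approach, a minimal sketch (my own, not part of the original answer) that prints the versions each interpreter actually resolves, so the two machines can be compared line by line:

# Print the versions each interpreter actually imports before benchmarking.
import pandas
import sklearn
import xgboost

for mod in (pandas, sklearn, xgboost):
    print(f"{mod.__name__:<8} {mod.__version__}")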
UPDATE:
I experienced the same issue on two systems that I have. I used sudo dmidecode -t processor | grep "Speed" to get the CPU speeds (MAX possible shows the maximum speed when overclocked):
PC1:
PC2:
To be fair about the comparison, I made a new Conda environment with python 3.9.7 and installed the following versions on both:
Other points: all CPU cores were at 90%+ utilization for both tests, according to htop.
Finally:
with python testXGboost.py:
PC1: 3.7 ~ 4.7 s
PC2: 0.93 ~ 1.04 s
with CUDA_VISIBLE_DEVICES=-1 python testXGboost.py:
PC1: ~ 13.5
PC2: 4.7
I think one reason is that per-core clock speed matters more for XGBoost performance than core count. The second is that, it turns out, XGBoost might automatically use the GPU to some extent. These two (probably) explain why the performance differs on your two PCs.
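One way to check the GPU part of that guess (my own suggestion, assuming an xgboost version recent enough to expose xgboost.build_info(), i.e. 1.4+) is to inspect how the installed wheel was built:

import xgboost as xgb

# If the wheel was not built with CUDA, it cannot be touching the GPU at all;
# .get() is used because the exact keys may vary between builds.
info = xgb.build_info()
print("xgboost version  :", xgb.__version__)
print("built with CUDA  :", info.get("USE_CUDA"))
print("built with OpenMP:", info.get("USE_OPENMP"))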