Why is xgboost noticeably slower on my more powerful computer?

Posted 2025-01-10 17:52:54


xgboost is performing very slowly on my more powerful computer; the slowdown is specific to xgboost.

Weaker computer specs:

  • Windows 10, python 3.9.7 (jupyter), pandas 1.3.5, sklearn 1.0.2, xgboost 1.5.1
  • 16GB RAM, Intel i7-10870H

Powerful computer specs:

  • Ubuntu, python 3.9.5, pandas 1.4.0, sklearn 1.0.2, xgboost 1.5.2
  • 32GB RAM, AMD Ryzen 5900

The following code took 2.7 minutes on my powerful computer vs. 1.1 minutes on my weaker computer. The performance gap is even worse (~30x slower) when using sklearn's cross-validation classes that utilize multiprocessing (a sketch of that variant appears after the code below).

import datetime

from sklearn.datasets import load_iris
import xgboost as xgb

# Small toy dataset, so the loop mostly measures xgboost's per-fit overhead.
iris = load_iris()

x = iris.data
y = iris.target

start = datetime.datetime.now()
print(start)

# Fit the same small model 1000 times to make the timing difference visible.
for i in range(1000):
    mdl2 = xgb.XGBClassifier(learning_rate=.15,
                             max_depth=9,
                             min_child_weight=1,
                             min_split_loss=.1,
                             colsample_bytree=.5,
                             scale_pos_weight=.46)
    mdl2.fit(x, y)

# Elapsed time in minutes.
finished = (datetime.datetime.now() - start).total_seconds() / 60
print(finished)
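
For reference, the cross-validation comparison mentioned above was done with something along these lines. This is only a minimal sketch of my own, assuming sklearn's cross_val_score with n_jobs=-1 (multiprocessing across folds); the original post did not show that code.

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
import xgboost as xgb

iris = load_iris()

# Same classifier as above; n_jobs=-1 spreads the folds across all cores,
# which is where the ~30x gap between the two machines was observed.
mdl = xgb.XGBClassifier(learning_rate=.15, max_depth=9, min_child_weight=1,
                        min_split_loss=.1, colsample_bytree=.5,
                        scale_pos_weight=.46)
scores = cross_val_score(mdl, iris.data, iris.target, cv=5, n_jobs=-1)
print(scores.mean())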

Update #1: I suspect the AMD CPU is not as compatible with xgboost as Intel's. I am buying Intel's i9 12900 and will update in ~1 week.

Update #2: While performance improved a lot when switching from the AMD Ryzen 9 5900 to Intel's i9 12900 (2.7 minutes down to 1.8 minutes), I am still seeing the weaker computer outperform it (1.1 minutes). Maybe xgboost does not do well on Linux and/or on modern high-end CPUs. I'll note again that this affects only xgboost (not sklearn models). I also noticed that the CPU temperature stays unexpectedly low during training (~40°C), so the system is clearly not pushing the CPU to its potential.
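
One way to check whether idle or oversubscribed cores are the problem is to pin the thread count explicitly and compare timings. This is a minimal sketch of my own, under the assumption that xgboost honours OMP_NUM_THREADS and its n_jobs parameter; it is not code from the original post.

import os
# Cap the OpenMP thread pool before xgboost is imported (assumption: the
# variable is read when the library sets up its parallel regions).
os.environ["OMP_NUM_THREADS"] = "4"

import datetime
from sklearn.datasets import load_iris
import xgboost as xgb

iris = load_iris()

# Time 100 fits at different thread counts; on a tiny dataset like iris,
# fewer threads can be faster because the per-split work is so small.
for n in (1, 2, 4):
    start = datetime.datetime.now()
    for _ in range(100):
        xgb.XGBClassifier(max_depth=9, n_jobs=n).fit(iris.data, iris.target)
    print(n, (datetime.datetime.now() - start).total_seconds(), "s")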

Update #3: I installed Windows 10 and anaconda on the powerful computer, ran the above code, and it completed in 0.86 minutes. So apparently xgboost is better optimized for Windows 10 than for Ubuntu.


Comments (1)

甜扑 2025-01-17 17:52:54


To make sure you are using the same libraries on both systems, I suggest using pip freeze. Just do pip freeze -l > requirements.txt on the PC with the lower execution time to capture its packages.

Then, on the target computer, create a virtual environment and do pip install -r requirements.txt. This way you will install all of the same packages in one go.

However, one guess to explain the performance difference: the algorithm might need high-frequency single cores rather than many much lower-frequency cores, and that could be the reason for the lower performance on the AMD machine.
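
If that guess is right, the sustained clock during training is the number to compare, not the nameplate frequency. Below is a rough sketch of my own for sampling it from Python while a fit is running; it assumes the third-party psutil package is installed, which is not mentioned in the question.

import threading
import time

import psutil  # assumption: pip install psutil
import xgboost as xgb
from sklearn.datasets import load_iris

iris = load_iris()
readings = []
stop = threading.Event()

def sample_freq():
    # Record the current CPU clock (MHz) every 100 ms while training runs.
    while not stop.is_set():
        freq = psutil.cpu_freq()
        if freq is not None:
            readings.append(freq.current)
        time.sleep(0.1)

sampler = threading.Thread(target=sample_freq)
sampler.start()
for _ in range(200):
    xgb.XGBClassifier(max_depth=9).fit(iris.data, iris.target)
stop.set()
sampler.join()
if readings:
    print("mean sustained clock:", sum(readings) / len(readings), "MHz")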

UPDATE:

I experienced the same issue on two systems of my own. I used sudo dmidecode -t processor | grep "Speed" to get the CPU speeds (Max Speed shows the maximum possible speed, e.g. when overclocked).

PC1:

Ubuntu 20.04 server
377GB RAM
2 CPUs, 96 cores in total; Max Speed: 4500 MHz, Current Speed: 2200 MHz
lscpu model name: Intel(R) Xeon(R) Gold 5220R CPU @ 2.20GHz
nvcc -V: 11.0

PC2:

Ubuntu 20.04
31GB RAM
12 cores; Max Speed: 8300 MHz, Current Speed: 3200 MHz
lscpu model name: Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
nvcc -V: 10.1

To be fair about the comparison, I made a new Conda environment with python 3.9.7 and installed the following versions on both:

certifi==2021.10.8
DateTime==4.4
joblib==1.1.0
numpy==1.22.2
pandas==1.4.1
python-dateutil==2.8.2
pytz==2021.3
scikit-learn==1.0.2
scipy==1.8.0
six==1.16.0
sklearn==0.0
threadpoolctl==3.1.0
xgboost==1.5.2
zope.interface==5.4.0

Other points: all CPU cores were at 90%+ utilization during both tests, according to htop.

Finally, with python testXGboost.py:

PC1: 3.7 ~ 4.7 s

PC2: 0.93 ~ 1.04 s

And with CUDA_VISIBLE_DEVICES=-1 python testXGboost.py:

PC1: ~13.5 s

PC2: ~4.7 s

I think one reason is that per-core clock speed matters more for XGBoost performance than core count. The second is that it turns out XGBoost might automatically use the GPU to some extent. These two (probably) explain why the performance differs between your two PCs.
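
To rule the GPU in or out explicitly, rather than relying on CUDA_VISIBLE_DEVICES, one can pin the tree method to the CPU implementation. This is a small sketch of my own, not code from the answer; tree_method="hist" selects the CPU histogram algorithm in xgboost 1.5, while "gpu_hist" would be its GPU counterpart.

import datetime
from sklearn.datasets import load_iris
import xgboost as xgb

iris = load_iris()

start = datetime.datetime.now()
for _ in range(200):
    # tree_method="hist" forces the CPU histogram algorithm, so any remaining
    # timing difference between the machines cannot come from the GPU.
    xgb.XGBClassifier(tree_method="hist", max_depth=9, n_jobs=1).fit(iris.data, iris.target)
print((datetime.datetime.now() - start).total_seconds(), "s")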
