CUDA version in Docker does not match the WSL2 backend

Posted 2025-01-13 15:51:30

I am trying to use docker (Docker Desktop for Windows 10 Pro) with the WSL2 backend (Windows Subsystem for Linux (WSL), Ubuntu 20.04.4 LTS).

That part seems to be working fine, except I would like to pass my GPU (Nvidia RTX A5000) through to my docker container.

Before I even get that far, I am still trying to set things up. I found a very good tutorial aimed at Ubuntu 18.04, but found all the steps are the same for 20.04, just with some version numbers bumped.

At the end, I can see that my CUDA versions do not match. You can see that here, in this image.

The real issue is when I try to run the test command as shown on the docker website:

 docker run --rm -it --gpus=all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark

I get this error:

 --> docker run --rm -it --gpus=all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
docker: Error response from daemon: OCI runtime create failed: container_linux.go:380:
starting container process caused: process_linux.go:545: container init caused: Running
hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli:
requirement error: unsatisfied condition: cuda>=11.6, please update your driver to a
newer version, or use an earlier cuda container: unknown.

... and I just don't know what to do, or how I can fix this.

Can someone explain how to get the GPU to pass through to a docker container successfully?
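For diagnosis, it helps to see both sides of the mismatch: the "CUDA Version" that nvidia-smi prints is the newest CUDA runtime the installed driver can serve, while a CUDA container ships its own toolkit. A minimal sketch of that check, assuming nvidia-smi already works inside WSL, that the nvidia/cuda:11.6.0-base-ubuntu20.04 tag exists on Docker Hub, and that the image exports the CUDA_VERSION environment variable as the nvidia/cuda images have historically done:

    # Driver side: the banner's "CUDA Version" is the driver's ceiling,
    # not an installed toolkit.
    nvidia-smi

    # Image side: running without --gpus skips the NVIDIA hook, so this
    # works even when the versions clash.
    docker run --rm nvidia/cuda:11.6.0-base-ubuntu20.04 env | grep CUDA_VERSION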

Comments (2)

油饼 2025-01-20 15:51:30

I had the same issue on Ubuntu when I tried to run the container:

s.evloev@some-pc:~$ docker run --gpus all --rm nvidia/cuda:11.7.0-base-ubuntu18.04
docker: Error response from daemon: failed to create shim: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: requirement error: unsatisfied condition: cuda>=11.7, please update your driver to a newer version, or use an earlier cuda container: unknown.

In my case it occurred when I tried to launch a docker image whose CUDA version was higher than the one installed on my host.

When I checked the CUDA version installed on my host, I found that it was 11.3.

s.evloev@some-pc:~$ nvidia-smi
Thu Jul 21 15:06:33 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.19.01    Driver Version: 465.19.01    CUDA Version: 11.3     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC | 
|...                                                                          |
+-----------------------------------------------------------------------------+

So when I ran the same CUDA version (11.3), it worked well:

s.evloev@some-pc:~$ docker run -it --gpus all --rm nvidia/cuda:11.3.0-base-ubuntu18.04 nvidia-smi
Thu Jul 21 12:13:46 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.19.01    Driver Version: 465.19.01    CUDA Version: 11.3     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:65:00.0 Off |                  N/A |
|  0%   44C    P8     7W / 180W |   1404MiB /  8110MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
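In short, the container's CUDA version must not exceed the ceiling the host driver reports. A small sketch that automates the match, assuming GNU grep (for -oP) and that a correspondingly tagged nvidia/cuda image exists on Docker Hub:

    # Parse the driver's maximum supported CUDA version from the nvidia-smi
    # banner (the banner layout can vary slightly between driver releases).
    HOST_CUDA=$(nvidia-smi | grep -oP 'CUDA Version: \K[0-9]+\.[0-9]+')

    # Run a container whose CUDA toolkit matches that ceiling.
    docker run --rm --gpus all "nvidia/cuda:${HOST_CUDA}.0-base-ubuntu18.04" nvidia-smi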

回忆躺在深渊里 2025-01-20 15:51:30

The comment from @RobertCrovella resolved this:

Please update your driver to a newer version. When using WSL, the driver in your WSL setup is not something you install in WSL; it is provided by the driver on the Windows side. Your WSL driver is 472.84, and this is too old to work with CUDA 11.6 (it only supports up to CUDA 11.4). So you would need to update your Windows-side driver to the latest one possible for your GPU if you want to run a CUDA 11.6 test case. Regarding the "mismatch" of CUDA versions, this provides general background material for interpretation.

Downloading the most current Nvidia driver:

Version:             R510 U3 (511.79)  WHQL
Release Date:        2022.2.14
Operating System:    Windows 10 64-bit, Windows 11
Language:            English (US)
File Size:           640.19 MB

Now my driver supports CUDA 11.6, and the test from the docker documentation works:

--> docker run --rm -it --gpus=all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
Run "nbody -benchmark [-numbodies=<numBodies>]" to measure performance.
        -fullscreen       (run n-body simulation in fullscreen mode)
        -fp64             (use double precision floating point values for simulation)
        -hostmem          (stores simulation data in host memory)
        -benchmark        (run benchmark to measure performance)
        -numbodies=<N>    (number of bodies (>= 1) to run in simulation)
        -device=<d>       (where d=0,1,2.... for the CUDA device to use)
        -numdevices=<i>   (where i=(number of CUDA devices > 0) to use for simulation)
        -compare          (compares simulation results running once on the default GPU and once on the CPU)
        -cpu              (run n-body simulation on the CPU)
        -tipsy=<file.bin> (load a tipsy model file for simulation)

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
GPU Device 0: "Ampere" with compute capability 8.6

> Compute 8.6 CUDA device: [NVIDIA RTX A5000]
65536 bodies, total time for 10 iterations: 58.655 ms
= 732.246 billion interactions per second
= 14644.916 single-precision GFLOP/s at 20 flops per interaction

Thank you for the quick response!
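As a follow-up, a hedged pre-flight sketch for the same situation: it skips the CUDA 11.6 sample when the driver reported inside WSL is older than the R510 line (510 is an assumed minimum here; the authoritative driver-to-CUDA mapping is in the CUDA release notes):

    #!/usr/bin/env bash
    # Sketch: refuse to run if the (Windows-provided) WSL driver is too old.
    REQUIRED_MAJOR=510   # assumed minimum driver line for CUDA 11.6
    CURRENT_MAJOR=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n1 | cut -d. -f1)
    if [ "$CURRENT_MAJOR" -lt "$REQUIRED_MAJOR" ]; then
        echo "Driver ${CURRENT_MAJOR}.x is too old for CUDA 11.6; update the Windows-side driver." >&2
        exit 1
    fi
    docker run --rm -it --gpus=all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark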
