How do I fix package dependencies when using a different cudatoolkit than the one on my NVIDIA cluster?
I am using a package that requires tensorflow-gpu==2.0.0 and CUDA 10.0.0 with cuDNN 7.6.0.
I am running this code on an NVIDIA GPU cluster, and when I run nvidia-smi it shows this. It still shows CUDA 11, which I guess is the version installed on the actual server.
I was told that I can basically "override" this version by installing cudatoolkit in the version I need. I did that and installed cudatoolkit==10.0.
Unfortunately, I am now running into a problem when trying to run an LSTM-based model with tensorflow-gpu. I get the following:
2022-06-14 17:02:26.988359: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-11.2/lib64
2022-06-14 17:02:26.989175: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-11.2/lib64
2022-06-14 17:02:26.989208: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
In the path I still see CUDA 11. Is this causing the problem? How can I resolve it?
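For reference, the missing library names and the search path can be pulled out of the warnings above with a short script. This is only a sketch: the regular expression assumes the exact wording of the `dso_loader` warnings quoted in this question, and `missing_libraries` is a hypothetical helper name, not part of TensorFlow.

```python
import re

# Assumption: warnings follow the wording quoted in the question above.
PATTERN = re.compile(
    r"Could not load dynamic library '(?P<lib>[^']+)'.*?LD_LIBRARY_PATH: (?P<path>\S+)"
)


def missing_libraries(log_text):
    """Return (library, LD_LIBRARY_PATH) pairs for each dso_loader warning."""
    return [(m.group("lib"), m.group("path")) for m in PATTERN.finditer(log_text)]


# The two dso_loader warnings from the question, copied verbatim.
LOG = (
    "2022-06-14 17:02:26.988359: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] "
    "Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: "
    "cannot open shared object file: No such file or directory; "
    "LD_LIBRARY_PATH: /usr/local/cuda-11.2/lib64\n"
    "2022-06-14 17:02:26.989175: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] "
    "Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: "
    "cannot open shared object file: No such file or directory; "
    "LD_LIBRARY_PATH: /usr/local/cuda-11.2/lib64\n"
)

for lib, path in missing_libraries(LOG):
    print(f"{lib} was not found on {path}")
```

Both warnings name TensorRT libraries (`libnvinfer`), and both show that the loader only searched the CUDA 11.2 directory.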
Comments (1)
As you mentioned in the comment, you need to use TensorFlow 2.1, so you need to install cuDNN 7.6 and CUDA 10.1 specifically. Please follow the tested build configurations below to find out which Python and TensorFlow versions are compatible with which CUDA and cuDNN versions.
[Image: tested build configurations]
Please check this link for more details on GPU setup.
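As a rough illustration of the version matching described above, here is a minimal sketch that compares an installed CUDA/cuDNN pair against the tested configuration for a given TensorFlow release. The table holds only two rows, transcribed by hand from TensorFlow's tested build configurations, so please verify them against the official table before relying on them. `TESTED_CONFIGS` and `check_install` are hypothetical names used for illustration.

```python
# Assumption: two rows copied by hand from TensorFlow's tested build
# configurations (Linux, GPU). Verify against the official table.
TESTED_CONFIGS = {
    "2.0.0": {"cuda": "10.0", "cudnn": "7.4"},
    "2.1.0": {"cuda": "10.1", "cudnn": "7.6"},
}


def major_minor(version):
    """Reduce a version such as '10.1.243' to its 'major.minor' form, '10.1'."""
    return ".".join(version.split(".")[:2])


def check_install(tf_version, cuda_version, cudnn_version):
    """Return a list of mismatch messages; an empty list means the versions match."""
    expected = TESTED_CONFIGS.get(tf_version)
    if expected is None:
        return [f"no tested configuration recorded here for TensorFlow {tf_version}"]
    problems = []
    if major_minor(cuda_version) != expected["cuda"]:
        problems.append(
            f"CUDA {cuda_version} installed, but TensorFlow {tf_version} was tested with CUDA {expected['cuda']}"
        )
    if major_minor(cudnn_version) != expected["cudnn"]:
        problems.append(
            f"cuDNN {cudnn_version} installed, but TensorFlow {tf_version} was tested with cuDNN {expected['cudnn']}"
        )
    return problems


# Example: TensorFlow 2.1 paired with the cudatoolkit 10.0 from the question.
for message in check_install("2.1.0", "10.0", "7.6.0"):
    print(message)
```

The comparison uses only the major and minor parts of each version, since the tested configurations are listed at that granularity.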