How do I fix package dependencies when using a different cudatoolkit than the one on my NVIDIA cluster?
I am using a package that requires tensorflow-gpu==2.0.0 and CUDA 10.0.0 with cuDNN 7.6.0.
I am running this code on an NVIDIA GPU cluster, and when I run nvidia-smi it still shows CUDA 11, which I guess is the version installed on the actual server.
I was told that I can basically 'override' this version by installing the cudatoolkit version that I need. I did that and installed cudatoolkit==10.0.
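The 'override' can only work if the dynamic loader finds the environment's CUDA 10.x libraries before the system's CUDA 11.2 ones. A minimal sketch, assuming a conda environment whose libraries live in $CONDA_PREFIX/lib (an assumption; adjust for your setup), that builds an LD_LIBRARY_PATH value with that directory first. It prints an export line instead of modifying os.environ, because the loader reads LD_LIBRARY_PATH when the process starts, so it has to be set before Python is launched.

```python
import os


def prepend_lib_dir(lib_dir: str, ld_library_path: str) -> str:
    """Return an LD_LIBRARY_PATH value with lib_dir searched first.

    Existing entries keep their order; any existing copy of lib_dir is
    dropped so it appears exactly once, at the front.
    """
    entries = [p for p in ld_library_path.split(os.pathsep) if p and p != lib_dir]
    return os.pathsep.join([lib_dir] + entries)


if __name__ == "__main__":
    # Assumption: CONDA_PREFIX is set by `conda activate`; the fallback
    # path is a placeholder for illustration only.
    conda_lib = os.path.join(os.environ.get("CONDA_PREFIX", "/path/to/env"), "lib")
    current = os.environ.get("LD_LIBRARY_PATH", "")
    # Emit a line to paste into the job script before starting Python.
    print(f"export LD_LIBRARY_PATH={prepend_lib_dir(conda_lib, current)}")
```

With the LD_LIBRARY_PATH from the log, this would put the environment's lib directory ahead of /usr/local/cuda-11.2/lib64 rather than removing the system entry.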
Unfortunately, I am now running into a problem when trying to run an LSTM-based model with tensorflow-gpu. I get the following:
2022-06-14 17:02:26.988359: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-11.2/lib64
2022-06-14 17:02:26.989175: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-11.2/lib64
2022-06-14 17:02:26.989208: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
In LD_LIBRARY_PATH I still see CUDA 11 (/usr/local/cuda-11.2/lib64). Is this causing the problem? How can I resolve it?
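One way to check is a small diagnostic sketch that lists the CUDA versions appearing on LD_LIBRARY_PATH and looks for specific library files in those directories. The two libnvinfer names come from the log above (which identifies them as TensorRT libraries); libcudnn.so.7 is an assumed example for cuDNN 7.x. The sketch only searches LD_LIBRARY_PATH directories, not the ldconfig cache or RPATHs, and only recognises directory names of the form cuda-X.Y, so a 'NOT FOUND' is a hint rather than proof.

```python
import os
import re
from typing import Dict, List, Optional


def cuda_versions_in_path(ld_library_path: str) -> List[str]:
    """Extract versions from entries like /usr/local/cuda-11.2/lib64."""
    versions = []
    for entry in ld_library_path.split(os.pathsep):
        match = re.search(r"cuda-(\d+\.\d+)", entry)
        if match:
            versions.append(match.group(1))
    return versions


def locate_libraries(names: List[str], ld_library_path: str) -> Dict[str, Optional[str]]:
    """Map each library file name to the first directory containing it, or None."""
    dirs = [d for d in ld_library_path.split(os.pathsep) if d]
    found: Dict[str, Optional[str]] = {}
    for name in names:
        found[name] = next(
            (d for d in dirs if os.path.isfile(os.path.join(d, name))), None
        )
    return found


if __name__ == "__main__":
    path = os.environ.get("LD_LIBRARY_PATH", "")
    print("CUDA versions on LD_LIBRARY_PATH:", cuda_versions_in_path(path))
    # libnvinfer names are from the log; libcudnn.so.7 is an assumed example.
    wanted = ["libnvinfer.so.6", "libnvinfer_plugin.so.6", "libcudnn.so.7"]
    for lib, where in locate_libraries(wanted, path).items():
        print(f"{lib}: {where or 'NOT FOUND'}")
```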
As you mentioned in the comment, you need to use TensorFlow 2.1, which specifically requires cuDNN 7.6 and CUDA 10.1. Please refer to TensorFlow's tested build configurations to see which Python and TensorFlow versions are compatible with which CUDA and cuDNN versions. Please check this link for more details on GPU setup.
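The version matching the answer describes can be sketched as a lookup from TensorFlow version to the CUDA version it was built against. The table below is hypothetical and holds only the two pairs mentioned in this thread (TF 2.0 with CUDA 10.0 from the question, TF 2.1 with CUDA 10.1 from the answer); it checks CUDA only, and should be verified against the official tested build configurations before you rely on it.

```python
from typing import Dict

# Hypothetical lookup, populated only with pairs mentioned in this thread.
REQUIRED_CUDA: Dict[str, str] = {
    "2.0": "10.0",
    "2.1": "10.1",
}


def major_minor(version: str) -> str:
    """Reduce a version like '10.1.243' or '2.1.0' to '10.1' or '2.1'."""
    return ".".join(version.split(".")[:2])


def toolkit_matches(tf_version: str, cudatoolkit_version: str) -> bool:
    """True if the cudatoolkit's major.minor equals the CUDA version TF expects."""
    required = REQUIRED_CUDA.get(major_minor(tf_version))
    if required is None:
        raise KeyError(f"no entry for TensorFlow {tf_version}")
    return major_minor(cudatoolkit_version) == required


if __name__ == "__main__":
    # The situation in the question: TF 2.1 with cudatoolkit 10.0.
    print(toolkit_matches("2.1.0", "10.0"))      # False: TF 2.1 expects CUDA 10.1
    print(toolkit_matches("2.1.0", "10.1.243"))  # True
```

In practice the TensorFlow version can be read from tf.__version__ and the installed toolkit version from conda list cudatoolkit.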