Why is PyTorch 1.7 with CUDA 10.1 incompatible with the Nvidia A100 (Ampere architecture), given the PTX compatibility principle?
According to Nvidia's official documentation, if a CUDA application is built to include PTX, the PTX is forward-compatible: it is supported to run on any GPU whose compute capability is higher than the compute capability assumed when that PTX was generated.
So I tried to find out whether torch-1.7.0+cu101 is compiled to a binary that includes PTX, and it seems that PyTorch is in fact compiled with the nvcc flag `-gencode=arch=compute_xx,code=sm_xx` (see the PyTorch CMakeLists.txt). I thought this flag meant that the compiled product contains PTX.
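For context on what `-gencode` actually embeds (this is my reading of the nvcc documentation, so treat it as an assumption to verify against your toolkit version): the `code=` clause decides the contents of the fat binary, and `code=sm_xx` produces only SASS machine code for that exact architecture, while `code=compute_xx` is what embeds JIT-able PTX. A minimal sketch of that distinction:

```python
# Sketch: what an nvcc -gencode target embeds in the fat binary.
#   code=sm_XX      -> SASS: machine code for that exact GPU architecture
#   code=compute_XX -> PTX: intermediate code the driver can JIT-compile
#                      forward for newer GPUs

def gencode_embeds(flag):
    """Parse e.g. '-gencode=arch=compute_75,code=sm_75' and report
    whether that target embeds SASS, PTX, or both."""
    opts = dict(kv.split("=") for kv in flag.split("=", 1)[1].split(","))
    code = opts["code"]
    return {"sass": code.startswith("sm_"),
            "ptx": code.startswith("compute_")}

# The cu101 wheel's targets all use code=sm_xx (per the CMakeLists.txt
# flag quoted above) -> SASS only, no embedded PTX.
print(gencode_embeds("-gencode=arch=compute_75,code=sm_75"))
# -> {'sass': True, 'ptx': False}
print(gencode_embeds("-gencode=arch=compute_75,code=compute_75"))
# -> {'sass': False, 'ptx': True}
```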
However, when I try to use PyTorch 1.7 with CUDA 10.1 on an A100, I always get an error:
>>> import torch
>>> torch.zeros(1).cuda()
/data/miniconda3/lib/python3.7/site-packages/torch/cuda/__init__.py:104: UserWarning:
A100-SXM4-40GB with CUDA capability sm_80 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70 sm_75.
If you want to use the A100-SXM4-40GB GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/data/miniconda3/lib/python3.7/site-packages/torch/tensor.py", line 179, in __repr__
return torch._tensor_str._str(self)
File "/data/miniconda3/lib/python3.7/site-packages/torch/_tensor_str.py", line 372, in _str
return _str_intern(self)
File "/data/miniconda3/lib/python3.7/site-packages/torch/_tensor_str.py", line 352, in _str_intern
tensor_str = _tensor_str(self, indent)
File "/data/miniconda3/lib/python3.7/site-packages/torch/_tensor_str.py", line 241, in _tensor_str
formatter = _Formatter(get_summarized_data(self) if summarize else self)
File "/data/miniconda3/lib/python3.7/site-packages/torch/_tensor_str.py", line 89, in __init__
nonzero_finite_vals = torch.masked_select(tensor_view, torch.isfinite(tensor_view) & tensor_view.ne(0))
RuntimeError: CUDA error: no kernel image is available for execution on the device
So I really want to know why the "PTX compatibility principle" does not apply to PyTorch.
Other answers only say to use CUDA 11 or higher, and I know that works. But they don't give the real reason -- why PyTorch for CUDA 10.1 does not work on the A100.
I tried the CUDA 10.1 samples from the toolkit, and these small demo applications actually work:
[Matrix Multiply Using CUDA] - Starting...
MapSMtoCores for SM 8.0 is undefined. Default to use 64 Cores/SM
GPU Device 0: "A100-SXM4-40GB" with compute capability 8.0
MatrixA(320,320), MatrixB(640,320)
Computing result using CUDA Kernel...
done
Performance= 4286.91 GFlop/s, Time= 0.031 msec, Size= 131072000 Ops, WorkgroupSize= 1024 threads/block
Checking computed result for correctness: Result = PASS
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
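One plausible explanation for the difference between the wheel and the samples (a simplified model of the runtime's kernel-image selection, not the actual driver logic, and the architecture lists below are illustrative): the samples' default Makefiles also add a `code=compute_xx` target for the newest architecture, so their fat binaries carry PTX the driver can JIT-compile for sm_80, while a build with only `code=sm_xx` targets carries nothing usable on a newer device.

```python
# Simplified model of how the CUDA runtime picks a kernel image.
# (Illustrative only; the real driver also honors binary compatibility
# within a GPU's major architecture version.)

def can_run(device_cc, sass_ccs, ptx_ccs):
    """True if the fat binary has SASS matching the device, or PTX
    at/below the device's compute capability that can be JIT-compiled."""
    return device_cc in sass_ccs or any(cc <= device_cc for cc in ptx_ccs)

# SASS-only build for sm_37..sm_75, no PTX -> nothing runs on sm_80
print(can_run(80, [37, 50, 60, 70, 75], []))    # -> False
# Same SASS plus PTX for compute_75 -> driver JIT-compiles for sm_80
print(can_run(80, [37, 50, 60, 70, 75], [75]))  # -> True
```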
I would be very grateful if anyone could help me with an answer.
After @talonmies' reminder, I also posted the same question on discuss.pytorch.org.
The answer is that PyTorch 1.7 uses cuDNN 7, which is not compatible with the A100. cuDNN 7.6.5 does not support the Nvidia Ampere architecture; the only cuDNN versions that support Ampere are cuDNN 8 and higher.
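A quick sanity-check sketch based on that answer (the version rule below is an assumption distilled from the answer, not an official compatibility matrix):

```python
# Assumption from the answer above: Ampere (compute capability 8.x)
# needs cuDNN 8 or newer; this is not an official support matrix.

def cudnn_supports(cc_major, cudnn_major):
    """Rough check: does this cuDNN major version support a GPU
    with the given compute capability major version?"""
    if cc_major >= 8:            # Ampere or newer
        return cudnn_major >= 8
    return True                  # pre-Ampere GPUs: assumed supported here

print(cudnn_supports(8, 7))  # A100 + cuDNN 7 (PyTorch 1.7 cu101) -> False
print(cudnn_supports(8, 8))  # A100 + cuDNN 8 (cu110+ builds)     -> True
```

On a live machine one could feed in real values via `torch.cuda.get_device_capability()[0]` and `torch.backends.cudnn.version() // 1000` (e.g. 7605 becomes 7) instead of the hard-coded numbers.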