Why is PyTorch 1.7 with CUDA 10.1 incompatible with the Nvidia A100 (Ampere architecture), given the PTX compatibility principle?
According to Nvidia's official documentation, if a CUDA application is built to include PTX, the PTX is forward-compatible: it is supported to run on any GPU whose compute capability is higher than the compute capability assumed when that PTX was generated.
So I tried to find out whether torch-1.7.0+cu101 is compiled to a binary that includes PTX, and it seems that PyTorch is in fact compiled with the nvcc flag `-gencode=arch=compute_xx,code=sm_xx` (see the PyTorch CMakeLists.txt). I thought this flag meant that the compiled product contains PTX.
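For context on what `-gencode` actually embeds (this is my reading of the nvcc documentation, so treat it as an assumption to verify against your toolkit version): the `code=` clause decides the contents of the fat binary, and `code=sm_xx` produces only SASS machine code for that exact architecture, while `code=compute_xx` is what embeds JIT-able PTX. A minimal sketch of that distinction:

```python
# Sketch: what an nvcc -gencode target embeds in the fat binary.
#   code=sm_XX      -> SASS: machine code for that exact GPU architecture
#   code=compute_XX -> PTX: intermediate code the driver can JIT-compile
#                      forward for newer GPUs

def gencode_embeds(flag):
    """Parse e.g. '-gencode=arch=compute_75,code=sm_75' and report
    whether that target embeds SASS, PTX, or both."""
    opts = dict(kv.split("=") for kv in flag.split("=", 1)[1].split(","))
    code = opts["code"]
    return {"sass": code.startswith("sm_"),
            "ptx": code.startswith("compute_")}

# The cu101 wheel's targets all use code=sm_xx (per the CMakeLists.txt
# flag quoted above) -> SASS only, no embedded PTX.
print(gencode_embeds("-gencode=arch=compute_75,code=sm_75"))
# -> {'sass': True, 'ptx': False}
print(gencode_embeds("-gencode=arch=compute_75,code=compute_75"))
# -> {'sass': False, 'ptx': True}
```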
However, when I try to use PyTorch 1.7 with CUDA 10.1 on an A100, I always get an error:
>>> import torch
>>> torch.zeros(1).cuda()
/data/miniconda3/lib/python3.7/site-packages/torch/cuda/__init__.py:104: UserWarning:
A100-SXM4-40GB with CUDA capability sm_80 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70 sm_75.
If you want to use the A100-SXM4-40GB GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/data/miniconda3/lib/python3.7/site-packages/torch/tensor.py", line 179, in __repr__
return torch._tensor_str._str(self)
File "/data/miniconda3/lib/python3.7/site-packages/torch/_tensor_str.py", line 372, in _str
return _str_intern(self)
File "/data/miniconda3/lib/python3.7/site-packages/torch/_tensor_str.py", line 352, in _str_intern
tensor_str = _tensor_str(self, indent)
File "/data/miniconda3/lib/python3.7/site-packages/torch/_tensor_str.py", line 241, in _tensor_str
formatter = _Formatter(get_summarized_data(self) if summarize else self)
File "/data/miniconda3/lib/python3.7/site-packages/torch/_tensor_str.py", line 89, in __init__
nonzero_finite_vals = torch.masked_select(tensor_view, torch.isfinite(tensor_view) & tensor_view.ne(0))
RuntimeError: CUDA error: no kernel image is available for execution on the device
So I really want to know why the "PTX compatibility principle" does not apply to PyTorch.
Other answers only say to use CUDA 11 or higher, and I know that works. But they don't give the real reason -- why PyTorch for CUDA 10.1 does not work on the A100.
I tried the CUDA 10.1 samples from the toolkit, and these small demo applications actually work:
[Matrix Multiply Using CUDA] - Starting...
MapSMtoCores for SM 8.0 is undefined. Default to use 64 Cores/SM
GPU Device 0: "A100-SXM4-40GB" with compute capability 8.0
MatrixA(320,320), MatrixB(640,320)
Computing result using CUDA Kernel...
done
Performance= 4286.91 GFlop/s, Time= 0.031 msec, Size= 131072000 Ops, WorkgroupSize= 1024 threads/block
Checking computed result for correctness: Result = PASS
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
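One plausible explanation for the difference between the wheel and the samples (a simplified model of the runtime's kernel-image selection, not the actual driver logic, and the architecture lists below are illustrative): the samples' default Makefiles also add a `code=compute_xx` target for the newest architecture, so their fat binaries carry PTX the driver can JIT-compile for sm_80, while a build with only `code=sm_xx` targets carries nothing usable on a newer device.

```python
# Simplified model of how the CUDA runtime picks a kernel image.
# (Illustrative only; the real driver also honors binary compatibility
# within a GPU's major architecture version.)

def can_run(device_cc, sass_ccs, ptx_ccs):
    """True if the fat binary has SASS matching the device, or PTX
    at/below the device's compute capability that can be JIT-compiled."""
    return device_cc in sass_ccs or any(cc <= device_cc for cc in ptx_ccs)

# SASS-only build for sm_37..sm_75, no PTX -> nothing runs on sm_80
print(can_run(80, [37, 50, 60, 70, 75], []))    # -> False
# Same SASS plus PTX for compute_75 -> driver JIT-compiles for sm_80
print(can_run(80, [37, 50, 60, 70, 75], [75]))  # -> True
```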
I would be very grateful if anyone could help me with an answer.
After @talonmies' reminder, I also posted the same question on discuss.pytorch.org.
The answer is that PyTorch 1.7 uses cuDNN 7, which is not compatible with the A100. cuDNN 7.6.5 does not support the Nvidia Ampere architecture; the only cuDNN versions that support Ampere are cuDNN 8 and higher.
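A quick sanity-check sketch based on that answer (the version rule below is an assumption distilled from the answer, not an official compatibility matrix):

```python
# Assumption from the answer above: Ampere (compute capability 8.x)
# needs cuDNN 8 or newer; this is not an official support matrix.

def cudnn_supports(cc_major, cudnn_major):
    """Rough check: does this cuDNN major version support a GPU
    with the given compute capability major version?"""
    if cc_major >= 8:            # Ampere or newer
        return cudnn_major >= 8
    return True                  # pre-Ampere GPUs: assumed supported here

print(cudnn_supports(8, 7))  # A100 + cuDNN 7 (PyTorch 1.7 cu101) -> False
print(cudnn_supports(8, 8))  # A100 + cuDNN 8 (cu110+ builds)     -> True
```

On a live machine one could feed in real values via `torch.cuda.get_device_capability()[0]` and `torch.backends.cudnn.version() // 1000` (e.g. 7605 becomes 7) instead of the hard-coded numbers.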