CUDA SM register limit
I know that the number of blocks running on one SM is limited by the block count, threads per block, shared memory, and registers. Is there any strategy for avoiding too many registers? I just don't want to use so many of them that they end up limiting the number of blocks I can run on one SM.
2 Answers
Compiling with nvcc -Xptxas -v will print out the diagnostic information Edric mentioned. Additionally, you can force the compiler to conserve registers using the __launch_bounds__ qualifier. For example, declaring a kernel with __launch_bounds__(maxThreadsPerBlock, minBlocksPerMultiprocessor) guarantees that at least minBlocksPerMultiprocessor blocks of size maxThreadsPerBlock will fit on a single SM. See Section B.16 of the CUDA Programming Guide for a complete explanation of __launch_bounds__.
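As a rough sketch of what that looks like in practice (the kernel name and the bound values below are made up for illustration, not taken from the answer):

```cuda
// Illustrative only: kernel name and bound values are placeholders.
#define MAX_THREADS_PER_BLOCK 256
#define MIN_BLOCKS_PER_MP     4

// Ask the compiler to limit register usage so that at least
// MIN_BLOCKS_PER_MP blocks of MAX_THREADS_PER_BLOCK threads
// can be resident on one SM.
__global__ void
__launch_bounds__(MAX_THREADS_PER_BLOCK, MIN_BLOCKS_PER_MP)
scaleKernel(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}
```

Compiling with nvcc -Xptxas -v then shows the per-kernel register count ptxas settled on, along with any values it had to spill to local memory to meet the bound.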
One of the main drivers of the number of registers is the amount of local data you declare in your kernel. However, the PTX assembler can do quite a good job of re-using registers, so it's not always easy to work out from the PTX code how many will be used - you need to run ptxas to get the real answer.
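For instance (a minimal sketch; the kernel name, file name, and sizes are illustrative, not from the answer), per-thread local data like the small array below is exactly the kind of thing that drives register pressure, and verbose ptxas output is how you find out what it actually cost:

```cuda
// Sketch only: per-thread local data the compiler will try to keep in
// registers, spilling to local memory if it runs out.
__global__ void sumNeighbours(const float *in, float *out, int n)
{
    float window[8];
    int base = blockIdx.x * blockDim.x + threadIdx.x;

    for (int i = 0; i < 8; ++i)
        window[i] = (base + i < n) ? in[base + i] : 0.0f;

    float sum = 0.0f;
    for (int i = 0; i < 8; ++i)
        sum += window[i];

    if (base < n)
        out[base] = sum;
}

// Compile with verbose ptxas output to see the real register count:
//   nvcc -c -Xptxas -v sum_neighbours.cu
// ptxas reports per-kernel resource usage (registers, shared memory, spills);
// that report, not the PTX listing, is the authoritative number.
```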