How do I diagnose a CUDA launch failure caused by running out of resources?
I'm getting an out-of-resources error when trying to launch a CUDA kernel (through PyCUDA), and I'm wondering whether it's possible to get the system to tell me which resource I'm short on. Obviously the system knows which resource has been exhausted; I just want to query that as well.
I've used the occupancy calculator and everything seems okay, so either there's a corner case it doesn't cover or I'm using it wrong. I know it's not registers (which seem to be the usual culprit) because I'm using <= 63 and the launch still fails with a 1x1x1 block and a 1x1 grid on a CC 2.1 device.
Thanks for any help. I posted a thread on the NVIDIA forums:
http://forums.nvidia.com/index.php?showtopic=206261&st=0
but got no responses. If the answer is "you can't ask the system for that information", that would be nice to know too (sort of... ;).
Edit:
The highest register usage I've seen is 63. I edited the above to reflect that.
Comments (2)
I think PyCUDA uses the CUDA driver API, so the following may be what is wrong: CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES can happen if you do not specify enough arguments, or you specify the wrong size for arguments, when using cuLaunch() to launch kernels. Since you are using PyCUDA, it could be pretty easy to mismatch the argument list required for a kernel and the arguments you are actually passing, so you might want to check how you are calling your kernels.
I think that this is a poorly named error code in this situation...
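One way to look for this kind of mismatch without touching the GPU is to compare the byte size the kernel signature expects with the byte size of what you would actually pass. The sketch below uses only Python's struct module. The kernel signature (float *out, int n, float scale) and the helper name arg_buffer_mismatch are made up for illustration, and it assumes arguments are packed with native alignment, which may not match exactly what PyCUDA does internally.

```python
import struct

# Hypothetical kernel signature: __global__ void scale(float *out, int n, float scale)
# struct format codes: P = pointer, i = C int, f = C float, d = C double
EXPECTED_FORMAT = "Pif"


def arg_buffer_mismatch(expected_format, actual_format):
    """Return a description of the size mismatch, or None if the sizes agree."""
    expected = struct.calcsize(expected_format)
    actual = struct.calcsize(actual_format)
    if expected == actual:
        return None
    return "kernel expects %d bytes of arguments, launch passes %d" % (expected, actual)


# Passing the scale as a double ("d") instead of a float ("f")
# changes the size of the argument buffer.
print(arg_buffer_mismatch(EXPECTED_FORMAT, "Pid"))
# Dropping the last argument entirely is also caught.
print(arg_buffer_mismatch(EXPECTED_FORMAT, "Pi"))
# A matching argument list reports no problem.
print(arg_buffer_mismatch(EXPECTED_FORMAT, "Pif"))
```

As far as I know, the usual way to avoid the problem in PyCUDA is to pass sized NumPy scalars (numpy.int32, numpy.float32) rather than plain Python numbers, so that each argument has exactly the width the kernel declares.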
See this answer: "CUDA maximum registers per thread: sm_12 vs sm_20".
It seems 70 registers is too many; devices of compute capability 2.x allow at most 63 registers per thread.
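To make this kind of check systematic, the per-kernel numbers can be compared against the device limits by hand. Below is a rough sketch in plain Python. The class and function names are hypothetical, and the limits in the example (63 registers per thread, 32768 registers and 48 KB of shared memory per block, 1024 threads per block) are what I understand compute capability 2.x to allow. With PyCUDA the real values should be available as func.num_regs and func.shared_size_bytes on the kernel and through Device.get_attribute() on the device, but the sketch takes plain numbers so it runs without a GPU. It also ignores register allocation granularity, so treat it as a first-pass filter only.

```python
from dataclasses import dataclass


@dataclass
class KernelUsage:
    regs_per_thread: int   # e.g. the value PyCUDA reports as func.num_regs
    shared_bytes: int      # static + dynamic shared memory per block


@dataclass
class DeviceLimits:
    max_regs_per_thread: int
    max_regs_per_block: int
    max_shared_bytes_per_block: int
    max_threads_per_block: int


# Assumed limits for a compute capability 2.x device.
CC_2X = DeviceLimits(63, 32768, 48 * 1024, 1024)


def exhausted_resources(kernel, block, limits):
    """List every limit that a launch with this block shape would exceed."""
    threads = block[0] * block[1] * block[2]
    problems = []
    if kernel.regs_per_thread > limits.max_regs_per_thread:
        problems.append("registers per thread")
    if kernel.regs_per_thread * threads > limits.max_regs_per_block:
        problems.append("registers per block")
    if kernel.shared_bytes > limits.max_shared_bytes_per_block:
        problems.append("shared memory per block")
    if threads > limits.max_threads_per_block:
        problems.append("threads per block")
    return problems


# 63 registers with a 1x1x1 block fits within every limit checked here.
print(exhausted_resources(KernelUsage(63, 0), (1, 1, 1), CC_2X))    # []
# 63 registers with 1024 threads needs 64512 registers per block.
print(exhausted_resources(KernelUsage(63, 0), (32, 32, 1), CC_2X))  # ['registers per block']
```

An empty list for the 1x1x1 case means none of these four limits explains the failure, so the cause has to be looked for elsewhere, for example in how the kernel arguments are passed.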