Diagnosing CUDA kernel issues

Posted on 2024-11-01 22:47:15

CUDA has plenty of documentation and guides, but I haven't been able to find any instructions on how to diagnose kernels that compile but then fail with cryptic, vague error messages such as 'unspecified launch failure', beyond the usual "Do these block/grid structures make sense?" checks.

Can I intercept the cubin file somehow and do some static analysis on the memory structures, etc.? Forgive my noobness, but I can't find a definitive idiot's guide anywhere.

Have a good weekend everyone.

What I'm looking for

  • How to separate out the cubin intermediate file
  • What to do with it afterwards to work out what's going on, specifically the register and memory configuration, to see whether my code is violating any hardware requirements or whether I'm just missing an off-by-one error somewhere.

For anyone coming across this later (I seem to have a habit of creating SO questions that keep showing up in my own queries months later...): cuda-memcheck gives much more interesting responses than the 'check error' handlers, e.g.

========= Error: process didn't terminate successfully
========= Invalid __global__ write of size 4
=========     at 0x00000040 in decomp
=========     by thread (1,0,0) in block (0,0,0)
=========     Address 0x00101024 is out of bounds
=========
========= ERROR SUMMARY: 1 error

I don't even have to explain that error message...
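For readers who want to see the kind of bug behind a report like this, here is a minimal sketch. The kernel `fill_guarded`, the helper `blocks_for`, and the sizes are all hypothetical and not taken from the original `decomp` code. It illustrates the usual cause of an out-of-bounds `__global__` write: the grid is rounded up to a whole number of blocks, so some threads fall past the end of the buffer and need a guard.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: number of blocks needed to cover n elements (rounds up).
static int blocks_for(int n, int threads)
{
    return (n + threads - 1) / threads;
}

// Hypothetical kernel: each thread writes one element of out.
__global__ void fill_guarded(int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {       // without this guard, threads with i >= n write out of bounds
        out[i] = i;
    }
}

int main()
{
    const int n = 1000;                          // not a multiple of the block size
    const int threads = 256;
    const int blocks = blocks_for(n, threads);   // 4 blocks, so 1024 threads launch

    int *d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    fill_guarded<<<blocks, threads>>>(d_out, n);
    cudaError_t err = cudaDeviceSynchronize();   // wait so kernel errors surface here
    printf("status: %s\n", cudaGetErrorString(err));

    cudaFree(d_out);
    return err == cudaSuccess ? 0 : 1;
}
```

Removing the `if (i < n)` guard and running the binary under cuda-memcheck should produce a report in the same style as the one above. In newer toolkits the equivalent tool is compute-sanitizer, and compiling with `-lineinfo` lets the report point at source lines.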


Comments (3)

因为看清所以看轻 2024-11-08 22:47:15

In CUDA, "unspecified launch failure" is the equivalent of a segfault.

Recent toolkit versions ship with a utility called cuda-memcheck. It performs valgrind-like analysis of memory transactions inside an executing kernel, and will report buffer overflows or any illegal pointer usage in a kernel. You can use that as a launching point for further analysis. If you are using a Fermi card, there is also in-kernel printf support, so it isn't hard to write your own assert function to test for and report error conditions inside a kernel.

CUDA also ships with a source-level debugger, but you need a dedicated GPU to use it. If you are on Linux and only have a single GPU, quit out of X11 and run it from a console TTY.
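A home-made in-kernel assert built on device-side printf might look roughly like the sketch below. The macro `KERNEL_CHECK`, the helper `in_range`, and the kernel `scale` are hypothetical names chosen for illustration, and the launch in `main` deliberately uses a stride that pushes some threads past the end of the buffer so that the check has something to report.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper, usable on host and device: is i a valid index into n elements?
__host__ __device__ inline bool in_range(int i, int n)
{
    return i >= 0 && i < n;
}

// Hypothetical assert-style macro: prints the failed condition and which
// block/thread hit it. Device-side printf needs compute capability 2.0 or later.
#define KERNEL_CHECK(cond)                                                   \
    do {                                                                     \
        if (!(cond)) {                                                       \
            printf("%s:%d: check failed: %s [block (%u,%u,%u), "             \
                   "thread (%u,%u,%u)]\n",                                   \
                   __FILE__, __LINE__, #cond,                                \
                   blockIdx.x, blockIdx.y, blockIdx.z,                       \
                   threadIdx.x, threadIdx.y, threadIdx.z);                   \
        }                                                                    \
    } while (0)

// Hypothetical kernel: doubles every stride-th element.
__global__ void scale(float *data, int n, int stride)
{
    int i = (blockIdx.x * blockDim.x + threadIdx.x) * stride;
    KERNEL_CHECK(in_range(i, n));   // report the bad index...
    if (in_range(i, n)) {           // ...but do not perform the bad access
        data[i] *= 2.0f;
    }
}

int main()
{
    const int n = 8;
    float *d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) return 1;
    cudaMemset(d_data, 0, n * sizeof(float));

    // 8 threads with stride 2: the upper half of the threads index past n.
    scale<<<1, 8>>>(d_data, n, 2);
    cudaDeviceSynchronize();        // also flushes the device printf buffer

    cudaFree(d_data);
    return 0;
}
```

Later toolkits also accept the standard `assert()` in device code, which stops the kernel instead of just printing, so a macro like this is mostly useful when you want the kernel to keep running and report every failing thread.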

旧夏天 2024-11-08 22:47:15

If you set the "Keep Preprocessed Files" flag (--keep), it will leave the CUBIN files and a host of other intermediate files lying around for you to look at. I'm not sure it will help that much, though.
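As an alternative to reading the CUBIN by hand, the runtime can report a kernel's register and memory footprint directly. This is only a sketch under assumptions: `example_kernel` and `launch_fits` are hypothetical names, and device 0 is assumed to be the GPU of interest. It uses `cudaFuncGetAttributes` and `cudaGetDeviceProperties` to compare what the kernel needs with what the hardware offers.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel with a small static shared-memory tile.
__global__ void example_kernel(float *data, int n)
{
    __shared__ float tile[256];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = (i < n) ? data[i] : 0.0f;
    __syncthreads();
    if (i < n) {
        data[i] = tile[threadIdx.x] * 2.0f;
    }
}

// Hypothetical helper: does a block size stay within this kernel's limit?
static bool launch_fits(const cudaFuncAttributes &attr, int threads_per_block)
{
    return threads_per_block > 0 && threads_per_block <= attr.maxThreadsPerBlock;
}

int main()
{
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, example_kernel);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaFuncGetAttributes: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaDeviceProp prop;
    err = cudaGetDeviceProperties(&prop, 0);   // assumes device 0
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("registers per thread   : %d (device has %d per block)\n",
           attr.numRegs, prop.regsPerBlock);
    printf("static shared memory   : %zu bytes (device limit per block: %zu)\n",
           attr.sharedSizeBytes, prop.sharedMemPerBlock);
    printf("local memory per thread: %zu bytes\n", attr.localSizeBytes);
    printf("constant memory        : %zu bytes\n", attr.constSizeBytes);
    printf("max threads per block  : %d for this kernel (device limit: %d)\n",
           attr.maxThreadsPerBlock, prop.maxThreadsPerBlock);
    printf("256 threads per block fits: %s\n", launch_fits(attr, 256) ? "yes" : "no");
    return 0;
}
```

The same figures are available at compile time by passing `-Xptxas -v` to nvcc, and `cuobjdump -sass` will disassemble a kept CUBIN if you do want to read the machine code.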

作死小能手 2024-11-08 22:47:15

Are you using cudaGetLastError()? It could give you more information, if it isn't already what is reporting 'unspecified launch failure'.
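One common way to use it is to wrap every runtime call in a checking macro and to check twice around a launch: once right after it, and once after synchronizing. The sketch below assumes a hypothetical macro `CUDA_CHECK` and a hypothetical kernel `add_one`; neither comes from the question.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical macro: abort with file, line and error string on any CUDA error.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,            \
                    cudaGetErrorString(err_));                            \
            exit(EXIT_FAILURE);                                           \
        }                                                                 \
    } while (0)

// Hypothetical kernel: increments each element.
__global__ void add_one(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] += 1;
    }
}

int main()
{
    const int n = 1024;
    const int threads = 256;
    int *d_data = nullptr;

    CUDA_CHECK(cudaMalloc(&d_data, n * sizeof(int)));
    CUDA_CHECK(cudaMemset(d_data, 0, n * sizeof(int)));

    add_one<<<(n + threads - 1) / threads, threads>>>(d_data, n);
    CUDA_CHECK(cudaGetLastError());        // catches invalid launch configurations
    CUDA_CHECK(cudaDeviceSynchronize());   // catches faults raised while the kernel ran

    CUDA_CHECK(cudaFree(d_data));
    return 0;
}
```

Kernel launches are asynchronous, so an error such as 'unspecified launch failure' is often reported by a later, unrelated call. Synchronizing immediately after the launch while debugging ties the error to the kernel that caused it.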
