CUDA/PyCUDA: Diagnosing launch failures that disappear under cuda-gdb

Posted on 2024-11-02 10:21:37

Does anyone know likely avenues of investigation for kernel launch failures that disappear when run under cuda-gdb? Memory allocations are within spec, the launch fails on the same run of the same kernel every time, and (so far) it hasn't failed within the debugger.

Oh Great SO Gurus, What now?

Comments (2)

热风软妹 2024-11-09 10:21:38

cuda-gdb can make some CUDA operations synchronous.

  • Are you reading from memory after it has been initialized?
  • Are you using streams?
  • Are you launching more than one kernel?
  • Where and how does it fail?
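
One way to test the synchronization theory outside the debugger is to force each launch to finish before the next call is issued, so that an error surfaces at the launch that caused it. The following is only a minimal sketch, not the asker's code: `checked_launch` and `demo_sync` are hypothetical names, the `scale` kernel is a placeholder, and `demo_sync` assumes PyCUDA and a CUDA-capable GPU. `CUDA_LAUNCH_BLOCKING` is the standard CUDA environment variable that makes kernel launches blocking.

```python
import os

# Assumption: this is set before CUDA is initialized, so the driver picks it up.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"


def checked_launch(kernel, sync, *args, **launch_kwargs):
    """Launch `kernel`, then call `sync` immediately.

    Any failure is re-raised with the launch label attached, so it is
    reported at this launch and not at some later, unrelated CUDA call.
    """
    label = launch_kwargs.pop("label", "kernel")
    try:
        kernel(*args, **launch_kwargs)
        sync()
    except Exception as exc:
        raise RuntimeError(f"launch '{label}' failed: {exc}") from exc
    return label


def demo_sync():
    """Hypothetical usage; requires PyCUDA and a GPU, so it is not called here."""
    import numpy as np
    import pycuda.autoinit  # noqa: F401  (creates the CUDA context)
    import pycuda.driver as drv
    from pycuda.compiler import SourceModule

    mod = SourceModule("""
    __global__ void scale(float *x, float a, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) x[i] *= a;
    }
    """)
    scale = mod.get_function("scale")
    x = np.ones(1024, dtype=np.float32)
    checked_launch(scale, drv.Context.synchronize,
                   drv.InOut(x), np.float32(2.0), np.int32(x.size),
                   block=(256, 1, 1), grid=(4, 1), label="scale")
    return x
```

If the failure stops appearing once every launch is wrapped like this, that points toward an ordering problem (streams, multiple kernels, or reads racing with initialization) and away from a bug inside a single kernel.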

苦妄 2024-11-09 10:21:37

cuda-gdb spills all shared memory and registers to local memory. So when something runs fine when built for debugging but fails otherwise, it usually means an out-of-bounds shared memory access. cuda-memcheck might help, depending on what sort of card you are using; Fermi is better than older cards in that respect.
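
To make the out-of-bounds case concrete, here is a rough sketch of a kernel that guards its own shared-memory indices and reports the highest offending index through an error flag, which still works in an optimized, non-debug build. Everything in it is illustrative and assumed, not taken from the question: the `smooth` stencil kernel, the helpers `tile_len` and `shared_bytes`, and `demo_stencil`, which needs PyCUDA and a GPU.

```python
def tile_len(block_size, halo=1):
    """Shared-memory slots one block needs: its own elements plus a halo on each side."""
    return block_size + 2 * halo


def shared_bytes(block_size, halo=1, itemsize=4):
    """Dynamic shared memory size in bytes for one block (float32 by default)."""
    return tile_len(block_size, halo) * itemsize


KERNEL_SRC = r"""
__global__ void smooth(const float *in, float *out, int n, int tile_len, int *err)
{
    extern __shared__ float tile[];
    int g = blockIdx.x * blockDim.x + threadIdx.x;
    int s = threadIdx.x + 1;              // slots 0 and blockDim.x + 1 hold the halo
    bool ok = (s + 1 < tile_len);         // s + 1 is the highest slot this thread touches
    if (!ok) atomicMax(err, s + 1);       // report the index, do not touch the memory
    if (ok) {
        tile[s] = (g < n) ? in[g] : 0.0f;
        if (threadIdx.x == 0)
            tile[0] = (g > 0 && g - 1 < n) ? in[g - 1] : 0.0f;
        if (threadIdx.x == blockDim.x - 1)
            tile[s + 1] = (g + 1 < n) ? in[g + 1] : 0.0f;
    }
    __syncthreads();
    if (ok && g < n)
        out[g] = (tile[s - 1] + tile[s] + tile[s + 1]) / 3.0f;
}
"""


def demo_stencil(n=1000, block_size=256):
    """Hypothetical usage; requires PyCUDA and a GPU, so it is not called here."""
    import numpy as np
    import pycuda.autoinit  # noqa: F401  (creates the CUDA context)
    import pycuda.driver as drv
    from pycuda.compiler import SourceModule

    smooth = SourceModule(KERNEL_SRC).get_function("smooth")
    x = np.random.rand(n).astype(np.float32)
    y = np.zeros_like(x)
    err = np.zeros(1, dtype=np.int32)     # stays 0 if no index was out of range
    grid = (n + block_size - 1) // block_size
    smooth(drv.In(x), drv.Out(y), np.int32(n),
           np.int32(tile_len(block_size)), drv.InOut(err),
           block=(block_size, 1, 1), grid=(grid, 1),
           shared=shared_bytes(block_size))
    return y, int(err[0])
```

If the slot count passed as `tile_len` is smaller than `block_size + 2`, the flag comes back nonzero and names the highest index that would have been out of range. Running the whole script under cuda-memcheck (for example `cuda-memcheck python script.py`, where `script.py` is a placeholder) is the less invasive alternative when the card supports it.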

EDIT:
Casting my mind back to the bad old days, I remember having an ornery GT9500 which used to throw similar NV13 errors and have random code failures when running very memory intensive kernels with a lot of shared memory activity. Never when debugging. I put it down to bad hardware and moved on to a GT200, never to see a similar error since. One possibility might be bad hardware. Is this a G92 (9800GT or similar)?
