CUDA/PyCUDA: Diagnosing launch failures that disappear under cuda-gdb
Anyone know likely avenues of investigation for kernel launch failures that disappear when run under cuda-gdb? Memory assignments are within spec, launches fail on the same run of the same kernel every time, and (so far) it hasn't failed within the debugger.
Oh Great SO Gurus, What now?
Comments (2)
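cuda-gdb can make some CUDA operations synchronous, so a failure that depends on the timing between asynchronous operations may not reproduce under the debugger.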
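One way to test this idea outside the debugger is to force a synchronization after every launch, so an asynchronous failure surfaces at the kernel that caused it rather than at some later call. A minimal sketch, assuming you pass in your own callables; `checked_launch`, `launch` and `synchronize` are hypothetical names, not part of PyCUDA:

```python
def checked_launch(name, launch, synchronize):
    """Run one kernel launch, then block until the device has finished it.

    `launch` and `synchronize` are caller-supplied callables (hypothetical
    names for this sketch). Any exception from either step is re-raised
    with the kernel name attached, so the failing launch is identifiable.
    """
    try:
        launch()        # enqueue the kernel (normally returns immediately)
        synchronize()   # wait for completion; pending errors surface here
    except Exception as exc:
        raise RuntimeError(f"kernel {name!r} failed: {exc}") from exc
```

With PyCUDA, `synchronize` could be `pycuda.driver.Context.synchronize`, and `launch` a lambda wrapping the kernel call. Setting the environment variable `CUDA_LAUNCH_BLOCKING=1` before the process starts is another way to make launches block. If the failure vanishes once every launch is synchronized, that points toward an ordering or race problem rather than a bug inside a single kernel.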
cuda-gdb spills all shared memory and registers to local memory. So when code runs fine built for debugging but fails otherwise, it usually means an out-of-bounds shared memory access. cuda-memcheck might help, depending on what sort of card you are using; Fermi is better than older cards in that respect.
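If cuda-memcheck is not an option on your card, a rough host-side check can still catch the obvious cases: replay the kernel's shared memory index expression for every thread in a block and flag anything outside the allocated array. A minimal sketch, assuming the index depends only on the thread index; `out_of_bounds_accesses` and `index_fn` are hypothetical names, and `index_fn` has to be written by hand to mirror the kernel:

```python
from itertools import product


def out_of_bounds_accesses(block_dim, shared_len, index_fn):
    """List (thread index, shared index) pairs outside [0, shared_len).

    block_dim  -- (x, y, z) threads per block, as passed to the launch
    shared_len -- number of elements in the shared array
    index_fn   -- hand-written copy of the kernel's index expression,
                  taking (tx, ty, tz) and returning the element index
    """
    bad = []
    for tz, ty, tx in product(range(block_dim[2]),
                              range(block_dim[1]),
                              range(block_dim[0])):
        idx = index_fn(tx, ty, tz)
        if not 0 <= idx < shared_len:
            bad.append(((tx, ty, tz), idx))
    return bad
```

For example, a 16x16 block reading `tile[ty * 16 + tx + 1]` from a 256-element array is flagged for thread (15, 15, 0), which reads index 256. To try cuda-memcheck on a PyCUDA script, wrap the interpreter: `cuda-memcheck python your_script.py`, where `your_script.py` stands for your own script.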
EDIT:
Casting my mind back to the bad old days, I remember having an ornery GT9500 which used to throw similar NV13 errors and have random code failures when running very memory intensive kernels with a lot of shared memory activity. Never when debugging. I put it down to bad hardware and moved on to a GT200, never to see a similar error since. One possibility might be bad hardware. Is this a G92 (9800GT or similar)?