CUDA apps time out & fail after several seconds - how to work around this?

Posted 2024-07-12 04:32:13

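I've noticed that CUDA applications tend to have a rough maximum run time of 5-15 seconds before they fail and exit. I realize it's best not to have a CUDA application run that long, but assuming CUDA is the right choice and the amount of sequential work per thread means it has to run that long, is there any way to extend this time or to get around it?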

饮惑 2024-07-19 04:32:13

I'm not a CUDA expert; I've been developing with the AMD Stream SDK, which AFAIK is roughly comparable.

You can disable the Windows watchdog timer, but that is strongly discouraged, for reasons that should be obvious.
To disable it, open regedit, go to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Watchdog\Display, create a REG_DWORD value named DisableBugCheck, and set it to 1.
You may also need to change something in the NVIDIA control panel. Look for references to "VPU Recovery" in the CUDA docs.

Ideally, you should be able to split your kernel's work into multiple passes over the data, so that each pass runs within the time limit.

Alternatively, you can divide the problem domain so that each command computes fewer output pixels. I.e., instead of computing 1,000,000 output pixels in one fell swoop, issue 10 commands to the GPU that compute 100,000 each.
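
As a rough sketch of the bookkeeping behind that 10 x 100,000 split (Python is used here purely for illustration; `plan_chunks`, `run_in_chunks` and the `launch` callback are hypothetical names, not part of CUDA or the AMD Stream SDK):

```python
from typing import Callable, Iterator, Tuple


def plan_chunks(total_items: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (offset, count) pairs that cover range(total_items) exactly once."""
    if total_items < 0 or chunk_size <= 0:
        raise ValueError("total_items must be >= 0 and chunk_size must be > 0")
    for offset in range(0, total_items, chunk_size):
        # The last chunk may be shorter than chunk_size.
        yield offset, min(chunk_size, total_items - offset)


def run_in_chunks(total_items: int, chunk_size: int,
                  launch: Callable[[int, int], None]) -> int:
    """Call launch(offset, count) once per chunk; return the number of calls."""
    launches = 0
    for offset, count in plan_chunks(total_items, chunk_size):
        # Hypothetical hook: a real program would issue one GPU command for
        # items [offset, offset + count) here and let it finish before the
        # next one is issued.
        launch(offset, count)
        launches += 1
    return launches


if __name__ == "__main__":
    seen = []
    n = run_in_chunks(1_000_000, 100_000, lambda off, cnt: seen.append((off, cnt)))
    print(n, seen[0], seen[-1])  # prints: 10 (0, 100000) (900000, 100000)
```

The kernel itself then only needs the offset and count as extra arguments, so that each call works on its own slice of the output.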

The basic unit that has to fit within the time slice is not your entire application, but the execution of a single command buffer. In the AMD Stream SDK, a long sequence of operations can be broken up into multiple time slices by explicitly flushing the command queue with a CtxFlush() call. Perhaps CUDA has something similar?

You should not have to copy all of your data back and forth across the PCIe bus on every time slice; you can leave your textures, etc. in GPU local memory. You just need some command buffers to complete occasionally, to prove to the OS that you're not stuck in an infinite loop.
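
One way to pick a size for each command so that it fits in the time slice is to time a small trial run and scale it up. The sketch below assumes run time grows roughly linearly with the number of items, which may not hold for your kernel. `max_items_per_launch` and its parameters are hypothetical names, and the 2-second timeout in the example is only an illustrative value.

```python
import math


def max_items_per_launch(trial_items: int, trial_seconds: float,
                         timeout_seconds: float,
                         safety_factor: float = 0.5) -> int:
    """Estimate how many items one command can process within the time limit.

    Assumption: time scales linearly with item count, so a trial run of
    trial_items taking trial_seconds can be extrapolated.
    """
    if trial_items <= 0 or trial_seconds <= 0 or timeout_seconds <= 0:
        raise ValueError("trial_items, trial_seconds and timeout_seconds must be > 0")
    if not 0 < safety_factor <= 1:
        raise ValueError("safety_factor must be in (0, 1]")
    # Only use part of the timeout, to leave headroom for timing noise.
    budget = timeout_seconds * safety_factor
    items = math.floor(budget * trial_items / trial_seconds)
    # Always process at least one item per command.
    return max(1, items)


if __name__ == "__main__":
    # Trial: 1000 items took 0.5 s; illustrative timeout of 2 s, half of it used.
    print(max_items_per_launch(1000, 0.5, 2.0))  # prints: 2000
```

If even a single item cannot finish inside the budget, the function still returns 1, which is a sign that the per-thread work itself has to be split into passes.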

Finally, GPUs are fast, so if your application is not able to do useful work in that 5 or 10 seconds, I'd take that as a sign that something is wrong.

[EDIT Mar 2010 to update:] (outdated again, see the updates below for the most recent information) The registry key above is out-of-date. I think that was the key for Windows XP 64-bit. There are new registry keys for Vista and Windows 7. You can find them here: http://www.microsoft.com/whdc/device/display/wddm_timeout.mspx
or here: http://msdn.microsoft.com/en-us/library/ee817001.aspx

[EDIT Apr 2015 to update:] This is getting really out of date. The easiest way to disable TDR for CUDA programming, assuming you have the NVIDIA Nsight tools installed, is to open the Nsight Monitor, click on "Nsight Monitor options", and under "General" set "WDDM TDR enabled" to false. This will change the registry setting for you. Close and reboot. Any change to the TDR registry setting won't take effect until you reboot.

[EDIT August 2018 to update:]
Although the NVIDIA tools allow disabling the TDR now, the same question is relevant for AMD/OpenCL developers. For those: The current link that documents the TDR settings is at https://learn.microsoft.com/en-us/windows-hardware/drivers/display/tdr-registry-keys

三月梨花 2024-07-19 04:32:13

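On Windows, the graphics driver has a watchdog timer that kills any shader program that runs for more than 5 seconds. Note that the Xorg/XFree86 drivers don't do this, so one possible workaround is to run the CUDA apps on Linux.

AFAIK it is not possible to disable the watchdog timer on Windows. The only way to get around this on Windows is to use a second card that has no displayed screens on it. It doesn't have to be a Tesla, but it must have no active screens.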

沫尐诺 2024-07-19 04:32:13

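Resolve Timeout Detection and Recovery (TDR) - WINDOWS 7 (32/64 bit)

Create a registry value in Windows that raises the TDR delay, so that Windows allows a longer delay before the TDR process starts.

Open Regedit from Run or from a command prompt.

In Windows 7, navigate to the registry key where the new value has to be created:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers
There will probably already be one DWORD value in there called DxgKrnlVersion.

Right-click, choose to create a new REG_DWORD value, and name it TdrDelay. The value assigned to it is the number of seconds before TDR kicks in. Windows currently defaults to 2 seconds, even though the registry value doesn't exist until you create it. Assign it a new value (I tried 4 seconds), which doubles the time before TDR kicks in.

Then restart the PC. The new value does not take effect until you restart.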