CUDA apps time out & fail after several seconds - how to work around this?

Posted on 2024-07-12 04:32:13

I've noticed that CUDA applications tend to have a rough maximum run time of 5-15 seconds before they fail and exit. I realize it's best not to have a CUDA application run that long, but assuming that CUDA is the correct choice and that, due to the amount of sequential work per thread, it must run that long, is there any way to extend this time limit or to get around it?

Comments (8)

饮惑 2024-07-19 04:32:13

I'm not a CUDA expert; I've been developing with the AMD Stream SDK, which AFAIK is roughly comparable.

You can disable the Windows watchdog timer, but that is strongly discouraged, for reasons that should be obvious.
To disable it, use regedit to create a REG_DWORD value HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Watchdog\Display\DisableBugCheck and set it to 1.
You may also need to change a setting in the NVIDIA control panel. Look for a reference to "VPU Recovery" in the CUDA docs.

Ideally, you should be able to split your kernel operations into multiple passes over your data, so that each pass runs within the time limit.
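In CUDA terms, one way to do such a multi-pass split is to cap how many sequential steps each thread performs per launch and keep the per-thread state in device memory between launches. This is a minimal sketch under assumptions: the `State` struct and its update rule are placeholders, `advance`, `passesNeeded` and `runInPasses` are hypothetical names, and `maxSteps` would have to be tuned so that a single launch finishes well inside the timeout.

```cuda
#include <cuda_runtime.h>
#include <cassert>

// Hypothetical per-thread state for a long sequential computation.
struct State {
    unsigned long long step;  // steps completed so far
    double value;             // running result
};

// Advance each element by at most maxSteps steps, then return.
// The state stays in device memory between launches; nothing is copied back.
__global__ void advance(State* s, int n, unsigned long long totalSteps,
                        unsigned long long maxSteps) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    State st = s[i];
    unsigned long long end = st.step + maxSteps;
    if (end > totalSteps) end = totalSteps;
    for (; st.step < end; ++st.step) {
        st.value = st.value * 0.5 + 1.0;  // placeholder for the real sequential work
    }
    s[i] = st;
}

// Number of launches needed to finish totalSteps at maxSteps per launch.
__host__ __device__ unsigned long long passesNeeded(unsigned long long totalSteps,
                                                     unsigned long long maxSteps) {
    assert(maxSteps > 0);
    return (totalSteps + maxSteps - 1) / maxSteps;
}

// Run the whole computation as a series of short kernel launches.
cudaError_t runInPasses(State* d_state, int n, unsigned long long totalSteps,
                        unsigned long long maxSteps) {
    if (n <= 0 || totalSteps == 0) return cudaSuccess;
    if (maxSteps == 0) return cudaErrorInvalidValue;
    const int block = 256;
    const int grid = (n + block - 1) / block;
    const unsigned long long passes = passesNeeded(totalSteps, maxSteps);
    for (unsigned long long p = 0; p < passes; ++p) {
        advance<<<grid, block>>>(d_state, n, totalSteps, maxSteps);
        cudaError_t err = cudaGetLastError();                   // launch/configuration errors
        if (err == cudaSuccess) err = cudaDeviceSynchronize();  // errors during execution
        if (err != cudaSuccess) return err;
    }
    return cudaSuccess;
}
```

To use it, allocate `n` `State` entries with `cudaMalloc`, zero them with `cudaMemset` (all-zero bytes mean step 0 and value 0.0), call `runInPasses`, and copy the results back to the host once at the end.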

Alternatively, you can divide the problem domain so that each command computes fewer output pixels. I.e., instead of computing 1,000,000 output pixels in one fell swoop, issue 10 commands to the GPU that compute 100,000 each.
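A rough CUDA version of this domain split, assuming the output is a flat `float` array and using a placeholder computation (`Chunk`, `planChunks`, `shadePixels` and `shadeInChunks` are hypothetical names). Each chunk is its own kernel launch, and the host waits for it before issuing the next, so no single launch has to cover every pixel.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <cstddef>
#include <vector>

struct Chunk { size_t offset; size_t count; };

// Split [0, total) into consecutive chunks of at most chunkSize elements.
std::vector<Chunk> planChunks(size_t total, size_t chunkSize) {
    assert(chunkSize > 0);
    std::vector<Chunk> chunks;
    for (size_t off = 0; off < total; off += chunkSize) {
        size_t count = (total - off < chunkSize) ? total - off : chunkSize;
        chunks.push_back({off, count});
    }
    return chunks;
}

// Placeholder kernel: computes output pixels [offset, offset + count).
__global__ void shadePixels(float* out, size_t offset, size_t count) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i >= count) return;
    size_t pixel = offset + i;
    out[pixel] = static_cast<float>(pixel % 256) / 255.0f;  // stand-in for expensive work
}

// Issue one launch per chunk instead of a single launch over every pixel.
cudaError_t shadeInChunks(float* d_out, size_t totalPixels, size_t chunkSize) {
    const unsigned block = 256;
    for (const Chunk& c : planChunks(totalPixels, chunkSize)) {
        unsigned grid = static_cast<unsigned>((c.count + block - 1) / block);
        shadePixels<<<grid, block>>>(d_out, c.offset, c.count);
        cudaError_t err = cudaGetLastError();                   // launch/configuration errors
        if (err == cudaSuccess) err = cudaDeviceSynchronize();  // wait for this chunk to finish
        if (err != cudaSuccess) return err;
    }
    return cudaSuccess;
}
```

With 1,000,000 pixels and a chunk size of 100,000, `planChunks` yields the 10 chunks described above. Synchronizing after every launch costs a little throughput; once each chunk is known to be short, it could be relaxed to every few launches.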

The basic unit that has to fit within the time slice is not your entire application, but the execution of a single command buffer. In the AMD Stream SDK, a long sequence of operations can be broken up into multiple time slices by explicitly flushing the command queue with a CtxFlush() call. Perhaps CUDA has something similar?

You should not have to read all of your data back and forth across the PCIe bus on every time slice; you can leave your textures, etc. in GPU local memory. You just need some command buffers to complete occasionally, to prove to the OS that you're not stuck in an infinite loop.

Finally, GPUs are fast, so if your application cannot do useful work within 5 or 10 seconds, I'd take that as a sign that something is wrong.

[EDIT Mar 2010 to update:] (outdated again, see the updates below for the most recent information) The registry key above is out-of-date. I think that was the key for Windows XP 64-bit. There are new registry keys for Vista and Windows 7. You can find them here: http://www.microsoft.com/whdc/device/display/wddm_timeout.mspx
or here: http://msdn.microsoft.com/en-us/library/ee817001.aspx

[EDIT Apr 2015 to update:] This is getting really out of date. The easiest way to disable TDR for CUDA programming, assuming you have the NVIDIA Nsight tools installed, is to open the Nsight Monitor, click on "Nsight Monitor options", and under "General" set "WDDM TDR enabled" to false. This will change the registry setting for you. Close and reboot: changes to the TDR registry settings don't take effect until you reboot.

[EDIT August 2018 to update:]
Although the NVIDIA tools now allow disabling TDR, the same question is relevant for AMD/OpenCL developers. For them, the current page documenting the TDR settings is https://learn.microsoft.com/en-us/windows-hardware/drivers/display/tdr-registry-keys

三月梨花 2024-07-19 04:32:13

On Windows, the graphics driver has a watchdog timer that kills any shader programs that run for more than 5 seconds. Note that the Xorg/XFree86 drivers don't do this, so one possible workaround is to run the CUDA apps on Linux.
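When the watchdog fires, the CUDA runtime typically reports it as `cudaErrorLaunchTimeout` from the next call that waits on the GPU, such as `cudaDeviceSynchronize`. A small sketch of telling that case apart from other failures (`describeKernelStatus` and `waitAndDescribe` are hypothetical helpers). The runtime documentation describes this error as leaving the process in an inconsistent state, so the usual recovery is to restart the process rather than retry the launch.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <string>

// Turn a CUDA status into a readable message, singling out the watchdog case.
std::string describeKernelStatus(cudaError_t err) {
    if (err == cudaSuccess) return "ok";
    if (err == cudaErrorLaunchTimeout)
        return "kernel was stopped by the watchdog (cudaErrorLaunchTimeout); restart the process";
    return std::string("CUDA error: ") + cudaGetErrorString(err);
}

// Wait for all queued work on the current device and describe how it ended.
std::string waitAndDescribe() {
    return describeKernelStatus(cudaDeviceSynchronize());
}
```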

AFAIK it is not possible to disable the watchdog timer on Windows. The only way to get around this on Windows is to use a second card that has no displayed screens on it. It doesn't have to be a Tesla but it must have no active screens.
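Rather than guessing which card is free of the watchdog, the CUDA runtime reports it per device through the `cudaDevAttrKernelExecTimeout` attribute (also exposed as `cudaDeviceProp::kernelExecTimeoutEnabled`). A minimal sketch that makes the first unrestricted device current; the function names are hypothetical, and whether a display-less consumer card reports 0 depends on the OS and driver model, so the query result is what matters.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <cstddef>
#include <vector>

// Index of the first device whose run-time-limit flag is 0, or -1 if there is none.
int firstDeviceWithoutWatchdog(const std::vector<int>& timeoutEnabled) {
    for (std::size_t i = 0; i < timeoutEnabled.size(); ++i)
        if (timeoutEnabled[i] == 0) return static_cast<int>(i);
    return -1;
}

// Ask the runtime which devices have a kernel run-time limit and make the first
// unrestricted one current. Returns its index, or -1 if none qualifies.
int selectDeviceWithoutWatchdog() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) return -1;
    std::vector<int> flags(count, 1);  // treat devices we cannot query as limited
    for (int d = 0; d < count; ++d) {
        int limited = 1;
        if (cudaDeviceGetAttribute(&limited, cudaDevAttrKernelExecTimeout, d) == cudaSuccess)
            flags[d] = limited;
    }
    int dev = firstDeviceWithoutWatchdog(flags);
    if (dev >= 0 && cudaSetDevice(dev) != cudaSuccess) return -1;
    return dev;
}
```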

沫尐诺 2024-07-19 04:32:13

Resolve Timeout Detection and Recovery - WINDOWS 7 (32/64 bit)

Create a registry value in Windows to raise the TDR timeout, so that Windows allows a longer delay before the TDR process starts.

Open Regedit from Run or DOS.

In Windows 7, navigate to the correct registry key to create the new value:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers

There will probably be one value in there called DxgKrnlVersion, as a DWORD.

Right-click, create a new REG_DWORD value, and name it TdrDelay. The value assigned to it is the number of seconds before TDR kicks in. It is currently 2 by default in Windows (even though the registry value doesn't exist until you create it). Assign it a new value (I tried 4 seconds), which doubles the time before TDR. Then restart the PC; the new value only takes effect after a restart.

Source: Win7 TDR (Driver Timeout Detection & Recovery)
I have also verified this, and it works fine.

唔猫 2024-07-19 04:32:13

The most basic solution is to pick a point partway through the calculation that I am sure the GPU I am working with can reach in time, save all the state information and stop, then start again from there.
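A sketch of the save-and-restart idea, assuming the state fits in a single `float` array on the device and that a plain binary file is an acceptable checkpoint format (the `Checkpoint` struct, the file layout and all function names are hypothetical). After a kernel stops at a known step, the state is copied to the host and written out; on restart it is read back and uploaded before the next launch.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical checkpoint: the step reached plus the per-element state.
struct Checkpoint {
    uint64_t step = 0;
    std::vector<float> state;
};

// File layout: step (8 bytes), element count (8 bytes), then the floats.
bool saveCheckpoint(const std::string& path, const Checkpoint& cp) {
    std::ofstream f(path, std::ios::binary);
    if (!f) return false;
    uint64_t n = cp.state.size();
    f.write(reinterpret_cast<const char*>(&cp.step), sizeof cp.step);
    f.write(reinterpret_cast<const char*>(&n), sizeof n);
    f.write(reinterpret_cast<const char*>(cp.state.data()), n * sizeof(float));
    return static_cast<bool>(f);
}

bool loadCheckpoint(const std::string& path, Checkpoint& cp) {
    std::ifstream f(path, std::ios::binary);
    if (!f) return false;
    uint64_t n = 0;
    f.read(reinterpret_cast<char*>(&cp.step), sizeof cp.step);
    f.read(reinterpret_cast<char*>(&n), sizeof n);
    if (!f) return false;
    cp.state.resize(n);
    f.read(reinterpret_cast<char*>(cp.state.data()), n * sizeof(float));
    return static_cast<bool>(f);
}

// Copy the device state into a checkpoint after a kernel has stopped at `step`.
cudaError_t captureCheckpoint(const float* d_state, size_t n, uint64_t step, Checkpoint& cp) {
    cp.step = step;
    cp.state.resize(n);
    return cudaMemcpy(cp.state.data(), d_state, n * sizeof(float), cudaMemcpyDeviceToHost);
}

// Upload a loaded checkpoint to the device before resuming.
cudaError_t restoreCheckpoint(float* d_state, const Checkpoint& cp) {
    return cudaMemcpy(d_state, cp.state.data(), cp.state.size() * sizeof(float),
                      cudaMemcpyHostToDevice);
}
```

The raw binary layout keeps the sketch short but is tied to the machine's byte order; a portable format would be needed if checkpoints move between machines.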

Update:
For Linux: Exiting X will allow you to run CUDA applications for as long as you want. No Tesla is required (a 9600 was used in testing this).

One thing to note, however, is that if X is never started, the drivers probably won't be loaded, and it won't work.

It also seems that on Linux, simply not having any X displays up at the time will work too, so X does not need to be exited as long as you switch to a non-X full-screen terminal.

久伴你 2024-07-19 04:32:13

This isn't possible. The time-out is there to prevent bugs in calculations from taking up the GPU for long periods of time.

If you use a dedicated card for CUDA work, the time limit is lifted. I'm not sure if this requires a Tesla card, or if a GeForce with no monitor connected can be used.

花桑 2024-07-19 04:32:13

The solution I use is:

1. Pass all information to the device.
2. Run iterative versions of the algorithms, where each iteration invokes the kernel on the memory already stored on the device.
3. Finally, transfer memory back to the host only after all iterations have ended.

This enables control over iterations from CPU (including option to abort), without the costly device<-->host memory transfers between iterations.
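A minimal sketch of this pattern, using a placeholder 1-D smoothing kernel in place of the real algorithm (`relaxStep`, `keepIterating` and `iterateOnDevice` are hypothetical names). The data is copied to the device once, two device buffers are swapped between iterations, and the host decides after each short launch whether to continue, here based on an iteration cap and a wall-clock budget, before copying the result back once. The budget is only checked between launches, so it cannot interrupt a launch that is already running.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <chrono>
#include <utility>
#include <vector>

// Placeholder iteration: one Jacobi-style smoothing step with fixed end points.
__global__ void relaxStep(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (i == 0 || i == n - 1) { out[i] = in[i]; return; }
    out[i] = 0.5f * (in[i - 1] + in[i + 1]);
}

// Host-side policy: continue while under the iteration cap and the time budget.
bool keepIterating(int done, int maxIters, std::chrono::steady_clock::duration elapsed,
                   std::chrono::milliseconds budget) {
    return done < maxIters && elapsed < budget;
}

cudaError_t iterateOnDevice(std::vector<float>& data, int maxIters,
                            std::chrono::milliseconds budget, int* itersDone) {
    *itersDone = 0;
    const int n = static_cast<int>(data.size());
    if (n == 0) return cudaSuccess;
    const size_t bytes = data.size() * sizeof(float);
    float* d_a = nullptr;
    float* d_b = nullptr;
    cudaError_t err = cudaMalloc(&d_a, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&d_b, bytes);
    if (err == cudaSuccess)
        err = cudaMemcpy(d_a, data.data(), bytes, cudaMemcpyHostToDevice);  // single upload
    const int block = 256;
    const int grid = (n + block - 1) / block;
    const auto start = std::chrono::steady_clock::now();
    while (err == cudaSuccess &&
           keepIterating(*itersDone, maxIters, std::chrono::steady_clock::now() - start, budget)) {
        relaxStep<<<grid, block>>>(d_a, d_b, n);
        err = cudaGetLastError();
        if (err == cudaSuccess) err = cudaDeviceSynchronize();  // each launch stays short
        if (err != cudaSuccess) break;
        std::swap(d_a, d_b);  // this iteration's output is the next one's input
        ++*itersDone;
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(data.data(), d_a, bytes, cudaMemcpyDeviceToHost);  // single download
    cudaFree(d_a);
    cudaFree(d_b);
    return err;
}
```

Keeping the stop decision in `keepIterating` makes it easy to swap in another abort condition, such as a user-cancel flag, without touching the device code.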

心房的律动 2024-07-19 04:32:13

The watchdog timer only applies on GPUs with a display attached.

On Windows, the timer is part of WDDM. Its settings (timeout, behaviour on reaching the timeout, etc.) can be modified with some registry keys; see this Microsoft article for more information.

深海少女心 2024-07-19 04:32:13

It is possible to disable this behavior in Linux. Although the "watchdog" has an obvious purpose, it may cause some very unexpected results when doing extensive computations using shaders/CUDA.

The option can be toggled in your X configuration (likely /etc/X11/xorg.conf).

Adding Option "Interactive" "0" to the Device section of your GPU does the job.

See "CUDA Visual Profiler 'Interactive' X config option?" for details on the configuration, and see ftp://download.nvidia.com/XFree86/Linux-x86/270.41.06/README/xconfigoptions.html#Interactive for a description of the parameter.
