CUDA blocking flag
When creating a CUDA event, you can optionally turn on the cudaEventBlockingSync flag. But what is the difference between creating an event with and without the flag? I read the fine manual; it just doesn't make sense to me. What is the "calling host thread", and what "blocks" when you don't use the flag?
4.6.2.7 cudaError_t cudaEventSynchronize(cudaEvent_t event)
Blocks until the event has actually been recorded. ... Waiting for an event that was created with the cudaEventBlockingSync flag will cause the calling host thread to block until the event has actually been recorded.
Comments (2)
cudaEventBlockingSync defines how the host waits for the event to happen.
When cudaEventBlockingSync is SET, the waiting host thread gives up the CPU: the operating system can schedule a different thread (possibly from another process) in its place, and the host thread gets the CPU back later, once the event has been recorded. With this approach the host thread does not monopolize CPU time, so the host can do other work.
When cudaEventBlockingSync is NOT SET, the host thread busy-waits, i.e. it enters a check-event loop and just spins, looking for the event to occur. This usually causes the CPU performance meter to peg at 100% (for the core running the thread), because the host thread keeps that core busy for the whole wait.
Not setting cudaEventBlockingSync gives the minimum latency from the end of kernel execution to control returning to the thread. Which setting you want depends on what the kernel is doing, i.e. how long it will take for the event to happen versus how much scheduling overhead is involved in blocking and waking the thread. Not setting the flag comes at the cost of that CPU time not being available for other work (other threads) while waiting for the event to occur.
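To see the trade-off described above on your own machine, here is a minimal sketch that waits on the same kind of event twice, once created with the default flags and once with cudaEventBlockingSync, and prints the wall-clock time and the process CPU time spent inside cudaEventSynchronize. The names busyKernel and timedWait and the cycle count are my own assumptions for illustration. std::clock() is used as a rough stand-in for CPU time: it reports CPU time for the whole process on POSIX systems but not on every platform, so treat the numbers as indicative only.

```cuda
#include <chrono>
#include <cstdio>
#include <ctime>
#include <cuda_runtime.h>

// Hypothetical helper kernel: keeps the GPU busy for roughly `cycles` clock ticks.
__global__ void busyKernel(long long cycles) {
    long long start = clock64();
    while (clock64() - start < cycles) { }
}

// Hypothetical helper: launches the kernel, records an event created with
// `flags` behind it, and measures the wait inside cudaEventSynchronize().
cudaError_t timedWait(unsigned int flags, double* wallMs, double* cpuMs) {
    cudaEvent_t done;
    cudaError_t err = cudaEventCreateWithFlags(&done, flags);
    if (err != cudaSuccess) return err;

    busyKernel<<<1, 1>>>(500000000LL);  // asynchronous: returns right away
    cudaEventRecord(done, 0);           // queued behind the kernel on stream 0

    auto wallStart = std::chrono::steady_clock::now();
    std::clock_t cpuStart = std::clock();
    err = cudaEventSynchronize(done);   // the calling host thread waits here
    std::clock_t cpuEnd = std::clock();
    auto wallEnd = std::chrono::steady_clock::now();

    *wallMs = std::chrono::duration<double, std::milli>(wallEnd - wallStart).count();
    *cpuMs = 1000.0 * (cpuEnd - cpuStart) / CLOCKS_PER_SEC;
    cudaEventDestroy(done);
    return err;
}

int main() {
    if (cudaFree(0) != cudaSuccess) {   // force context creation before timing
        std::printf("no usable CUDA device\n");
        return 1;
    }
    double wallMs = 0.0, cpuMs = 0.0;
    if (timedWait(cudaEventDefault, &wallMs, &cpuMs) == cudaSuccess)
        std::printf("default flags:         wall %.1f ms, cpu %.1f ms\n", wallMs, cpuMs);
    if (timedWait(cudaEventBlockingSync, &wallMs, &cpuMs) == cudaSuccess)
        std::printf("cudaEventBlockingSync: wall %.1f ms, cpu %.1f ms\n", wallMs, cpuMs);
    return 0;
}
```

Build it with something like nvcc -o waitcmp waitcmp.cu (the file name is arbitrary). If the description above holds on your system, the CPU figure should be close to the wall figure in the default case and much smaller with the flag, though the exact numbers may also depend on the device scheduling flags set through cudaSetDeviceFlags and on the platform.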
When you call that function, the thread stops executing until the event happens, at which point the program continues. It is a way of making sure you know the state of the running program. This is especially important in CUDA because so many things are asynchronous.
The "calling host thread" is the CPU thread that makes the call, running on the host computer in which the CUDA device resides.
Edit in response to the comment below:
I believe the difference between a "blocking sync" and a regular sync is that with a blocking sync the thread blocks and does not run until the event is completed, whereas with a regular sync the thread "spins" as it waits, constantly checking the value. This means the thread does not use any extra CPU time spinning, but is instead awakened once the event is completed. This is useful if, say, you're running the program on a server where CPU time is at a premium or you have to pay per unit time.
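To make the "asynchronous" point concrete, here is a small sketch of what the calling host thread sees around the wait. The kernel launch and cudaEventRecord return immediately, so a poll with cudaEventQuery right afterwards will most likely report cudaErrorNotReady; once cudaEventSynchronize has returned successfully, the same poll reports cudaSuccess. The names busyKernel and queryAroundWait and the cycle count are assumptions of mine, not anything from the manual.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper kernel: keeps the GPU busy for roughly `cycles` clock ticks.
__global__ void busyKernel(long long cycles) {
    long long start = clock64();
    while (clock64() - start < cycles) { }
}

// Hypothetical helper: polls an event before and after waiting on it.
// Returns the status of the wait itself.
cudaError_t queryAroundWait(cudaError_t* before, cudaError_t* after) {
    cudaEvent_t done;
    cudaError_t err = cudaEventCreateWithFlags(&done, cudaEventBlockingSync);
    if (err != cudaSuccess) return err;

    busyKernel<<<1, 1>>>(200000000LL);  // returns before the kernel finishes
    cudaEventRecord(done, 0);           // also returns immediately

    *before = cudaEventQuery(done);     // non-blocking poll, usually "not ready"
    err = cudaEventSynchronize(done);   // the calling host thread waits here
    *after = cudaEventQuery(done);      // the event has been recorded by now

    cudaEventDestroy(done);
    return err;
}

int main() {
    cudaError_t before = cudaSuccess, after = cudaSuccess;
    cudaError_t err = queryAroundWait(&before, &after);
    if (err != cudaSuccess) {
        std::printf("wait failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("query before wait: %s\n", cudaGetErrorString(before));
    std::printf("query after wait:  %s\n", cudaGetErrorString(after));
    return 0;
}
```

The flag only changes how the thread waits inside cudaEventSynchronize (sleeping versus spinning); in both cases the call does not return until the event has been recorded.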