Device synchronization during an asynchronous memcpy in CUDA
Suppose I want to perform an asynchronous host-to-device memcpy in CUDA and then immediately launch a kernel. How can I test, from within the kernel, whether the asynchronous transfer has completed?
Sequencing your asynchronous copy and kernel launch in the same CUDA "stream" ensures that the kernel executes only after the asynchronous transfer has completed, because operations issued to a single stream run in the order they were issued.
I don't believe there's any supported way to test from within a kernel whether some asynchronous condition (such as the completion of an asynchronous transfer) has been met. CUDA thread blocks are assumed to execute completely independently from other threads of execution.
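The stream ordering described in this answer can be illustrated with a minimal sketch. This is not the code from the original answer. The kernel `add_one`, the helper `count_mismatches`, the `CHECK` macro, and the buffer size are all hypothetical names and values chosen for illustration. The sketch assumes a CUDA-capable device and uses pinned host memory, since that is what lets `cudaMemcpyAsync` overlap with host execution.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                              \
    do {                                                         \
        cudaError_t err_ = (call);                               \
        if (err_ != cudaSuccess) {                               \
            fprintf(stderr, "%s failed: %s\n", #call,            \
                    cudaGetErrorString(err_));                   \
            exit(EXIT_FAILURE);                                  \
        }                                                        \
    } while (0)

// Hypothetical kernel: adds 1 to each element delivered by the copy.
__global__ void add_one(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Issues copy-in, kernel, and copy-out to one stream (n must be > 0).
// Returns how many elements do not hold the expected value i + 1.
int count_mismatches(int n)
{
    size_t bytes = (size_t)n * sizeof(int);
    int *h_data = nullptr;
    int *d_data = nullptr;
    cudaStream_t stream;

    CHECK(cudaMallocHost((void **)&h_data, bytes));  // pinned host memory
    CHECK(cudaMalloc((void **)&d_data, bytes));
    CHECK(cudaStreamCreate(&stream));

    for (int i = 0; i < n; ++i) h_data[i] = i;

    // All three operations go to the same stream, so they run in issue
    // order: the kernel does not start until the copy-in has finished.
    CHECK(cudaMemcpyAsync(d_data, h_data, bytes,
                          cudaMemcpyHostToDevice, stream));
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    add_one<<<blocks, threads, 0, stream>>>(d_data, n);
    CHECK(cudaGetLastError());
    CHECK(cudaMemcpyAsync(h_data, d_data, bytes,
                          cudaMemcpyDeviceToHost, stream));

    // The host waits here; the kernel itself never polls for completion.
    CHECK(cudaStreamSynchronize(stream));

    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (h_data[i] != i + 1) ++mismatches;

    CHECK(cudaStreamDestroy(stream));
    CHECK(cudaFree(d_data));
    CHECK(cudaFreeHost(h_data));
    return mismatches;
}

int main()
{
    printf("mismatches: %d\n", count_mismatches(1 << 20));
    return 0;
}
```

If the kernel had started before the transfer finished, some elements would come back with values other than `i + 1`, so the mismatch count acts as a host-side check of the ordering. The synchronization lives entirely on the host and in the stream, which is consistent with the point above that a kernel has no supported way to poll for the transfer itself.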