How can I run more than one kernel on a GPU with CUDA?
kernel1 <<< blocks1, threads1, 0, stream1 >>> ( args ... );
...
kernel2 <<< blocks2, threads2, 0, stream2 >>> ( args ... );
...
I have two kernels that I want to run concurrently.
The device is a GTX 460, so it's Fermi architecture,
and the CUDA toolkit and SDK are 3.2 RC.
As in the code above, the two kernels are set up to run concurrently,
but there is no response from either kernel.
Are there any constraints on what the kernels can do?
The two kernels share some data
and have some code in common.
If I comment out most of one kernel function, the program halts.
Any help would be appreciated.
The fact that the kernels are launched on different streams does not imply they will run concurrently.
They will overlap only if the resources left free by the first kernel (registers, shared memory, resident blocks) are enough to run the second kernel; otherwise they will run serially.
Make sure to have a cudaThreadSynchronize() after the two kernel invocations, or synchronize on both streams with cudaStreamSynchronize(). Remember that all kernel launches are asynchronous with respect to the host.
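To make the launch-then-synchronize pattern concrete, here is a minimal sketch (not the poster's actual code; the kernel bodies, grid sizes, and buffer names are illustrative assumptions). It launches two small kernels on separate non-default streams and then blocks the host on both streams before reading results:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Two deliberately lightweight kernels: on Fermi, concurrent execution is
// only possible if the first kernel does not occupy the whole device.
__global__ void kernel1(float *a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] += 1.0f;
}

__global__ void kernel2(float *b, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) b[i] *= 2.0f;
}

int main() {
    const int n = 1 << 20;
    float *a, *b;
    cudaMalloc(&a, n * sizeof(float));
    cudaMalloc(&b, n * sizeof(float));

    cudaStream_t stream1, stream2;
    cudaStreamCreate(&stream1);
    cudaStreamCreate(&stream2);

    // Both launches return immediately on the host; whether they actually
    // overlap on the device depends on resource availability.
    kernel1<<<(n + 255) / 256, 256, 0, stream1>>>(a, n);
    kernel2<<<(n + 255) / 256, 256, 0, stream2>>>(b, n);

    // Block the host until both streams have drained.
    // (A single cudaThreadSynchronize() in toolkit 3.2 would also work.)
    cudaStreamSynchronize(stream1);
    cudaStreamSynchronize(stream2);

    // Surface any launch or execution error instead of failing silently.
    printf("last error: %s\n", cudaGetErrorString(cudaGetLastError()));

    cudaStreamDestroy(stream1);
    cudaStreamDestroy(stream2);
    cudaFree(a);
    cudaFree(b);
    return 0;
}
```

Checking cudaGetLastError() after the launches is worth doing here: a kernel that "has no response" often failed to launch at all (e.g. too many resources requested), and without an explicit error check that failure is invisible.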