Yielding to other threads/tasks in OpenMP
I want to use OpenMP with CUDA to overlap kernel executions. The kernel calls are all asynchronous, but I have very little code between launches, so the individual OpenMP threads tend to block when they try to launch another kernel or do a memory copy (a memory copy doesn't always come right after the call, so asynchronous memory copies aren't necessarily the solution). I would like a way to signal to the OpenMP scheduler to switch to another OpenMP thread. Is this possible in OpenMP?
Example:
int main() {
    #pragma omp parallel for
    for (int i = 0; i < 10; i++) {
        for (int j = 0; j < 10; j++) {
            // call kernel here
            // ----> Would like to signal to continue with other
            //       threads as next call will block
            // copy data from kernel
        }
    }
}
Comments (1)
If a thread blocks, the operating system's scheduler should automatically switch to another runnable thread (if one is available), so you shouldn't need to do anything.
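For completeness, OpenMP does have an explicit hint for this: since version 3.1 there is #pragma omp taskyield, which marks a point where the current task may be suspended so the thread can pick up another task. It applies to tasks rather than threads, and the runtime is free to ignore it. A minimal sketch, assuming the work is recast as one task per outer iteration; fake_launch_and_copy and run_tasks are hypothetical names standing in for the real kernel launch and copy:

```c
/* Hypothetical stand-in for "launch a kernel, then copy the result back". */
static int fake_launch_and_copy(int i, int j) {
    return i * 10 + j;
}

/* Creates one task per outer iteration and returns how many tasks finished.
 * out must have room for rows * cols ints. */
int run_tasks(int rows, int cols, int *out) {
    int completed = 0;

    #pragma omp parallel
    #pragma omp single
    {
        for (int i = 0; i < rows; i++) {
            #pragma omp task firstprivate(i) shared(out, completed)
            {
                for (int j = 0; j < cols; j++) {
                    out[i * cols + j] = fake_launch_and_copy(i, j);
                    /* Hint only: the runtime may switch to another task here. */
                    #pragma omp taskyield
                }
                #pragma omp atomic
                completed++;
            }
        }
    } /* implicit barrier: all tasks have finished past this point */

    return completed;
}
```

Built with -fopenmp (GCC or Clang), run_tasks(10, 10, out) fills out[0..99] and returns 10. Without OpenMP support the pragmas are ignored and the loops simply run serially with the same result.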
However, if all your OpenMP program is doing is calling CUDA kernels, it's likely that the GPU is the bottleneck, so you won't get much benefit from using threads on the CPU anyway. It may not be worth using OpenMP at all.
If you do continue using OpenMP, though, you should probably add a collapse(2) to that omp parallel for.
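Roughly what that would look like, as a sketch rather than a drop-in fix. With collapse(2) the 10 x 10 iteration space is distributed as 100 independent iterations instead of only the 10 outer ones. process_cell and run_collapsed are hypothetical names; process_cell stands in for the kernel call and copy from the question:

```c
/* Hypothetical stand-in for "call kernel, then copy data from kernel". */
static int process_cell(int i, int j) {
    return i * 10 + j;
}

/* Runs all rows * cols iterations and returns the sum of their results. */
long run_collapsed(int rows, int cols) {
    long total = 0;

    /* collapse(2) fuses both loops into a single iteration space,
     * so rows * cols iterations are shared among the threads. */
    #pragma omp parallel for collapse(2) reduction(+:total)
    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < cols; j++) {
            total += process_cell(i, j);
        }
    }

    return total;
}
```

The reduction clause is only there so the sketch has a result to check; run_collapsed(10, 10) returns 4950. Note that collapse requires the loops to be perfectly nested, with nothing between the two for statements.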