Yielding to other threads/tasks in OpenMP

Posted on 2024-12-07 22:41:24

I want to use OpenMP with CUDA to achieve overlapping kernel executions. The kernel calls are all asynchronous, but I have very little code between launches, so the individual OpenMP threads tend to block as they try to launch another kernel or do a memory copy (I don't always have memory copies right after the call, so async memory copies aren't necessarily the solution). I would like a way to signal to the OpenMP scheduler to switch to another OpenMP thread. Is this possible in OpenMP?

Example:

int main() {
   #pragma omp parallel for
   for(int i=0;i<10;i++) {
       for(int j=0;j<10;j++) {
           //call kernel here

           // ---->   Would like to signal to continue with other  
           //           threads as next call will block

           //copy data from kernel
       }
   }
}

Comments (1)

猥琐帝 2024-12-14 22:41:24

If a thread blocks, the operating system's scheduler should automatically switch to another runnable thread (if one is available), so you shouldn't need to do anything.
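A minimal sketch of that behaviour, under loud assumptions: fake_blocking_call is a hypothetical stand-in for a blocking CUDA call (it is not a CUDA API), and it assumes the call really sleeps instead of busy-waiting. When built with OpenMP enabled (for example with -fopenmp), the n threads sleep concurrently, so the wall time should be close to one call's duration, not n times that, even on fewer than n cores. Without OpenMP the pragma is ignored and the calls simply run one after another.

```c
#define _POSIX_C_SOURCE 199309L
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for a blocking CUDA call: sleeps for `ms` milliseconds. */
static void fake_blocking_call(long ms) {
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Issues `n` blocking calls, one per loop iteration (n must be positive),
   and returns how many of them completed. */
int run_blocking_calls(int n, long ms) {
    int done = 0;
    /* One thread per call; a sleeping thread gives up its core to the OS. */
    #pragma omp parallel for num_threads(n) reduction(+:done)
    for (int i = 0; i < n; i++) {
        fake_blocking_call(ms);
        done += 1;
    }
    return done;
}

/* Returns the wall-clock time in seconds taken by run_blocking_calls(n, ms). */
double time_blocking_calls(int n, long ms) {
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    run_blocking_calls(n, ms);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(t1.tv_sec - t0.tv_sec) + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Calling time_blocking_calls(8, 50) with and without OpenMP enabled is one way to see the difference on a given machine; the exact numbers will vary.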

However, if all your OpenMP program is doing is calling CUDA kernels, it's likely that the GPU is the bottleneck, so you won't get much benefit from using threads on the CPU anyway. It may not be worth using OpenMP at all.

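If you do continue using OpenMP, though, you should probably add a collapse(2) clause to that omp parallel for, so that all 100 (i, j) iterations are distributed across the threads instead of only the 10 iterations of the outer loop.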
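A sketch of how collapse(2) might be applied to the loop nest from the question, assuming every (i, j) iteration is independent of the others. process_item, N_OUTER, N_INNER and run_collapsed are hypothetical names introduced for illustration; process_item stands in for "call kernel, then copy data back" and does no GPU work here.

```c
#include <stddef.h>

#define N_OUTER 10
#define N_INNER 10

/* Hypothetical placeholder for "launch kernel, then copy results back". */
static int process_item(int i, int j) {
    return i * N_INNER + j;
}

/* Fills out[i][j] for every (i, j) pair and returns the number of pairs processed. */
int run_collapsed(int out[N_OUTER][N_INNER]) {
    int count = 0;
    /* collapse(2) merges both loops into a single iteration space of
       N_OUTER * N_INNER iterations before dividing it among the threads. */
    #pragma omp parallel for collapse(2) reduction(+:count)
    for (int i = 0; i < N_OUTER; i++) {
        for (int j = 0; j < N_INNER; j++) {
            out[i][j] = process_item(i, j);
            count += 1;
        }
    }
    return count;
}
```

Note that collapse requires the loops to be perfectly nested, with no statements between the two for headers, which is the case in the question's example.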
