How to nest an omp for inside an omp single with OpenMP
I have a for loop, which I don't want to parallelise, which calls a function which I want to parallelise (which has a for loop in it that I want to parallelise). I want to put the parallel region outside of the whole lot, so that my threads only get created once (to reduce the overhead of thread creation).
However, at the moment I have an omp single covering the for loop, which calls the function, and an omp for inside the function to deal with the internal for loop. It hangs, and according to "OMP single hangs inside for" this is because doing that is illegal!
If I can't do it that way, how can I approach it? I want to make sure that only one thread runs the outer for loop and calls the function, but that inside the function I can get full parallelism.
Is this possible? Any ideas?
Comments (2)
Most implementations only create the threads once, either when the program starts or when the first parallel region is encountered. Once created, they are generally not destroyed; at the end of a parallel region they are put into a pool of idle threads managed by the OpenMP implementation. This means you should be able to put the parallel region inside the loop without paying the thread creation overhead each time the region is encountered. There will still be some small overhead on each entry to the parallel region, but it is much smaller than the cost of creating the threads.
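As a minimal sketch of that layout, assuming a hypothetical worker `scale_all` and driver `run_steps` (neither name comes from the question), the outer loop stays serial and the parallel region lives inside the called function. Whether the threads are actually pooled between calls depends on the OpenMP runtime, so this is worth benchmarking rather than taking on faith.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical worker: doubles (or otherwise scales) every element in parallel. */
void scale_all(double *data, size_t n, double factor)
{
    /* A parallel region is opened on every call; runtimes that keep a
       thread pool only wake existing threads here instead of creating them. */
    #pragma omp parallel for
    for (long i = 0; i < (long)n; ++i) {
        data[i] *= factor;
    }
}

/* Hypothetical driver: the outer loop is executed by the calling thread only. */
void run_steps(double *data, size_t n, int steps)
{
    assert(data != NULL || n == 0);
    for (int step = 0; step < steps; ++step) {
        scale_all(data, n, 2.0);
    }
}
```

Compiled with OpenMP enabled (for example `-fopenmp` with GCC), each call to `scale_all` shares its loop among the team; without that flag the pragma is ignored and the code runs serially with the same result.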
What about:
- putting your inner loop in a #pragma omp parallel region
- setting the number of active threads to one before your outer loop
- setting it back to N before calling your other function
- putting a #pragma omp for inside the function
?
Parallel regions carry over function boundaries in OpenMP, and changing the number of active threads should not be too costly. This needs to be tested and benchmarked, though.
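The point about `#pragma omp for` working across a function boundary can be sketched as follows. Note that this is a variation, not exactly the steps listed above: instead of restricting the outer loop to one thread, every thread of a single team runs the outer loop redundantly, and the function contains an "orphaned" `omp for` that binds to the caller's parallel region. The names `add_step` and `run_steps_one_team` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical worker containing an orphaned work-sharing loop. */
void add_step(double *data, size_t n, double delta)
{
    /* Binds to the parallel region of the caller; if called outside any
       parallel region, the loop simply runs on the calling thread. */
    #pragma omp for
    for (long i = 0; i < (long)n; ++i) {
        data[i] += delta;
    }
    /* The implicit barrier at the end of omp for keeps the steps in order. */
}

/* Hypothetical driver: one parallel region around the whole outer loop. */
void run_steps_one_team(double *data, size_t n, int steps)
{
    assert(data != NULL || n == 0);
    #pragma omp parallel
    {
        /* step is declared inside the region, so each thread has its own
           copy and all threads reach add_step the same number of times. */
        for (int step = 0; step < steps; ++step) {
            add_step(data, n, (double)step);
        }
    }
}
```

With this shape the team is formed once for all steps. Any work in the outer loop body that must happen exactly once per step would need its own `#pragma omp single` around just that work, not around the call that contains the `omp for`.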