Parallelizing a for loop in sequential order with OpenMP
I'm looking to multithread a for loop using OpenMP.
As I understand it, when you do a loop like:
#pragma omp parallel for num_threads(NTHREADS)
for (size_t i = 0; i < length; i++)
{
...
All the threads will just grab an i and move on with their work.
For my implementation, I need to have it that they work "sequentially" in parallel.
By that I mean that e.g., for a length of 800 with 8 threads, I need thread 1 to work on 0 to 99, thread 2 to work on 100-199 and so on.
Is this possible with OpenMP?
2 Answers
Your desired behavior is the default. The loop can be scheduled in several ways, and
schedule(static)
is the default: the loop gets divided into blocks, and the first thread takes the first block, et cetera. So your initial understanding was wrong: a thread does not grab an index, but a block.
Just to note: if you want a thread to grab a smaller block, you can specify
schedule(static,8)
or whatever number suits you, but chunk sizes of less than 8 run into cache performance problems.
From the OpenMP specification:
The default schedule is taken from def-sched-var, and it is implementation defined, so if your program relies on it, define it explicitly:
In this case it is clearly defined how your program behaves, and it does not depend on the implementation at all. Note also that your code should not depend on the number of threads, because OpenMP does not guarantee that your parallel region gets all the requested/available threads. Note also that the
monotonic
modifier is the default in the case of the
static
schedule, so you do not have to state it explicitly.
So, if the above-mentioned 'approximate' chunk size or the exact number of threads is not an issue in your case, your code should be
On the other hand, if you need control over the chunk_size and/or the exact number of threads used, you should use something like