OpenMP for 循环顺序并行

Posted 2025-01-11 00:17:50


I'm looking to multithread a for loop using OpenMP.
As I understand it, when you do a loop like:

    #pragma omp parallel for num_threads(NTHREADS)
    for (size_t i = 0; i < length; i++)
    {
    ...

All the threads will just grab an i and move on with their work.
For my implementation, I need to have it that they work "sequentially" in parallel.
By that I mean that e.g., for a length of 800 with 8 threads, I need thread 1 to work on 0 to 99, thread 2 to work on 100-199 and so on.

Is this possible with OpenMP?

Comments (2)

娇妻 2025-01-18 00:17:50


Your desired behavior is the default. The loop can be scheduled in several ways, and schedule(static) is the default: the loop gets divided into blocks, the first thread takes the first block, and so on.

So your initial understanding was wrong: a thread does not grab an index, but a block.

Just to note: if you want threads to grab smaller blocks, you can specify schedule(static,8) or whatever number suits you, but chunk sizes much below 8 tend to run into cache performance problems (adjacent chunks can land on the same cache line, causing false sharing).

等风也等你 2025-01-18 00:17:50


From the OpenMP specification:

schedule([modifier [, modifier]:]kind[, chunk_size])

When kind is static, iterations are divided into chunks of size
chunk_size, and the chunks are assigned to the threads in the team in
a round-robin fashion in the order of the thread number. Each chunk
contains chunk_size iterations, except for the chunk that contains the
sequentially last iteration, which may have fewer iterations. When no
chunk_size is specified, the iteration space is divided into chunks
that are approximately equal in size, and at most one chunk is
distributed to each thread. The size of the chunks is unspecified in
this case.

When the monotonic modifier is specified then each thread executes the
chunks that it is assigned in increasing logical iteration order.

For a team of p threads and a loop of n iterations, let n∕p be the
integer q that satisfies n = p * q - r, with 0 <= r < p. One compliant
implementation of the static schedule (with no specified chunk_size)
would behave as though chunk_size had been specified with value q.
Another compliant implementation would assign q iterations to the
first p - r threads, and q - 1 iterations to the remaining r threads.
This illustrates why a conforming program must not rely on the details
of a particular implementation.

The default schedule is taken from def-sched-var and it is implementation defined, so if your program relies on it, define it explicitly:

schedule(monotonic:static,chunk_size)

In this case it is clearly defined how your program behaves and does not depend on the implementation at all. Note also that your code should not depend on the number of threads, because OpenMP does not guarantee that your parallel region gets all the requested/available threads. Note also that the monotonic modifier is the default in the case of static schedule, so you do not have to state it explicitly.

So, if the above mentioned 'approximate' chunk size or the exact number of threads is not an issue in your case, your code should be

#pragma omp parallel for schedule(static) num_threads(NTHREADS)

On the other hand, if you need control over chunk_size and/or the exact number of threads used, you should use something like

int chunk_size;   /* must be shared, so declare it before the parallel region */

#pragma omp parallel num_threads(NTHREADS)
{
  #pragma omp single
  {
    int nthreads = omp_get_num_threads();
    // calculate chunk_size based on nthreads
    chunk_size = .....
  }
  /* the implicit barrier at the end of single makes chunk_size
     visible to all threads before the loop starts */

  #pragma omp for schedule(static, chunk_size)
  for (...)
  {
    ...
  }
}