OpenMP: having threads execute a for loop in order

Posted 2025-02-04 03:50:05, 362 words, 4 views, 0 comments

I'd like to run something like the following:

for (int index = 0; index < num; index++)

I'd like to run the for loop with four threads, with the iterations picked up in the order 0,1,2,3,4,5,6,7,8, etc.
That is, before the threads start working on index = n,(n+1),(n+2),(n+3) (in any order within that group, but always in this pattern), I want the iterations index = 0,1,2,...(n-1) to already be finished.
Is there a way to do this? Ordered doesn't really work here, as making the body an ordered section would basically remove all parallelism for me, and scheduling doesn't seem to work because I don't want a thread to be working on iterations k->k+index/4.
Thanks for any help!

Comments (2)

睫毛溺水了 2025-02-11 03:50:05

You can do this not with a parallel for loop but with a parallel region that manages its own loop, plus a barrier to make sure all running threads have reached the same point before any of them continues. Example:

#include <stdatomic.h>
#include <stdio.h>
#include <omp.h>

int main()
{
  atomic_int chunk = 0;
  int num = 12;  // must be a multiple of nthreads so every thread reaches the barrier equally often
  int nthreads = 4;
  
  omp_set_num_threads(nthreads);
  
#pragma omp parallel shared(chunk, num, nthreads)
  {
    for (int index; (index = atomic_fetch_add(&chunk, 1)) < num; ) {
      printf("In index %d\n", index);
      fflush(stdout);
#pragma omp barrier

      // For illustrative purposes only; not needed in real code
#pragma omp single
      {
        puts("After barrier");
        fflush(stdout);
      }
    }
  }

  puts("Done");
  return 0;
}

One possible output:

$ gcc -std=c11 -O -fopenmp -Wall -Wextra demo.c
$ ./a.out
In index 2
In index 3
In index 1
In index 0
After barrier
In index 4
In index 6
In index 5
In index 7
After barrier
In index 10
In index 9
In index 8
In index 11
After barrier
Done
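
The example above gives every thread the same number of barrier calls only when num is a multiple of the thread count. A minimal sketch of a variant that should also cope with other values of num is shown below; it is not part of the original answer. The helper name run_in_waves and the finished[] array are hypothetical, introduced only for illustration. Each thread derives its index from its thread id, so every thread runs the same number of rounds and meets the barrier the same number of times. The fallback definitions only exist so the sketch still compiles without -fopenmp.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks so the sketch still compiles without -fopenmp. */
static int omp_get_thread_num(void) { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Hypothetical helper: processes index = 0..num-1 in waves of consecutive
   indices, one index per thread per wave. The order in which indices are
   processed is recorded in finished[] (room for num entries required).
   Returns the team size that was actually used. */
int run_in_waves(int num, int requested_threads, int *finished)
{
  assert(num >= 0 && requested_threads > 0);
  int next_slot = 0;
  int team_size = 1;

#pragma omp parallel num_threads(requested_threads) shared(next_slot, team_size)
  {
    int tid = omp_get_thread_num();
    int team = omp_get_num_threads();  /* may be smaller than requested */
#pragma omp single
    team_size = team;

    /* Every thread runs the same number of rounds, even in a partial last wave. */
    for (int base = 0; base < num; base += team) {
      int index = base + tid;
      if (index < num) {
        int slot;
        /* The real loop body would go here; this sketch only logs the index. */
#pragma omp atomic capture
        slot = next_slot++;
        finished[slot] = index;
      }
      /* No thread starts the next wave before the whole wave is done. */
#pragma omp barrier
    }
  }
  return team_size;
}
```

Calling run_in_waves(10, 4, finished) from a program built with gcc -fopenmp should fill finished[] so that all indices of one wave appear before any index of the next, while the order inside a wave may vary from run to run.
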
左耳近心 2025-02-11 03:50:05

I'm not sure I understand your request correctly. If I try to summarize how I interpret it, that would be something like: "I want 4 threads sharing the iterations of a loop, with the 4 threads always working on at most 4 consecutive iterations of the loop at a time".

If that's what you want, what about something like this:

int nths = 4;
#pragma omp parallel num_threads( nths )
for( int index_outer = 0; index_outer < num; index_outer += nths ) {
    // C has no built-in min(), so clamp the end of the block explicitly
    int end = index_outer + nths < num ? index_outer + nths : num;
    #pragma omp for
    for( int index = index_outer; index < end; index++ ) {
        // the loop body just as before
    } // there's a thread synchronization here
}
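
To see why the synchronization at the end of the inner loop matters, here is a small sketch built on the same pattern, under the assumption that each iteration reads a result produced in the previous block. The function name fill_wave_numbers and its parameters are hypothetical, chosen only for this illustration. The read of out[index - block_size] is safe only because the implicit barrier at the end of the omp for guarantees that the previous block has finished.

```c
#include <assert.h>

/* Hypothetical helper: writes into out[index] the number of the block that
   index belongs to, computing it from the value stored by the previous block. */
void fill_wave_numbers(int *out, int num, int block_size)
{
  assert(num >= 0 && block_size > 0);

#pragma omp parallel num_threads(block_size)
  for (int start = 0; start < num; start += block_size) {
    /* Clamp the end of the block so the last, partial block stays in range. */
    int end = start + block_size < num ? start + block_size : num;

#pragma omp for
    for (int index = start; index < end; index++) {
      /* Depends on an iteration from the previous block. */
      out[index] = index < block_size ? 0 : out[index - block_size] + 1;
    }
    /* Implicit barrier here: the next block starts only after this one is done. */
  }
}
```

With num = 10 and block_size = 4, out should end up holding each index divided by 4 (integer division). Every thread runs the outer loop itself, so all threads encounter the omp for the same number of times, which the worksharing construct requires.
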