OpenMP: decomposing a loop into dynamically sized chunks

I am using OpenMP to go through a large loop in parallel. Let's say the array I'm working on has N entries in total. I would like one thread to do the first N/2 entries and the other thread the last N/2.

I have to prevent the threads from working on entries that are next to each other. The size N is always much bigger than the number of threads, so if I can get OpenMP to distribute the work the way I outlined above, I don't need to worry about locks.

If the size N is known at compile time, I can use #pragma omp parallel for schedule(static,N/2). Unfortunately it isn't. So how do I define the chunk size dynamically?


3 Answers

鸵鸟症 2024-11-07 11:42:29

There's no problem as long as N is known at runtime; I'm not sure why you think it has to be known at compile time. OMP loop constructs would be of very limited use indeed if everything had to be known at compile time.

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

int main(int argc, char **argv) {
    int n;
    int chunksize;

    if (argc != 2) {
        fprintf(stderr,"Usage: %s n, where n = number of iterations.\n", argv[0]);
        exit(-1);
    }
    n = atoi(argv[1]);
    if (n<1 || n>200) n = 10;

    /* chunk size is computed at runtime: half the iterations per thread */
    chunksize = n/2;

    #pragma omp parallel num_threads(2) default(none) shared(n,chunksize)
    {
        int nthread = omp_get_thread_num();
        /* static schedule with a runtime chunk size: thread 0 takes the
           first chunk of iterations, thread 1 the second */
        #pragma omp for schedule(static,chunksize)
        for (int i=0; i<n; i++) {
            printf("Iter %d being done by thread %d\n", i, nthread);
        }
    }

    return 0;
}

And it runs simply enough, like so:

$ gcc -v
[...]
gcc version 4.4.0 (GCC) 

$ gcc -o loop loop.c -fopenmp

$ ./loop 10
Iter 5 being done by thread 1
Iter 6 being done by thread 1
Iter 7 being done by thread 1
Iter 8 being done by thread 1
Iter 9 being done by thread 1
Iter 0 being done by thread 0
Iter 1 being done by thread 0
Iter 2 being done by thread 0
Iter 3 being done by thread 0
Iter 4 being done by thread 0
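
Note that chunksize = n/2 rounds down, so for an odd n the static schedule hands the single leftover iteration back to thread 0 and its share is no longer one contiguous block. A minimal variant sketch (not from the answer above), assuming two threads, that rounds the chunk size up instead:

#include <stdio.h>
#include <omp.h>

int main(void) {
    int n = 11;                   /* odd on purpose */
    int chunksize = (n + 1) / 2;  /* 6: thread 0 gets 0-5, thread 1 gets 6-10 */

    /* hypothetical variant of the answer above: rounding up keeps each
       thread's iterations in one contiguous half */
    #pragma omp parallel for num_threads(2) schedule(static, chunksize)
    for (int i = 0; i < n; i++)
        printf("Iter %d being done by thread %d\n", i, omp_get_thread_num());

    return 0;
}
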
暖树树初阳… 2024-11-07 11:42:29

If you don't want to use the built-in OpenMP scheduling options shown in @Jonathan Dursi's answer, you could implement the partitioning you need yourself:

#include <stdio.h>
#include <omp.h>
/* $ gcc -O3 -fopenmp -Wall *.c && ./a.out  */

static void doloop(int n) {
  int thread_num, num_threads, start, end, i;
#pragma omp parallel private(i,thread_num,num_threads,start,end)
  {
    thread_num = omp_get_thread_num();
    num_threads = omp_get_num_threads();
    /* each thread computes its own contiguous [start, end) range */
    start = thread_num * n / num_threads;
    end = (thread_num + 1) * n / num_threads;

    for (i = start; i != end; ++i) {
      printf("%d %d\n", thread_num, i);
    }
  }
}

int main() {
  omp_set_num_threads(2);
  doloop(10);
  return 0;
}

Output

0 0
0 1
0 2
0 3
0 4
1 5
1 6
1 7
1 8
1 9
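
A small follow-up sketch (not part of the answer above): the same start/end arithmetic also copes with a thread count that does not divide n evenly; with n = 10 and 3 threads the ranges work out to [0,3), [3,6) and [6,10):

#include <stdio.h>
#include <omp.h>

int main(void) {
  int n = 10;

  #pragma omp parallel num_threads(3)
  {
    int t = omp_get_thread_num();
    int T = omp_get_num_threads();
    int start = t * n / T;        /* 0, 3, 6 for t = 0, 1, 2 */
    int end   = (t + 1) * n / T;  /* 3, 6, 10 */

    for (int i = start; i < end; ++i)
      printf("thread %d handles iteration %d\n", t, i);
  }
  return 0;
}
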
夏日落 2024-11-07 11:42:29

I had a similar problem on dotNET, and ended up writing a smart queue object that returns a dozen objects at a time, once they become available. Once I have a package in hand, I decide on a thread that can process all of them in one go.

While working on this problem, I kept in mind that W-queues are better than M-queues: it's better to have one long line with multiple workers than a separate line for each worker.
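
For a C/OpenMP equivalent of this idea, here is a rough sketch (an approximation, not the answerer's code): a shared counter plays the role of the queue, and each thread atomically claims a batch of a dozen indices at a time. OpenMP's schedule(dynamic, 12) gives essentially the same behaviour with less code.

#include <stdio.h>
#include <omp.h>

#define BATCH 12   /* "a dozen objects at a time" */

int main(void) {
    int n = 100;   /* hypothetical total number of work items */
    int next = 0;  /* next unclaimed index, shared by all threads */

    #pragma omp parallel
    {
        for (;;) {
            int start;
            /* atomically claim the next batch of indices */
            #pragma omp atomic capture
            { start = next; next += BATCH; }

            if (start >= n)
                break;
            int end = start + BATCH < n ? start + BATCH : n;

            for (int i = start; i < end; ++i)
                printf("thread %d processes item %d\n",
                       omp_get_thread_num(), i);
        }
    }
    return 0;
}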
