Multi-dimensional nested OpenMP loops
What is the proper way to parallelize a multi-dimensional embarrassingly parallel loop in OpenMP? The number of dimensions is known at compile-time, but which dimensions will be large is not. Any of them may be one, two, or a million. Surely I don't want N omp parallel directives for an N-dimensional loop...
Thoughts:
The problem is conceptually simple. Only the outermost 'large' loop needs to be parallelized, but the loop dimensions are unknown at compile-time and may change.
Will dynamically setting omp_set_num_threads(1) and #pragma omp for schedule(static, huge_number) make certain loop parallelizations a no-op? Will this have undesired side effects or overhead? It feels like a kludge. The OpenMP specification (sections 2.10, A.38, A.39) distinguishes conforming from non-conforming nested parallelism, but doesn't suggest the best approach to this problem.
Re-ordering the loops is possible but may result in a lot of cache misses. Unrolling is possible but non-trivial. Is there another way?
Here's what I'd like to parallelize:
for(i0=0; i0<n[0]; i0++) {
    for(i1=0; i1<n[1]; i1++) {
        ...
        for(iN=0; iN<n[N]; iN++) {
            <embarrassingly parallel operations>
        }
        ...
    }
}
Thanks!
Comments (1)
The collapse clause is probably what you're looking for, as described here. It essentially forms a single loop out of the nest, which is then parallelized, and it is designed for exactly these sorts of situations. So you'd do the following and be all set.