Multidimensional nested OpenMP loops

Posted on 2024-10-21 19:42:02


What is the proper way to parallelize a multi-dimensional embarrassingly parallel loop in OpenMP? The number of dimensions is known at compile time, but which dimensions will be large is not. Any of them may be one, two, or a million. Surely I don't want N nested omp parallel regions for an N-dimensional loop...

Thoughts:

  • The problem is conceptually simple. Only the outermost 'large' loop needs to be parallelized, but the loop dimensions are unknown at compile-time and may change.

  • Will dynamically setting omp_set_num_threads(1) and #pragma omp for schedule(static, huge_number) make certain loop parallelizations a no-op? Will this have undesired side-effects/overhead? Feels like a kludge.

  • The OpenMP Specification (2.10, A.38, A.39) describes the difference between conforming and non-conforming nested parallelism, but doesn't suggest the best approach to this problem.

  • Re-ordering the loops is possible but may result in a lot of cache-misses. Unrolling is possible but non-trivial. Is there another way?

Here's what I'd like to parallelize:

for(i0=0; i0<n[0]; i0++) {
  for(i1=0; i1<n[1]; i1++) {
    ...
       for(iN=0; iN<n[N]; iN++) {
         <embarrassingly parallel operations>
       }
    ...
  }
}

Thanks!


Comments (1)

梦年海沫深 2024-10-28 19:42:02


The collapse clause is probably what you're looking for, as described here. It essentially fuses the loop nest into a single loop, which is then parallelized, and it is designed for exactly these sorts of situations. So you'd do:

#pragma omp parallel for collapse(N)
for(int i0=0; i0<n[0]; i0++) {
  for(int i1=0; i1<n[1]; i1++) {
    ...
       for(int iN=0; iN<n[N]; iN++) {
         <embarrassingly parallel operations>
       }
    ...
  }
}

and be all set.
