OpenMP: nowait and reduction clauses on the same pragma
I am studying OpenMP, and came across the following example:
#pragma omp parallel shared(n,a,b,c,d,sum) private(i)
{
    #pragma omp for nowait
    for (i=0; i<n; i++)
        a[i] += b[i];

    #pragma omp for nowait
    for (i=0; i<n; i++)
        c[i] += d[i];

    #pragma omp barrier

    #pragma omp for nowait reduction(+:sum)
    for (i=0; i<n; i++)
        sum += a[i] + c[i];
} /*-- End of parallel region --*/
In the last for loop, there is both a nowait and a reduction clause. Is this correct? Doesn't the reduction clause need to be synchronized?
Comments (3)
The nowaits in the second and last loop are somewhat redundant. The OpenMP spec mentions nowait before the end of the region, so perhaps this one can stay in.
But the nowait on the second loop and the explicit barrier after it cancel each other out.
Lastly, about the shared and private clauses. In your code, shared has no effect, and private simply shouldn't be used at all: if you need a thread-private variable, just declare it inside the parallel region. In particular, you should declare loop variables inside the loop, not before.
To make shared useful, you need to tell OpenMP that it shouldn't share anything by default. You should do this to avoid bugs due to accidentally shared variables. This is done by specifying default(none). This leaves us with:
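The listing that followed "This leaves us with:" in the first answer is not preserved in this copy. As a rough sketch of what that advice amounts to (default(none), an explicit shared list, loop variables declared inside the loops, and the second loop's nowait dropped together with the explicit barrier), assuming double arrays and a hypothetical wrapper function add_and_sum that does not appear in the original answer:

```c
/* Hypothetical wrapper: the name, signature and element type are
 * assumptions made for illustration only. */
double add_and_sum(double *a, const double *b, double *c, const double *d, int n)
{
    double sum = 0.0;

    /* default(none) forces every variable used inside to be listed explicitly. */
    #pragma omp parallel default(none) shared(n, a, b, c, d, sum)
    {
        /* The second loop does not read a, so no barrier is needed here. */
        #pragma omp for nowait
        for (int i = 0; i < n; i++)
            a[i] += b[i];

        /* No nowait: the implicit barrier at the end of this loop stands in
         * for the "nowait + explicit barrier" pair of the original code. */
        #pragma omp for
        for (int i = 0; i < n; i++)
            c[i] += d[i];

        /* sum is not read again inside the region, so nowait is harmless. */
        #pragma omp for nowait reduction(+:sum)
        for (int i = 0; i < n; i++)
            sum += a[i] + c[i];
    }   /* implicit barrier of the parallel region: the reduction is complete */

    return sum;
}
```

The loop index i is private automatically because it is declared inside each for statement. The code only uses pragmas, so it should also compile and give the same result serially when OpenMP is not enabled (for GCC or Clang, OpenMP is typically enabled with -fopenmp).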
In some regards this seems like a homework problem, which I hate to do for people. On the other hand, the answers above are not totally accurate and I feel they should be corrected.
First, while in this example neither the shared nor the private clause is needed, I disagree with Konrad that they shouldn't be used. One of the most common problems with people parallelizing code is that they don't take the time to understand how the variables are being used. Failing to privatize and/or protect variables that should be accounts for the largest number of problems that I see. Going through the exercise of examining how variables are used and putting them into the appropriate shared, private, etc. clauses will greatly reduce the number of problems you have.
As for the question about the barriers, the first loop can have a nowait clause, because there is no use of the value computed (a) in the second loop. The second loop can have a nowait clause only if the value computed (c) is not used before the values are calculated (i.e., there is no dependency). In the original example code there is a nowait on the second loop, but an explicit barrier before the third loop. This is fine, since your professor was trying to show the use of an explicit barrier - though leaving off the nowait on the second loop would make the explicit barrier redundant (since there is an implicit barrier at the end of a loop).
On the other hand, the nowait on the second loop and the explicit barrier may not be needed at all. Prior to the OpenMP V3.0 specification, many people relied on a guarantee that the specification did not actually state. With the OpenMP V3.0 specification the following was added to section 2.5.1 Loop Construct, Table 2-1 schedule clause kind values, static (schedule):
Now in your example, no schedule was shown on any of the loops, so this may or may not hold. The reason is that the default schedule is implementation defined, and while most implementations currently define the default schedule to be static, there is no guarantee of that. If your professor had put a schedule type of static without a chunk-size on all three loops, then nowait could be used on the first and second loops and no barrier (either implicit or explicit) would be needed between the second and third loops at all.
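To make the static-schedule variant concrete, here is a minimal sketch. It relies on my reading of the V3.0 rule: loops with the same iteration count, the same (or no) chunk size, and bound to the same parallel region get the same iteration-to-thread assignment under schedule(static). The wrapper name add_and_sum_static and the double element type are assumptions for illustration, not part of the original example.

```c
/* Hypothetical wrapper; name, signature and element type are assumptions. */
double add_and_sum_static(double *a, const double *b, double *c, const double *d, int n)
{
    double sum = 0.0;

    #pragma omp parallel default(none) shared(n, a, b, c, d, sum)
    {
        /* All three loops: same trip count, schedule(static), no chunk size,
         * same parallel region. Each thread therefore handles the same
         * indices i in every loop. */
        #pragma omp for schedule(static) nowait
        for (int i = 0; i < n; i++)
            a[i] += b[i];

        #pragma omp for schedule(static) nowait
        for (int i = 0; i < n; i++)
            c[i] += d[i];

        /* No barrier before this loop: a[i] and c[i] were written by the
         * very thread that now reads them. */
        #pragma omp for schedule(static) nowait reduction(+:sum)
        for (int i = 0; i < n; i++)
            sum += a[i] + c[i];
    }   /* the parallel region's implicit barrier completes the reduction */

    return sum;
}
```

This only works because the third loop reads exactly the elements with the same index i. If it read a[i+1] or used a different trip count, a barrier would be needed again.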
Now we get to the third loop and your question about nowait and reduction. As Michy pointed out, the OpenMP specification allows both (reduction and nowait) to be specified. However, it is not true that no synchronization is needed for the reduction to be complete. In the example, the implicit barrier (at the end of the third loop) can be removed with the nowait. This is because the reduction (sum) is not being used before the implicit barrier of the parallel region has been encountered.
If you look at the OpenMP V3.0 specification, section 2.9.3.6 reduction clause, you will find the following:
This means that if you wanted to use the sum variable in the parallel region after the third loop, then you would need a barrier (either implicit or explicit) before you used it. As the example stands now, it is correct.
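As an illustration of that last point, the sketch below reads sum inside the parallel region after a nowait reduction loop, and therefore puts an explicit barrier in between. The function sum_then_subtract, the out array, and the element type are hypothetical and chosen only to have something that consumes sum.

```c
/* Hypothetical example: fills out[i] with the sum of all the other terms. */
double sum_then_subtract(const double *a, const double *c, double *out, int n)
{
    double sum = 0.0;

    #pragma omp parallel default(none) shared(n, a, c, out, sum)
    {
        #pragma omp for nowait reduction(+:sum)
        for (int i = 0; i < n; i++)
            sum += a[i] + c[i];

        /* Because of nowait, a thread can get here while others are still
         * accumulating. Reading sum before this barrier would be a race. */
        #pragma omp barrier

        #pragma omp for
        for (int i = 0; i < n; i++)
            out[i] = sum - (a[i] + c[i]);
    }

    return sum;
}
```

Dropping the nowait from the reduction loop would have the same effect, since the loop's implicit barrier would then take the place of the explicit one.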
The OpenMP specification says:
So there can be more than one clause, and therefore both reduction and nowait can appear on the same directive.
There is no need for explicit synchronization in the reduction clause: the addition to the sum variable is synchronized because of reduction(+: sum), and the preceding barrier ensures that a and c have their final values by the time the reduction loop runs. The nowait means that a thread that finishes its work in the loop does not have to wait until all other threads finish the same loop.