Different behavior between scalar and parallel code
I'm wondering why the following code produces different results in its scalar and parallel variants:
#define N 10

double P[N][N];

// zero the matrix just to be sure...
for (int i=0; i<N; i++)
    for (int j=0; j<N; j++)
        P[i][j]=0.0;

double xmin=-5.0, ymin=-5.0, xmax=5.0, ymax=5.0;
double x=xmin, y=ymin;
double step=abs(xmax-xmin)/(double)(N-1);

for (int i=0; i<N; i++)
{
    #pragma omp parallel for ordered schedule(dynamic)
    for (int j=0; j<N; j++)
    {
        x = i*step+xmin;
        y = j*step+ymin;
        P[i][j]=x+y;
    }
}
The two versions of this code do not produce identical results (the scalar version just has the #pragma ... line commented out).
What I've noticed is that a very small percentage of the elements of P[i][j] in the parallel version differ from those of the scalar version, but I can't see why.
Putting the #pragma on the outer loop, as suggested, is a mess: completely wrong results.
P.S. g++-4.4, Intel i7, Linux
Comments (1)
Ah, now I can see the problem. Your comment on the last question didn't have enough context for me to see it. But now it's clear.
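The problem is here: x and y are declared outside the parallel region, so they are shared among all the threads. One thread can overwrite x or y between another thread's assignment and its use in P[i][j]=x+y, which is a nasty race condition and explains why only a few elements come out wrong. To fix it, make them local by declaring them inside the loop body, so that each thread gets its own copies.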
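For illustration, here is a minimal sketch of that fix. The function wrapper and the name fill_inner_parallel are assumptions added so the snippet compiles on its own; the loops mirror the question's code. The only real change is that x and y are declared inside the inner loop body. std::fabs is used in place of the question's abs so that the floating-point version is certainly the one called.

```cpp
#include <cmath>

#define N 10

// Hypothetical helper (not from the original post): fills P with x + y,
// keeping the pragma on the inner loop as in the question.
void fill_inner_parallel(double P[N][N])
{
    const double xmin = -5.0, ymin = -5.0, xmax = 5.0;
    const double step = std::fabs(xmax - xmin) / (double)(N - 1);

    for (int i = 0; i < N; i++)
    {
#pragma omp parallel for ordered schedule(dynamic)
        for (int j = 0; j < N; j++)
        {
            // Declared inside the parallel loop body, so each thread
            // has its own x and y and nothing is shared between threads.
            double x = i * step + xmin;
            double y = j * step + ymin;
            P[i][j] = x + y;
        }
    }
}
```

If you would rather keep the declarations where they were, adding a private(x, y) clause to the pragma should have the same effect, since it also gives each thread its own copies.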
With this fix, you should be able to put the #pragma on the outer loop instead of the inner loop.
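As a rough sketch of what the outer-loop variant could look like, the snippet below fills one array serially and another with the pragma on the outer loop, then compares them. The names GRID, fill_serial, fill_outer_parallel and max_abs_diff are made up for this example and are not from the original post.

```cpp
#include <cmath>

const int GRID = 10;  // same size as N in the question

// Serial reference version (hypothetical helper name).
void fill_serial(double P[GRID][GRID])
{
    const double xmin = -5.0, ymin = -5.0, xmax = 5.0;
    const double step = std::fabs(xmax - xmin) / (double)(GRID - 1);
    for (int i = 0; i < GRID; i++)
        for (int j = 0; j < GRID; j++)
        {
            double x = i * step + xmin;
            double y = j * step + ymin;
            P[i][j] = x + y;
        }
}

// Same loops with the pragma moved to the outer loop.
// i is the parallel loop variable; j, x and y are declared inside the
// parallel region, so every thread has private copies of all of them.
void fill_outer_parallel(double P[GRID][GRID])
{
    const double xmin = -5.0, ymin = -5.0, xmax = 5.0;
    const double step = std::fabs(xmax - xmin) / (double)(GRID - 1);
#pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < GRID; i++)
        for (int j = 0; j < GRID; j++)
        {
            double x = i * step + xmin;
            double y = j * step + ymin;
            P[i][j] = x + y;
        }
}

// Largest element-wise absolute difference between two grids.
double max_abs_diff(double A[GRID][GRID], double B[GRID][GRID])
{
    double worst = 0.0;
    for (int i = 0; i < GRID; i++)
        for (int j = 0; j < GRID; j++)
            worst = std::fmax(worst, std::fabs(A[i][j] - B[i][j]));
    return worst;
}
```

Built with g++ -fopenmp, calling both fill functions on separate arrays and passing the results to max_abs_diff is a quick way to check that the parallel version agrees with the scalar one. Without -fopenmp the pragma is ignored and both functions simply run serially.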