Why does the compiler ignore OpenMP pragmas?
In the following C code I am using OpenMP inside a nested loop. Since a race condition occurs, I want to perform the accumulation as an atomic operation at the end:
double mysumallatomic() {
    double S2 = 0.;
    #pragma omp parallel for shared(S2)
    for (int a = 0; a < 128; a++) {
        for (int b = 0; b < 128; b++) {
            double myterm = (double)a * b;
            #pragma omp atomic
            S2 += myterm;
        }
    }
    return S2;
}
The problem is that the #pragma omp atomic has no effect on the program's behaviour: even if I remove it, nothing changes. Even if I change it to #pragma oh_my_god, I get no error!
I would like to know what is going wrong here, whether I can tell the compiler to be stricter when checking omp pragmas, and why I get no error when I make that last change.
PS: For the compilation I use:
gcc-4.2 -fopenmp main.c functions.c -o main_elec_gcc.exe
PS2: The new code, based on Gillespie's idea, gives me the same problem:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <omp.h>
#include <math.h>

#define NRACK 64
#define NSTARS 1024

double mysumallatomic_serial(float rocks[NRACK][3], float moon[NSTARS][3],
                             float qr[NRACK], float ql[NSTARS]) {
    int j, i;
    float temp_div = 0., temp_sqrt = 0.;
    float difx, dify, difz;
    float mod2x, mod2y, mod2z;
    double S2 = 0.;
    for (j = 0; j < NRACK; j++) {
        for (i = 0; i < NSTARS; i++) {
            difx = rocks[j][0] - moon[i][0];
            dify = rocks[j][1] - moon[i][1];
            difz = rocks[j][2] - moon[i][2];
            mod2x = difx * difx;
            mod2y = dify * dify;
            mod2z = difz * difz;
            temp_sqrt = sqrt(mod2x + mod2y + mod2z);
            temp_div = 1 / temp_sqrt;
            S2 += ql[i] * temp_div * qr[j];
        }
    }
    return S2;
}

double mysumallatomic(float rocks[NRACK][3], float moon[NSTARS][3],
                      float qr[NRACK], float ql[NSTARS]) {
    float temp_div = 0., temp_sqrt = 0.;
    float difx, dify, difz;
    float mod2x, mod2y, mod2z;
    double S2 = 0.;
    #pragma omp parallel for shared(S2)
    for (int j = 0; j < NRACK; j++) {
        for (int i = 0; i < NSTARS; i++) {
            difx = rocks[j][0] - moon[i][0];
            dify = rocks[j][1] - moon[i][1];
            difz = rocks[j][2] - moon[i][2];
            mod2x = difx * difx;
            mod2y = dify * dify;
            mod2z = difz * difz;
            temp_sqrt = sqrt(mod2x + mod2y + mod2z);
            temp_div = 1 / temp_sqrt;
            float myterm = ql[i] * temp_div * qr[j];
            #pragma omp atomic
            S2 += myterm;
        }
    }
    return S2;
}

int main(int argc, char *argv[]) {
    float rocks[NRACK][3], moon[NSTARS][3];
    float qr[NRACK], ql[NSTARS];
    int i, j;

    for (j = 0; j < NRACK; j++) {
        rocks[j][0] = j;
        rocks[j][1] = j + 1;
        rocks[j][2] = j + 2;
        qr[j] = j * 1e-4 + 1e-3;
        //qr[j] = 1;
    }

    for (i = 0; i < NSTARS; i++) {
        moon[i][0] = 12000 + i;
        moon[i][1] = 12000 + i + 1;
        moon[i][2] = 12000 + i + 2;
        ql[i] = i * 1e-3 + 1e-2;
        //ql[i] = 1;
    }

    printf(" serial: %f\n", mysumallatomic_serial(rocks, moon, qr, ql));
    printf(" openmp: %f\n", mysumallatomic(rocks, moon, qr, ql));
    return 0;
}
4 Answers
Using the flag -Wall highlights pragma errors. For example, when I misspell atomic I get the following warning:

main.c:15: warning: ignoring #pragma omp atomic1

I'm sure you know, but just in case, your example should be handled with a reduction.

When you use omp parallel, the default is for all variables to be shared. This is not what you want in your case: for example, each thread needs its own value of difx. Instead, your loop should be:
I know this is an old post, but I think the problem is the order of the gcc parameters: -fopenmp should be at the end of the compilation line.
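If that is the issue, the command from the question would become something like this (the same command with -fopenmp moved to the end, as the answer suggests):

gcc-4.2 main.c functions.c -o main_elec_gcc.exe -fopenmp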
First, depending on the implementation, a reduction might be better than using atomic. I would try both and time them to see for sure.

Second, if you leave off the atomic, you may or may not see the problem (a wrong result) caused by the race. It is all about timing, which can differ quite a bit from one run to the next. I have seen cases where the result was wrong only once in 150,000 runs, and others where it was wrong every time.

Third, the idea behind pragmas is that the user doesn't need to know about them if they have no effect. Besides that, the philosophy in Unix (and its derivatives) is to stay quiet unless there is a problem. That said, many implementations have some sort of flag so the user can get more information when they don't know what is happening. You can try -Wall with gcc; at the very least it should flag the oh_my_god pragma as being ignored.
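A rough sketch of how the two variants could be timed, as this answer suggests, using the OpenMP wall-clock timer omp_get_wtime(); mysumall_reduction is the hypothetical reduction version sketched above, and this snippet would sit inside main() after the arrays are filled:

double t0 = omp_get_wtime();
double s_atomic = mysumallatomic(rocks, moon, qr, ql);
double t1 = omp_get_wtime();
double s_red = mysumall_reduction(rocks, moon, qr, ql);
double t2 = omp_get_wtime();
printf("atomic:    %f  (%.6f s)\n", s_atomic, t1 - t0);
printf("reduction: %f  (%.6f s)\n", s_red, t2 - t1);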
You have

#pragma omp parallel for shared(S2)

so the only parallelization will be of the for loop. If you want to have the atomic or the reduction, you have to write it explicitly; otherwise everything after the # is simply treated as a comment.
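To illustrate the "everything after # is a comment" point, here is a small self-contained test (a sketch, not code from the original post) that can be compiled with and without -fopenmp, and with -Wall:

#include <stdio.h>

int main(void) {
    double S = 0.;
    /* made-up pragma: the compiler skips it; -Wall reports it as ignored */
    #pragma oh_my_god
    /* real OpenMP directive: it only has an effect when built with -fopenmp */
    #pragma omp parallel for reduction(+:S)
    for (int i = 0; i < 1000; i++)
        S += i;
    printf("sum = %f\n", S);
    return 0;
}

With gcc -Wall -fopenmp the oh_my_god pragma should be flagged as ignored; without -fopenmp, -Wall should flag the omp pragma as unknown as well, which is the behaviour described in the question.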