Floating point exception in OpenMP parallel region

Posted 2024-11-08 00:14:11

One of the files in my project has a for loop that I tried to parallelize with OpenMP's parallel for construct. When I ran it, I got a floating point exception. I couldn't reproduce the error in a separate test program; however, I could reproduce it in the same file using a dummy parallel region (the original for loop did some detailed array computations, hence the dummy code):

#pragma omp parallel for
for(i=0; i<8; i++)
{
  puts("hello world");
}

I still got the same error. Here's the gdb output:

    Program received signal SIGFPE, Arithmetic exception.
    [Switching to Thread 0x7ffff4c44710 (LWP 18912)]
    0x0000000000402fd4 in allocate_2D_matrix.omp_fn.0 (.omp_data_i=0x0) at main.c:119
    119     #pragma omp parallel for

By trial and error, I solved the problem by adding a schedule clause to the OpenMP construct:

#pragma omp parallel for schedule(dynamic)
for(i=0; i<8; i++)
{
  puts("hello world");
}

and it worked just fine. I could replicate this entire behaviour on two different systems (gcc 4.4.5 on 64-bit Linux Mint and gcc 4.5.0 on 64-bit openSUSE).
Does anyone have an idea what might have caused it? I strongly suspect it is related to my program, since I couldn't reproduce the error separately, but I don't know where to look. The problem is solved, of course, but I am curious. If need be, I can post the entire original function where I see this behaviour.

梦里兽 2024-11-15 00:14:11

Most likely puts is not thread-safe. Put it in a critical section and see what happens.

善良天后 2024-11-15 00:14:11

I had the same issue. It seems to happen when using unsigned ints as loop iteration variables. Here is an example that has the problem, and the fix:

/* the following code was generating an FPE: */

unsigned int m = A->m ;
unsigned int i,ij ;
NLCoeff* c = NULL ;
NLRowColumn* Ri = NULL;

#pragma omp parallel for private(i,ij,c,Ri) 
for(i=0; i<m; i++) {
    Ri = &(A->row[i]) ;       
    y[i] = 0 ;
    for(ij=0; ij<Ri->size; ij++) {
        c = &(Ri->coeff[ij]) ;
        y[i] += c->value * x[c->index] ;
    }
}

/* and this one does not: */

int m = (int)(A->m) ;
int i,ij ;
NLCoeff* c = NULL ;
NLRowColumn* Ri = NULL;

#pragma omp parallel for private(i,ij,c,Ri) 
for(i=0; i<m; i++) {
    Ri = &(A->row[i]) ;       
    y[i] = 0 ;
    for(ij=0; ij<(int)(Ri->size); ij++) {
        c = &(Ri->coeff[ij]) ;
        y[i] += c->value * x[c->index] ;
    }
}
