Small OpenMP program sometimes freezes (gcc, C, Linux)
I wrote a small OpenMP test, and it does not work correctly every time:
#include <omp.h>
int main() {
    int i, j = 0;
    #pragma omp parallel
    for (i = 0; i < 1000; i++)
    {
        #pragma omp barrier
        j += j ^ i;
    }
    return j;
}
Writing to j from all threads is incorrect in this example, BUT that should only make the value of j nondeterministic; instead the program freezes.
Compiled with gcc-4.3.1 -fopenmp a.c -o gcc -static
Run on a 4-core x86 Core2 Linux server: $ ./gcc
and it sometimes freezes (roughly 1 freeze per 4-5 quick runs).
Strace:
[pid 13118] futex(0x80d3014, FUTEX_WAKE, 1) = 1
[pid 13119] <... futex resumed> ) = 0
[pid 13118] futex(0x80d3020, FUTEX_WAIT, 251, NULL <unfinished ...>
[pid 13119] futex(0x80d3014, FUTEX_WAKE, 1) = 0
[pid 13119] futex(0x80d3020, FUTEX_WAIT, 251, NULL
<freeze>
Why do I have a freeze (deadlock)?
3 Answers
Try making i private so each loop has its own copy.
Now that I have more time, I will try to explain. By default, variables in OpenMP are shared. There are a couple of cases where the defaults make variables private, but a parallel region is not one of them (so High Performance Mark's response is wrong). In your original program you have two race conditions - one on i and one on j. The problem is the one on i. Each thread executes the loop some number of times, but since i is being changed by every thread, the number of times any particular thread executes the loop is indeterminate. Since all threads have to reach the barrier for it to be satisfied, you end up hanging on a barrier that is never released, because the threads do not all execute it the same number of times.
Since the OpenMP spec clearly states (OMP spec V3.0, section 2.8.3 barrier Construct) that "the sequence of worksharing regions and barrier regions encountered must be the same for every thread in a team", your program is non-compliant and as such can have indeterminate behavior.
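A minimal sketch of the fix this answer describes, assuming the original test program; note that j is still written by all threads, so its final value remains unspecified, but the hang should go away:
#include <omp.h>
int main() {
    int i, j = 0;
    /* private(i) gives every thread its own loop counter, so each thread
       executes the loop - and therefore the barrier - exactly 1000 times,
       and the team can always meet at the barrier */
    #pragma omp parallel private(i)
    for (i = 0; i < 1000; i++)
    {
        #pragma omp barrier
        j += j ^ i;   /* still a race on j: nondeterministic, but no hang */
    }
    return j;
}
An equivalent alternative is to declare i inside the parallel region (for example in the for statement, with C99), since variables declared inside the construct are private automatically.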
You're trying to add to the same location from multiple threads. You can't do what you're trying to do in parallel. If you want to do a sum in parallel, you need to divide it into smaller pieces and collect them afterwards (see the reduction sketch after the update below).
Update by a5b: right idea, but the wrong part of the code was spotted. The i variable is changed by both threads.
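A hedged sketch of the "divide it into smaller pieces and collect them afterwards" idea, using an OpenMP reduction. Note this computes a plain sum of i rather than the original j += j^i, which reads the running value of j and therefore cannot be decomposed this way:
#include <stdio.h>
int main() {
    int i, sum = 0;
    /* each thread accumulates a private partial sum over its share of the
       iterations; OpenMP combines the partial sums when the loop ends */
    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < 1000; i++)
        sum += i;
    printf("sum = %d\n", sum);   /* always 499500, regardless of thread count */
    return 0;
}
Compile with the same -fopenmp flag as the original example.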
Sorry - I just saw this question. Technically if you mark variable "i" as private your program will be OpenMP compliant. HOWEVER, there is still a race condition on "j" and while your program is compliant (because there are valid cases to have race conditions), the value of "j" is unspecified (according to the OpenMP spec).
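One possible way (an assumption on my part, not something stated in these answers) to keep the per-iteration barrier while removing the data race on j is to serialize the update with a critical section. The program is then compliant and race-free, although the final value of j still varies from run to run because the update order is unspecified:
#include <omp.h>
int main() {
    int i, j = 0;
    #pragma omp parallel private(i)
    for (i = 0; i < 1000; i++)
    {
        #pragma omp barrier
        /* critical serializes the read-modify-write of j, so there is no
           data race; the result still depends on thread interleaving */
        #pragma omp critical
        j += j ^ i;
    }
    return j;
}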
In one of your previous answers you said that you were trying to measure the speed of the barrier implementation. There are several "benchmarks" that you might want to look at that have published results for a variety of OpenMP constructs. One was written by Mark Bull (EPCC, University of Edinburgh), another (Sphinx) comes from Lawrence Livermore National Labs (LLNL), and the third (Parkbench) comes from a Japanese Computing Partnership. They may offer you some guidance.