OpenMP threads "disobeying" omp barrier
So here's the code:
#pragma omp parallel private (myId)
{
    set_affinity();
    myId = omp_get_thread_num();
    if (myId<myConstant)
    {
        #pragma omp for schedule(static,1)
        for(count = 0; count < AnotherConstant; count++)
        {
            //Do stuff, everything runs as it should
        }
    }
    #pragma omp barrier //all threads wait as they should
    #pragma omp single
    {
        //everything in here is executed by one thread as it should be
    }
    #pragma omp barrier //this is the barrier in which threads run ahead
    par_time(cc_time_tot, phi_time_tot, psi_time_tot);
    #pragma omp barrier
}
//do more stuff
Now to explain what's going on. At the start of my parallel region, myId is declared private so that every thread holds its own thread id. set_affinity() controls which thread runs on which core. The issue I have involves the #pragma omp for schedule(static,1).
The block:
if (myId<myConstant)
{
    #pragma omp for schedule(static,1)
    for(count = 0; count < AnotherConstant; count++)
    {
        //Do stuff, everything runs as it should
    }
}
represents some work that I want to distribute over a certain number of threads, 0 through myConstant-1. On these threads I want to distribute the iterations of the loop evenly, in the manner that schedule(static,1) does. This is all performed correctly.
Then the code enters a single region, and all commands in there are performed as they should be. But say I set myConstant to 2. If I then run with 3 or more threads, everything through the single region executes correctly, but threads with id 3 or greater do not wait for all the commands within the single to finish.
Within the single, some functions are called that create tasks which are carried out by all threads. The threads with an id of 3 or more (in general, myConstant or more) continue on, executing par_time() while the other threads are still carrying out tasks created by the functions executed in the single. par_time() just prints some data for each thread.
If I comment out the #pragma omp for schedule(static,1) and just have a single thread execute the for loop (changing the if statement to if(myId==0), for instance), then everything works. So I'm just not sure why the aforementioned threads are continuing onwards.
Let me know if anything is confusing; it's kind of a specific issue. I was looking to see if anyone saw a flaw in my flow control with OMP.
Comments (1)
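If you look at the OpenMP V3.0 spec, section 2.5 Worksharing Constructs states:
Each worksharing region must be encountered by all threads in a team or by none at all.
The sequence of worksharing regions and barrier regions encountered must be the same for every thread in a team.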
By having the worksharing for within the if, you have violated both of these restrictions, making your program non-conforming. A non-conforming OpenMP program has "unspecified" behavior according to the specification.
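One way to stay conforming, sketched here under assumptions rather than taken from the original post, is to drop the worksharing construct and hand out iterations by hand in a plain loop, so that no thread skips a worksharing region or a barrier. The names run_on_subset, do_stuff, workers, total and hits are hypothetical, and the stride is capped at the actual team size in case fewer threads are available than requested.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical stand-in for the loop body in the question. */
static void do_stuff(int count, int *hits)
{
    hits[count] += 1;   /* each iteration touches its own slot, so no race */
}

/* Runs iterations 0..total-1 on at most `workers` threads, cyclically,
 * without placing a worksharing construct inside a thread-dependent branch. */
void run_on_subset(int workers, int total, int *hits)
{
    assert(workers > 0 && total >= 0 && hits != NULL);

    #pragma omp parallel
    {
        int my_id = 0;
        int team = 1;
#ifdef _OPENMP
        my_id = omp_get_thread_num();
        team = omp_get_num_threads();
#endif
        /* Never use more workers than the team actually has. */
        int active = workers < team ? workers : team;

        if (my_id < active) {
            /* Plain loop: thread my_id takes my_id, my_id + active, ... */
            for (int count = my_id; count < total; count += active)
                do_stuff(count, hits);
        }

        /* Every thread reaches the same barrier, in the same order. */
        #pragma omp barrier
    }
}
```

Calling run_on_subset(2, 7, hits) on a zeroed array should leave every entry at 1, whatever the team size. The cyclic stride mirrors the assignment that schedule(static,1) would make on a team of `active` threads.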
As to which threads will be used to execute the for loop, with the schedule type of "static,1", the first chunk of work - in this case count=0 - will be assigned to thread 0. The next chunk (count=1) will be assigned to thread 1, and so on until all chunks are assigned. If there are more chunks than threads, then assignment will restart at thread 0 in a round-robin fashion. You can read the exact wording in the OpenMP spec, section 2.5.1 Loop construct, in the description of the schedule clause.
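The round-robin rule can be written down as a one-line formula. This is only an illustration of the mapping described above: static_owner is a hypothetical helper name, not an OpenMP API, and it assumes 0-based logical iteration numbers and a team in which every thread encounters the loop.

```c
#include <assert.h>

/* Hypothetical helper: which thread of an nthreads-wide team runs logical
 * iteration `iter` under schedule(static, chunk). Chunks are dealt out to
 * threads 0, 1, ..., nthreads-1 and then wrap around to thread 0. */
int static_owner(int iter, int chunk, int nthreads)
{
    assert(iter >= 0 && chunk > 0 && nthreads > 0);
    return (iter / chunk) % nthreads;
}
```

With chunk set to 1 and a team of 3 threads, iterations 0, 1 and 2 map to threads 0, 1 and 2, and iteration 3 wraps back to thread 0.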