Avoiding races in OpenMP (in a parallel for loop)

Posted 2024-09-27 23:53:17

I am writing an OpenMP program in C. I have a shared array "data" which is being updated by all threads. I want to ensure that every thread has completed the reading part and stored the value in temp before the next statement, data[j] = temp, is executed.

I tried putting #pragma omp barrier between the two statements but the compiler throws an error. Please help.

#pragma omp parallel for shared(data)
for (j = 0; j < numints; j++){
     if (j >= max_j)
     {
          temp = data[j] + data[j - max_j];
          data[j] = temp;
     }
}

听风念你 2024-10-04 23:53:17

As you've seen, barrier won't work; critical is rather heavy-weight for this particular operation. Atomic is lighter weight than critical; you could always do

if (j >= max_j)
{
    #pragma omp atomic
    data[j] += data[j-max_j]; 
}

but you should always be wary of having any such construct (atomic, critical) inside a loop -- it kills performance, because it kills parallelism (that is, after all, their entire purpose).
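One way to avoid synchronization inside the loop body, assuming the intent stated in the question is meant literally (every read sees the values the array held before any write), is to read from a snapshot and write to the live array. The sketch below is not from the original post; the function name `add_shifted_from_snapshot` and the `int` element type are assumptions made for illustration. Note also that atomic covers only the update of data[j]; the read of data[j - max_j] on the right-hand side is not part of the atomic operation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the original post): updates data[j] for
 * j >= max_j using only the values the array held before the call.
 * Returns 0 on success, -1 if the snapshot could not be allocated. */
int add_shifted_from_snapshot(int *data, int numints, int max_j)
{
    assert(data != NULL && numints >= 0 && max_j > 0);
    if (numints <= max_j)
        return 0;   /* no element has an index >= max_j */

    /* Only indices 0 .. numints-max_j-1 are ever read as data[j - max_j]. */
    size_t nread = (size_t)(numints - max_j);
    int *old = malloc(nread * sizeof *old);
    if (old == NULL)
        return -1;
    memcpy(old, data, nread * sizeof *old);

    int j;
    #pragma omp parallel for shared(data, old)
    for (j = max_j; j < numints; j++) {
        /* Reads of shifted elements go to the snapshot, writes go to data:
         * no iteration reads a location that another iteration writes. */
        data[j] += old[j - max_j];
    }

    free(old);
    return 0;
}
```

The cost is one extra copy of part of the array, in exchange for a loop with no barrier, atomic, or critical. The result differs from running the question's loop serially in place, so this only fits if the "old values" semantics is what is wanted.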

It would help to know what you're trying to accomplish with this bit of code, because even once the data races in the updates are eliminated, the final result in (say) data[numints-1] will depend on the order in which data[numints-1-max_j], data[numints-1-2*max_j], ... were updated, which is explicitly not guaranteed by OpenMP's parallel for. (You can use the ordered construct, but that's barely better than not using a parallel for at all).

If numints < 2*max_j, then this is easy; you can just do

#pragma omp parallel for shared(data)
for (j = max_j; j < numints; j++){
    data[j] += data[j-max_j];
}

and you don't need any synchronization at all, because each iteration updates only its own data[j], and every value it reads (an index below max_j) is never written by any iteration. But I get the impression (a) that this isn't the case, and (b) that this is a snippet of a larger piece of code...
