atomic openmp critical-section

What is the difference between atomic and critical in OpenMP?

Posted 2024-12-10 16:12:50

What is the difference between atomic and critical in OpenMP?

I can do this

#pragma omp atomic
g_qCount++;

but isn't this the same as

#pragma omp critical
g_qCount++;

?

Comments (9)

北斗星光 2024-12-17 16:12:50

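The effect on g_qCount is the same, but what is done is different.

An OpenMP critical section is completely general: it can surround any arbitrary block of code. You pay for that generality, however, with significant overhead every time a thread enters and exits the critical section (on top of the inherent cost of serialization).

(In addition, in OpenMP all unnamed critical sections are considered identical; if you prefer, there is only one lock for all unnamed critical sections. So if one thread is in an [unnamed] critical section like the one above, no thread can enter any other [unnamed] critical section. As you might guess, you can get around this by using named critical sections.)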
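The point about names can be sketched as follows. This is only an illustration: the counters `hits` and `misses`, the function `classify`, and the region names `hits_lock` and `misses_lock` are all hypothetical and do not come from the question. With distinct names the two regions do not exclude each other; left unnamed, they would share the single global lock.

```c
#include <assert.h>

/* Hypothetical shared counters, introduced only for this sketch. */
static long hits = 0;
static long misses = 0;

/* Classify the integers 0..n-1: even values are "hits", odd ones "misses". */
void classify(int n)
{
    assert(n >= 0);
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        if (i % 2 == 0) {
            /* Excludes only other threads inside a region named hits_lock. */
            #pragma omp critical(hits_lock)
            hits++;
        } else {
            /* Independent of hits_lock, so it never waits on that region. */
            #pragma omp critical(misses_lock)
            misses++;
        }
    }
}
```

Compiled without OpenMP support the pragmas are ignored and the loop simply runs serially, giving the same totals.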

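An atomic operation has much lower overhead. Where available, it takes advantage of hardware that provides (say) an atomic increment operation; in that case no lock/unlock is needed on entering/exiting the line of code, it just performs the atomic increment, which the hardware guarantees cannot be interfered with.

The upsides are that the overhead is much lower, and that one thread being in an atomic operation does not block any (different) atomic operation about to happen. The downside is the restricted set of operations that atomic supports.

Of course, in either case you incur the cost of serialization.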
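For the counter in the question, both directives produce the same final value. A minimal sketch, assuming a local counter instead of the global `g_qCount`; the function names `count_atomic` and `count_critical` are made up for illustration:

```c
#include <assert.h>

/* Increment a shared counter n times, protecting each increment with atomic. */
long count_atomic(int n)
{
    assert(n >= 0);
    long count = 0;               /* shared by all threads in the region */
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        #pragma omp atomic
        count++;
    }
    return count;
}

/* Same loop, but each increment goes through the unnamed critical section. */
long count_critical(int n)
{
    assert(n >= 0);
    long count = 0;
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        #pragma omp critical
        count++;
    }
    return count;
}
```

The results are identical; what differs is the cost per increment, which depends on the compiler, runtime, and hardware.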

旧时浪漫 2024-12-17 16:12:50

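In OpenMP, all unnamed critical sections are mutually exclusive.

The most important difference between critical and atomic is that atomic can protect only a single assignment, and you can use it only with specific operators.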

没有伤那来痛 2024-12-17 16:12:50

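Critical section:

Ensures serialization of blocks of code.
Can be extended to serialize groups of blocks with proper use of the "name" tag.
Slower!

Atomic operation:

Much faster!
Only ensures the serialization of a particular operation.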

会发光的星星闪亮亮i 2024-12-17 16:12:50

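The fastest way is neither critical nor atomic. Approximately, an addition inside a critical section is 200 times more expensive than a plain addition, and an atomic addition is 25 times more expensive than a plain addition.

The fastest option (not always applicable) is to give each thread its own counter and perform a reduction when you need the total.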
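OpenMP's `reduction` clause is one way to get per-thread counters without managing them by hand. A minimal sketch (the function name `count_reduction` is hypothetical): each thread increments a private copy of `count`, and the copies are summed once at the end of the loop.

```c
#include <assert.h>

/* Count n iterations using a private counter per thread, combined with +. */
long count_reduction(int n)
{
    assert(n >= 0);
    long count = 0;
    #pragma omp parallel for reduction(+:count)
    for (int i = 0; i < n; i++) {
        count++;                  /* updates this thread's private copy */
    }
    return count;                 /* private copies were summed at loop end */
}
```

No synchronization happens inside the loop body, which is why this tends to be the cheapest choice when the update is a plain accumulation.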

假情假意假温柔 2024-12-17 16:12:50

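The limitations of atomic are important. They should be detailed in the OpenMP specs. MSDN offers a quick cheat sheet, and I wouldn't be surprised if it never changes. (Visual Studio 2012 has an OpenMP implementation from March 2002.) To quote MSDN:

The expression statement must have one of the following forms:
x binop= expr
x++
++x
x--
--x
In the preceding expressions: x is an lvalue expression with scalar type. expr is an expression with scalar type, and it does not reference the object designated by x. binop is not an overloaded operator and is one of +, *, -, /, &, ^, |, <<, or >>.

I recommend using atomic when you can and named critical sections otherwise. Naming them is important; you'll avoid debugging headaches this way.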
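A few of the quoted forms, sketched in one loop. The function `atomic_forms` and its variables are hypothetical names chosen for illustration; each statement directly under a pragma matches one entry of the list above.

```c
#include <assert.h>

/* Apply three permitted atomic forms over i = 0..n-1 and report the results. */
void atomic_forms(int n, long *sum_out, int *ticks_out, unsigned *mask_out)
{
    assert(n >= 0);
    long sum = 0;
    int ticks = 0;
    unsigned mask = 0;

    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        #pragma omp atomic
        sum += i;                 /* x binop= expr, with binop + */

        #pragma omp atomic
        ticks++;                  /* x++ */

        #pragma omp atomic
        mask |= 1u << (i % 8);    /* x binop= expr, with binop | */

        /* Something like `if (ticks < 10) ticks++;` is not one of the
           listed forms; it would need a critical section instead. */
    }

    *sum_out = sum;
    *ticks_out = ticks;
    *mask_out = mask;
}
```

Note that each atomic protects only its own statement; the three updates together are not one indivisible step.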

北笙凉宸 2024-12-17 16:12:50

There are already great explanations here, but we can dive a bit deeper. To understand the core difference between the atomic and critical section concepts in OpenMP, we first have to understand the concept of a lock. Let's review why we need locks.

A parallel program is executed by multiple threads. When those threads touch shared data, the results are deterministic only if we synchronize them. Of course, synchronization between threads is not always required; we are referring to the cases where it is necessary.

To synchronize the threads in a multi-threaded program, we use a lock. Locks come into play when access must be restricted to one thread at a time. The implementation of the lock concept may vary from processor to processor. Let's see how a simple lock might work from an algorithmic point of view.

1. Define a variable called lock.
2. For each thread:
   2.1. Read the lock.
   2.2. If lock == 0, lock = 1 and goto 3    // Try to grab the lock
       Else goto 2.1    // Wait until the lock is released
3. Do something...
4. lock = 0    // Release the lock

The given algorithm can be implemented in assembly language as follows. We'll assume a single processor and analyze the behavior of locks on it. For this exercise, let's assume one of the following processors: MIPS, Alpha, ARM or Power.

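try:    LW R1, lock        ; read the lock
        BNEZ R1, try       ; lock != 0: someone holds it, so retry
        ADDI R1, R1, #1    ; lock was 0, so R1 becomes 1
        SW R1, lock        ; write 1 back (not atomic with the LW above)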

This program seems to be OK, but it is not. The code above suffers from the very problem it is meant to solve: synchronization. Let's find the problem. Assume the initial value of lock is zero. If two threads run this code, both may execute LW R1, lock before either of them reaches SW R1, lock. Both then read zero, so both think that the lock is free.
To solve this issue, processors provide something beyond plain LW and SW: a Read-Modify-Write operation. It is a compound operation (made of sub-instructions) which ensures that lock acquisition is completed by only a single thread at a time. The difference from the simple read and write instructions is that it uses a different way of loading and storing: LL (Load Linked) to load the lock variable and SC (Store Conditional) to write to it. An additional Link Register is used to ensure that the lock acquisition is done by a single thread. The algorithm is given below.

1. Define a variable called lock.
2. For each thread:
   2.1. Read the lock and put the address of lock variable inside the Link Register.
   2.2. If (lock == 0) and (&lock == Link Register), lock = 1 and reset the Link Register then goto 3    // Try to grab the lock
       Else goto 2.1    // Wait until the lock is released
3. Do something...
4. lock = 0    // Release the lock

Once the link register has been reset, another thread that had also assumed the lock was free can no longer write its incremented value to the lock: its store fails and it has to retry. Thus, concurrent access to the lock variable is handled safely.
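Portable C has no LL/SC instructions, but the same acquire-or-retry idea can be sketched with C11 atomics, where a compare-and-swap plays the role of the LL/SC pair (on LL/SC architectures, compilers generally lower it to such a pair). The type `SpinLock` and the `spin_*` functions are hypothetical names for this sketch, not part of OpenMP or any library.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    atomic_int state;             /* 0 = free, 1 = held */
} SpinLock;

void spin_init(SpinLock *l)
{
    assert(l != NULL);
    atomic_init(&l->state, 0);
}

/* One attempt at step 2.2: succeeds only if state is still 0 at the
   moment of the write, so two threads can never both succeed. */
bool spin_try_lock(SpinLock *l)
{
    int expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &l->state, &expected, 1,
        memory_order_acquire, memory_order_relaxed);
}

/* Steps 2.1 and 2.2 in a loop: keep retrying until the lock is grabbed. */
void spin_lock(SpinLock *l)
{
    while (!spin_try_lock(l)) {
        /* busy-wait until the holder releases the lock */
    }
}

/* Step 4: release the lock. */
void spin_unlock(SpinLock *l)
{
    atomic_store_explicit(&l->state, 0, memory_order_release);
}
```

A thread would wrap step 3 ("Do something...") between `spin_lock` and `spin_unlock`. A production lock would normally add back-off or yield while waiting; this sketch omits that.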

The core difference between critical and atomic comes from this question:

Why use a lock (a new variable) when we can use the actual variable (the one we are operating on) as the lock variable?

Using a new variable as the lock leads to the critical section, while using the actual variable as the lock leads to the atomic concept. The critical section is useful when we perform a lot of computation (more than one line) on the actual variable. That's because, if the result of that computation fails to be written to the actual variable, the whole procedure has to be repeated to compute the result again. This can perform poorly compared to waiting for the lock to be released before entering a computationally heavy region. Thus, it is recommended to use the atomic directive whenever you want to perform a single computation (x++, x--, ++x, --x, etc.) and the critical directive when the protected region is more computationally complex.
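The "actual variable as the lock" idea, including the cost of repeating the computation when the write fails, can be sketched with a C11 compare-and-swap loop. This is an illustration of the concept, not a description of how any particular OpenMP runtime implements atomic; the function name `update_double_plus_one` and the update rule are made up.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Atomically replace *x with 2 * (*x) + 1 and return the new value.
   There is no separate lock: the write succeeds only if *x still holds
   the value the computation started from. */
int update_double_plus_one(atomic_int *x)
{
    assert(x != NULL);
    int old = atomic_load(x);
    int desired;
    do {
        /* Recomputed on every retry; with a long computation here,
           each failed write throws that work away. */
        desired = old * 2 + 1;
        /* On failure, old is refreshed with the current value of *x. */
    } while (!atomic_compare_exchange_weak(x, &old, desired));
    return desired;
}
```

The longer the body of the loop, the more expensive a failed write becomes, which is the argument above for protecting large regions with a lock instead.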

傲性难收 2024-12-17 16:12:50

The critical clause applies mutual exclusion to a code block and guarantees that only one thread executes the block at a given time. Once that thread completes the block and leaves it, the other threads are welcome to acquire the lock and execute the block.

The atomic clause applies only to a single statement that contains an arithmetic operator, but the difference is not limited to the size of the expression. The atomic clause protects the memory location of the variable on the left-hand side and only guarantees the update of that variable. So if a function call appears on the right-hand side of the statement, you may assume it can be executed in parallel.

#pragma omp atomic
a += 5 + fnk();

Here fnk() can be called by multiple threads at the same time, but the update of a must be mutually exclusive.
In my test run, the fnk() call in one thread was interleaved with another thread's, and we got the results 0, 2, 2 and 0 respectively. That wouldn't be the case if we had used the critical clause.
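A minimal sketch of this behaviour, assuming a trivial stand-in for `fnk()` that just returns 1 (the real function is not shown in the answer) and a hypothetical wrapper `accumulate`:

```c
#include <assert.h>

/* Hypothetical stand-in for fnk(); the original function is not shown. */
static int fnk(void)
{
    return 1;
}

/* Add 5 + fnk() to a shared variable n times. */
int accumulate(int n)
{
    assert(n >= 0);
    int a = 0;
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        /* fnk() may run in several threads at once; only the final
           read-modify-write of a is performed atomically. */
        #pragma omp atomic
        a += 5 + fnk();
    }
    return a;
}
```

If `fnk()` itself touched shared state, it would need its own protection, because the atomic directive does not cover the evaluation of the right-hand side.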

谜兔 2024-12-17 16:12:50

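Atomic Operations

If, as in the example above, our critical section is a single assignment, OpenMP provides a potentially more efficient way of protecting it.

OpenMP provides an atomic directive which, like critical, specifies that the next statement must be done by one thread at a time:

#pragma omp atomic
global_data++;

Unlike a critical directive:

The statement under the directive can only be a single C assignment statement.
It can be of the form x++, ++x, x-- or --x.
It can also be of the form x OP= expression, where OP is some binary operator.
No other statement is allowed.
The motivation for the atomic directive is that some processors provide single instructions for operations such as x++. These are called "fetch-and-add" instructions.

As a rule, if your critical section can be done in an atomic directive, it should be. It will not be slower, and it might be faster.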
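Fetch-and-add returns the old value as well as incrementing. The plain atomic forms listed above discard that value; to keep it, OpenMP 3.1 and later offer `atomic capture`, which goes beyond what this answer describes and is not available in older implementations. A sketch, with the hypothetical function `count_unique_tickets` handing out ticket numbers and checking that no two iterations received the same one:

```c
#include <assert.h>
#include <stdlib.h>

/* Hand out n ticket numbers with fetch-and-add; return how many distinct
   tickets were issued, or -1 if memory allocation fails. */
int count_unique_tickets(int n)
{
    assert(n > 0);
    int *seen = calloc((size_t)n, sizeof *seen);
    if (seen == NULL) {
        return -1;
    }

    int next = 0;                 /* shared ticket dispenser */
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        int mine;
        /* Read the old value and increment in one atomic step
           (requires OpenMP 3.1 or later). */
        #pragma omp atomic capture
        mine = next++;
        seen[mine] = 1;           /* tickets are unique, so no two threads
                                     write the same element */
    }

    int unique = 0;
    for (int i = 0; i < n; i++) {
        unique += seen[i];
    }
    free(seen);
    return unique;
}
```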

很酷又爱笑 2024-12-17 16:12:50

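atomic is a single-statement critical section, i.e. you lock for the execution of one statement.

A critical section is a lock on a block of code.

A good compiler will translate your second code the same way it does the first.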
