What is the difference between atomic and critical in OpenMP?
I can do this
#pragma omp atomic
g_qCount++;
but isn't this the same as
#pragma omp critical
g_qCount++;
?
The effect on g_qCount is the same, but what's done is different.
An OpenMP critical section is completely general - it can surround any arbitrary block of code. You pay for that generality, however, by incurring significant overhead every time a thread enters and exits the critical section (on top of the inherent cost of serialization).
In addition, in OpenMP all unnamed critical sections are considered identical (if you prefer, there is only one lock for all unnamed critical sections), so if one thread is in an [unnamed] critical section as above, no other thread can enter any [unnamed] critical section. As you might guess, you can get around this by using named critical sections.
An atomic operation has much lower overhead. Where available, it takes advantage of the hardware providing (say) an atomic increment operation; in that case no lock/unlock is needed on entering/exiting the line of code. It just does the atomic increment, which the hardware guarantees can't be interfered with.
The upsides are that the overhead is much lower, and one thread being in an atomic operation doesn't block any (different) atomic operations about to happen. The downside is the restricted set of operations that atomic supports.
Of course, in either case, you incur the cost of serialization.
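To make the comparison concrete, here is a minimal sketch of the same counter protected both ways. The function names and the critical section name counter_lock are made up for illustration; the code assumes a compiler with OpenMP enabled (for example -fopenmp with GCC or Clang) and still computes the same result serially if the pragmas are ignored.

```c
/* Hypothetical helper: count n events, protecting the counter with atomic. */
long count_with_atomic(long n)
{
    long counter = 0;                 /* shared by all threads */
    #pragma omp parallel for
    for (long i = 0; i < n; i++) {
        /* One read-modify-write on one memory location; the compiler may
           map this to a single hardware instruction where one exists. */
        #pragma omp atomic
        counter++;
    }
    return counter;
}

/* Hypothetical helper: same count, protected by a named critical section. */
long count_with_critical(long n)
{
    long counter = 0;
    #pragma omp parallel for
    for (long i = 0; i < n; i++) {
        /* General mutual exclusion: the block could hold arbitrary code. */
        #pragma omp critical(counter_lock)
        {
            counter++;
        }
    }
    return counter;
}
```

Both functions return n; the difference is only in how much machinery each increment goes through.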
In OpenMP, all the unnamed critical sections are mutually exclusive.
The most important difference between critical and atomic is that atomic can protect only a single assignment, and you can use it only with specific operators.
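As an illustration of an update that does not fit into a single atomic assignment, the sketch below keeps a maximum value and its index consistent with each other. The type best_t, the function find_max and the section name best_update are hypothetical, and the code assumes n is at least 1.

```c
/* Hypothetical result type: a maximum and where it was found. */
typedef struct {
    double value;
    long   index;
} best_t;

/* Find the largest element of data[0..n-1]; assumes n >= 1. */
best_t find_max(const double *data, long n)
{
    best_t best = { data[0], 0 };
    #pragma omp parallel for
    for (long i = 1; i < n; i++) {
        /* The comparison and both writes must happen together, so this
           needs critical; atomic cannot cover a multi-statement block. */
        #pragma omp critical(best_update)
        {
            if (data[i] > best.value) {
                best.value = data[i];
                best.index = i;
            }
        }
    }
    return best;
}
```

If several elements tie for the maximum, which index wins may depend on thread scheduling.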
Critical section:
Can be extended to serialise groups of blocks with proper use of "name" tag.
Slower!
Atomic operation:
Is much faster!
Only ensures the serialisation of a particular operation.
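A small sketch of the "name" tag in use, assuming two unrelated pieces of shared state. The names stats_t, run, queue_lock and log_lock are invented for this example. Blocks that share a name exclude each other; blocks with different names do not.

```c
/* Hypothetical shared state with two independent fields. */
typedef struct {
    long jobs_queued;
    long errors_logged;
} stats_t;

stats_t run(long n)
{
    stats_t s = { 0, 0 };
    #pragma omp parallel for
    for (long i = 0; i < n; i++) {
        /* Serialised only against other queue_lock blocks. */
        #pragma omp critical(queue_lock)
        {
            s.jobs_queued += 1;
        }
        if (i % 10 == 0) {
            /* A different name means a different lock, so a thread here
               does not block a thread inside queue_lock. */
            #pragma omp critical(log_lock)
            {
                s.errors_logged += 1;
            }
        }
    }
    return s;
}
```

Had both blocks been unnamed, they would have shared one global lock and serialised against each other for no reason.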
The fastest way is neither critical nor atomic. Approximately, addition with a critical section is 200 times more expensive than simple addition, and atomic addition is 25 times more expensive than simple addition.
The fastest option (not always applicable) is to give each thread its own counter and perform a reduction when you need the total sum.
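The per-thread counter idea can be sketched with OpenMP's reduction clause, which gives each thread a private copy and combines the copies at the end of the loop. The function name count_with_reduction is made up for illustration.

```c
/* Hypothetical helper: count n events with per-thread private counters. */
long count_with_reduction(long n)
{
    long counter = 0;
    /* Each thread increments its own private copy of counter (starting
       at 0); the copies are summed into the shared variable once, when
       the loop finishes. No lock or atomic is needed inside the loop. */
    #pragma omp parallel for reduction(+:counter)
    for (long i = 0; i < n; i++) {
        counter++;
    }
    return counter;
}
```

This only applies when the total is needed after the loop rather than during it, which is the "not always applicable" caveat above.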
The limitations of atomic are important. They should be detailed in the OpenMP specs. MSDN offers a quick cheat sheet, as I wouldn't be surprised if this will not change. (Visual Studio 2012 has an OpenMP implementation from March 2002.) To quote MSDN:

I recommend using atomic when you can and named critical sections otherwise. Naming them is important; you'll avoid debugging headaches this way.
There are already great explanations here. However, we can dive a bit deeper. To understand the core difference between the atomic and critical section concepts in OpenMP, we first have to understand the concept of a lock. Let's review why we need locks.
To synchronize the threads in a multi-threaded program, we use locks. When access must be restricted to one thread at a time, locks come into play. The implementation of the lock concept may vary from processor to processor. Let's see how a simple lock may work from an algorithmic point of view.
The given algorithm can be implemented in the hardware language as follows. We'll assume a single processor and analyze the behavior of locks on it. For this exercise, let's assume one of the following processors: MIPS, Alpha, ARM or Power.
This program seems to be OK, but it is not. The above code suffers from the same problem as before: synchronization. Let's find the problem. Assume the initial value of lock is zero. If two threads run this code, the second might read the lock variable before the first reaches SW R1, lock. Thus, both of them think that the lock is free.
To solve this issue, another instruction is provided in place of the simple LW and SW. It is called a Read-Modify-Write instruction. It is a complex instruction (consisting of subinstructions) which ensures that the lock acquisition procedure is done by only a single thread at a time. Compared to the simple Read and Write instructions, Read-Modify-Write uses a different way of loading and storing: it uses LL (Load Linked) to load the lock variable and SC (Store Conditional) to write to the lock variable. An additional Link Register is used to ensure that the lock acquisition procedure is done by a single thread. The algorithm is given below.
When the link register is reset, if another thread has assumed the lock to be free, it won't be able to write the incremented value to the lock. Thus, safe concurrent access to the lock variable is achieved.
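The lock acquisition described above can be sketched in portable C11, using compare-and-exchange as a stand-in for the LL/SC pair (on LL/SC machines a compiler may lower it to such a loop, but that is an assumption about the toolchain, not something this code controls). The type toy_lock and the three functions are hypothetical names for this example.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock: held == 0 means free, held == 1 means taken. */
typedef struct {
    atomic_int held;
} toy_lock;

/* Try once: succeeds only if the lock was free at the instant of the
   store, which is the guarantee LL/SC gives for the read-modify-write. */
bool toy_try_lock(toy_lock *l)
{
    int expected = 0;
    return atomic_compare_exchange_strong(&l->held, &expected, 1);
}

/* Spin until the lock is acquired. */
void toy_lock_acquire(toy_lock *l)
{
    while (!toy_try_lock(l)) {
        /* another thread holds the lock; retry */
    }
}

/* Release the lock so that another thread's attempt can succeed. */
void toy_unlock(toy_lock *l)
{
    atomic_store(&l->held, 0);
}
```

Unlike the plain LW/SW version, two threads can never both see the lock as free and both succeed in taking it.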
The core difference between critical and atomic comes from the idea that:
Using a new variable as the lock leads to a critical section, while using the actual variable as the lock leads to the atomic concept. The critical section is useful when we are performing a lot of computation (more than one line) on the actual variable. That's because, if the result of those computations fails to be written to the actual variable, the whole procedure has to be repeated to recompute the result. This can lead to poor performance compared to waiting for the lock to be released before entering a computation-heavy region. Thus, it is recommended to use the atomic directive whenever you want to perform a single computation (x++, x--, ++x, --x, etc.) and to use the critical directive when the protected section performs a more computationally complex region.
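To illustrate what "using the actual variable as the lock" means, here is a sketch of a retry loop written with C11 atomics. The function atomic_scale and its update rule (x * factor + 1) are invented for the example; the point is that the computation is redone every time the store fails.

```c
#include <stdatomic.h>

/* Hypothetical update: atomically replace *x with (*x * factor + 1)
   and return the value that was stored. */
int atomic_scale(atomic_int *x, int factor)
{
    int old = atomic_load(x);
    int desired;
    do {
        /* Recomputed on every retry: if another thread changed *x in
           the meantime, the work done so far is thrown away. */
        desired = old * factor + 1;
        /* On failure, old is refreshed with the current value of *x. */
    } while (!atomic_compare_exchange_weak(x, &old, desired));
    return desired;
}
```

With a cheap computation the occasional retry is negligible; with an expensive one, repeated retries can cost more than waiting once for a lock, which is the case for critical made above.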
The critical directive applies mutual exclusion to a code block and guarantees that only one thread executes the block at a given time; once that thread completes the block and leaves it, the other threads are welcome to acquire the lock and execute the block.
The atomic directive is applicable only to a single statement that has a math operator in it, but the difference is not limited to the size of the expression. The atomic directive protects the memory location being assigned to on the left-hand side and guarantees only the assignment to that variable. So if any function call appears on the right-hand side of the statement, you may assume it can be executed in parallel. Here,
fnk();
could be called by multiple threads at the same time, but the assignment to a must be mutually exclusive. In the output, the fnk() call is interleaved with another thread, giving the results 0 2 2 and 0 respectively. That wouldn't be the case if we had used the critical directive.
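A minimal sketch of that behaviour, not the original example: the body of fnk, its parameter, and the function sum_with_atomic are assumptions made for illustration. Only the update of a is atomic; the calls to fnk may overlap across threads.

```c
/* Hypothetical stand-in for the function called on the right-hand side.
   It has no shared state, so running it concurrently is harmless here. */
static int fnk(int i)
{
    return i % 3;
}

long sum_with_atomic(int n)
{
    long a = 0;
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        /* fnk(i) is evaluated outside the atomic protection and may run
           in several threads at once; only "a += <value>" is atomic. */
        #pragma omp atomic
        a += fnk(i);
    }
    return a;
}
```

If fnk itself touched shared data, atomic would not protect it; wrapping the whole statement in critical would serialise the call as well.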
OpenMP Synchronization
Atomic Operations
If, as in the example above, our critical section is a single assignment, OpenMP provides a potentially more efficient way of protecting this.
OpenMP provides an atomic directive which, like critical, specifies the next statement must be done by one thread at a time:
#pragma omp atomic
global_data++;
Unlike a critical directive:
The statement under the directive can only be a single C assignment statement.
It can be of the form: x++, ++x, x-- or --x.
It can also be of the form x OP= expression where OP is some binary operator.
No other statement is allowed.
The motivation for the atomic directive is that some processors provide single instructions for operations such as x++. These are called Fetch-and-add instructions.
As a rule, if your critical section can be done in an atomic directive, it should. It will not be slower, and might be faster.
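The permitted forms listed above can be tried out with a short sketch. The function name atomic_forms is hypothetical; each directive applies only to the single statement that follows it.

```c
/* Hypothetical demo of the statement forms atomic accepts. */
long atomic_forms(long n)
{
    long x = 0;
    #pragma omp parallel for
    for (long i = 0; i < n; i++) {
        #pragma omp atomic
        x++;            /* increment form */

        #pragma omp atomic
        x += 2;         /* x OP= expression form */

        #pragma omp atomic
        --x;            /* decrement form */

        /* Two or more statements that must happen together cannot go
           under one atomic; they would need a critical section. */
    }
    return x;           /* each iteration adds 1 + 2 - 1 = 2 */
}
```

Each of the three updates is atomic on its own, but the group of three is not: another thread's updates may land between them.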
atomic is a critical section for a single statement, i.e. you lock for the execution of one statement
A critical section is a lock on a block of code
A good compiler will translate your second code the same way it does the first