Are mutex functions sufficient without volatile?
A coworker and I write software for a variety of platforms running on x86, x64, Itanium, PowerPC, and other 10-year-old server CPUs.
We just had a discussion about whether mutex functions such as pthread_mutex_lock() ... pthread_mutex_unlock() are sufficient by themselves, or whether the protected variable needs to be volatile.
int foo::bar()
{
    //...
    // code which may or may not access _protected.
    pthread_mutex_lock(m);
    int ret = _protected;
    pthread_mutex_unlock(m);
    return ret;
}
My concern is caching. Could the compiler place a copy of _protected on the stack or in a register, and use that stale value in the assignment? If not, what prevents that from happening? Are variations of this pattern vulnerable?
I presume that the compiler doesn't actually understand that pthread_mutex_lock() is a special function, so are we just protected by sequence points?
Thanks greatly.
Update: Alright, I can see a trend with answers explaining why volatile is bad. I respect those answers, but articles on that subject are easy to find online. What I can't find online, and the reason I'm asking this question, is how I'm protected without volatile. If the above code is correct, how is it invulnerable to caching issues?
The simplest answer is that volatile is not needed for multi-threading at all.

The long answer is that sequence points like critical sections are platform dependent, as is whatever threading solution you're using, so most of your thread safety is also platform dependent.

C++0x has a concept of threads and thread safety, but the current standard does not, and therefore volatile is sometimes misidentified as something that prevents reordering of operations and memory accesses for multi-threaded programming, when it was never intended for that and can't be reliably used that way.

The only things volatile should be used for in C++ are to allow access to memory-mapped devices, to allow uses of variables between setjmp and longjmp, and to allow uses of sig_atomic_t variables in signal handlers. The keyword itself does not make a variable atomic.

Good news: in C++0x we will have the STL construct std::atomic, which can be used to guarantee atomic operations and thread-safe constructs for variables. Until your compiler of choice supports it, you may need to turn to the Boost library or bust out some assembly code to create your own objects that provide atomic variables.

P.S. A lot of the confusion is caused by Java and .NET actually enforcing multi-threaded semantics with the keyword volatile; C++, however, follows suit with C, where this is not the case.
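A minimal sketch of the std::atomic approach mentioned above (C++11 syntax; the variable and function names are just illustrative):

#include <atomic>

std::atomic<int> counter{0};      // atomic: no data race, and visible across threads

void add_one()
{
    counter.fetch_add(1);         // atomic read-modify-write, sequentially consistent by default
}

int read_counter()
{
    return counter.load();        // atomic load; no mutex or volatile needed
}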
Your threading library should include the appropriate CPU and compiler barriers on mutex lock and unlock. For GCC, a "memory" clobber on an asm statement acts as a compiler barrier.

Actually, there are two things that protect your code from (compiler) caching:

- You are calling an external function the compiler cannot see into (pthread_mutex_*()), which means that the compiler doesn't know that that function doesn't modify your global variables, so it has to reload them.
- pthread_mutex_*() includes a compiler barrier, e.g. on glibc/x86, pthread_mutex_lock() ends up calling the macro lll_lock(), which has a "memory" clobber, forcing the compiler to reload variables (a minimal sketch of such a barrier follows below).
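For illustration, assuming GCC/Clang inline-asm syntax (the function and variable names are made up for the example):

#define compiler_barrier() asm volatile("" ::: "memory")

int g_protected;                  // stand-in for a variable shared between threads

int reload_demo()
{
    int before = g_protected;     // this value may be held in a register
    compiler_barrier();           // "memory" clobber: the compiler must assume any memory changed...
    int after = g_protected;      // ...so it reloads g_protected instead of reusing the register
    return after - before;
}

The real pthread_mutex_* implementations already contain an equivalent barrier (the lll_lock() clobber mentioned above), so you don't have to write this yourself.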
Until C++0x, this is not specified by the standard, and it is not specified in C either. So, it really depends on the compiler. In general, if the compiler does not guarantee that it will respect ordering constraints on memory accesses for functions or operations that involve multiple threads, you will not be able to write multithread-safe code with that compiler. See Hans-J. Boehm's "Threads Cannot Be Implemented as a Library".

As for what abstractions your compiler should support for thread-safe code, the Wikipedia entry on memory barriers is a pretty good starting point.

(As for why people suggested volatile: some compilers treat volatile as a memory barrier for the compiler. It's definitely not standard.)
The volatile keyword is a hint to the compiler that the variable might change outside of program logic, such as a memory-mapped hardware register that could change as part of an interrupt service routine. This prevents the compiler from assuming a cached value is always correct and would normally force a memory read to retrieve the value. This usage pre-dates threading by a couple decades or so. I've seen it used with variables manipulated by signals as well, but I'm not sure that usage was correct.
Variables guarded by mutexes are guaranteed to be correct when read or written by different threads. The threading API is required to ensure that such views of variables are consistent. This access is all part of your program logic and the volatile keyword is irrelevant here.
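A small sketch of that traditional use of volatile (the register address is made up for the example):

#include <cstdint>

// Status register that hardware or an interrupt service routine may change
// behind the program's back.
volatile std::uint32_t *const status_reg =
    reinterpret_cast<volatile std::uint32_t *>(0x40001000);

void wait_for_ready()
{
    // Without volatile, the compiler could hoist the load out of the loop
    // and spin forever on a stale register copy.
    while ((*status_reg & 0x1u) == 0) {
        // busy-wait
    }
}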
With the exception of the simplest spin-lock algorithm, mutex code is quite involved: good optimized mutex lock/unlock code contains the kind of code even excellent programmers struggle to understand. It uses special compare-and-set instructions, manages not only the unlocked/locked state but also the wait queue, and optionally uses system calls to go into a wait state (for lock) or wake up other threads (for unlock).

There is no way the average compiler can decode and "understand" all that complex code (again, with the exception of the simple spin lock), so even for a compiler not aware of what a mutex is and how it relates to synchronization, there is in practice no way it could optimize anything around such code.

That is, if the code were "inline", or available for analysis for the purpose of cross-module optimization, or if global optimization were available. In practice the compiler does not know what the lock/unlock functions do, so it does not try to optimize around them.

How are they "special"? They're opaque and treated as such; they are not special among opaque functions. There is no semantic difference from an arbitrary opaque function that can access any other object.

Could the compiler cache a copy of _protected in a register? Yes, in code that acts on objects transparently and directly, by using the variable name or pointers in a way the compiler can follow; not in code that might use arbitrary pointers to access variables indirectly. So yes, between calls to opaque functions, but not across them.

And also for variables which can only be used inside the function, by name: local variables that never have their address taken or a reference bound to them (so the compiler can follow all further uses). These can indeed be "cached" across arbitrary calls, including lock/unlock (see the sketch at the end of this answer).

What prevents that? The opacity of the functions: non-inlining, assembly code, system calls, code complexity. Everything that makes compilers bail out and think "that's complicated stuff, just emit calls to it". The default position of a compiler is always "let's execute stupidly, I don't understand what is being done anyway", not "I will optimize that / let's rewrite the algorithm, I know better". Most code is not optimized in a complex, non-local way.

Now let's assume the absolute worst from our point of view (we would like the compiler to give up), which is the absolute best from the point of view of an optimizing algorithm: then we might have a problem, as the compiler could optimize around the function call. This is fixed trivially by inserting a compiler barrier, such as an empty asm statement with a "clobber" for other accessible variables; the compiler then just assumes that anything that might be accessible to a called function is "clobbered".

You can make the variable volatile for the usual reasons you make things volatile: to be certain you can access it in the debugger, to prevent a floating-point variable from having the wrong datatype at runtime, etc.

Making it volatile would not actually fix the issue described above, as volatile is essentially a memory operation in the abstract machine with the semantics of an I/O operation, and as such is only ordered with respect to other volatile operations. Volatile is not ordered with respect to non-volatile memory side effects. That makes volatile practically useless for writing thread-safe code, even in the most specific case where volatile would a priori help, the case where no memory fence is ever needed: when programming threading primitives on a time-sharing system on a single CPU. (That may be one of the least understood aspects of either C or C++.)

So while volatile does prevent "caching", it doesn't even prevent compiler reordering of lock/unlock operations unless all shared variables are volatile.
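A small sketch of the distinction drawn above (names are illustrative): a local whose address never escapes may be kept in a register across the opaque calls, while anything the called functions might reach has to be reloaded.

#include <pthread.h>

extern pthread_mutex_t m;
extern int g_shared;              // visible to other translation units, so the opaque calls might touch it

int observe()
{
    int local = 42;               // address never taken: may live entirely in a register,
                                  // even across the calls below
    pthread_mutex_lock(&m);       // opaque call: compiler must assume g_shared may have changed
    int snapshot = g_shared;      // so this is a real load from memory, not a stale register copy
    pthread_mutex_unlock(&m);
    return snapshot + local;
}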
Locks/synchronisation primitives make sure the data is not cached in registers/CPU cache; that means the data propagates to memory. If two threads are accessing/modifying data within locks, it is guaranteed that the data is read from memory and written to memory. We don't need volatile in this use case.

But in the case where you have code with double checks, the compiler can optimise the code and remove what looks like redundant code; to prevent that, we need volatile.

Example: see the lazy-initialization singleton pattern example at https://en.m.wikipedia.org/wiki/Singleton_pattern#Lazy_initialization (a rough sketch of the pattern follows below).

Why would someone write this kind of code? Answer: there is a performance benefit in not acquiring the lock.

PS: This is my first post on Stack Overflow.
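Roughly the double-checked pattern being referred to (a sketch modeled on the idea, not copied from the linked page; in C++11 and later instance_ should be a std::atomic pointer rather than volatile for this to be correct):

#include <mutex>

class Singleton {
public:
    static Singleton &instance()
    {
        if (instance_ == nullptr) {                   // first check: fast path, no lock acquired
            std::lock_guard<std::mutex> lock(mutex_);
            if (instance_ == nullptr)                 // second check: another thread may have
                instance_ = new Singleton();          // created it while we waited for the lock
        }
        return *instance_;
    }

private:
    Singleton() = default;
    static Singleton *instance_;                      // std::atomic<Singleton *> in correct modern code
    static std::mutex mutex_;
};

Singleton *Singleton::instance_ = nullptr;
std::mutex Singleton::mutex_;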
Not if the object you're locking is volatile, e.g. if the value it represents depends on something foreign to the program (hardware state). volatile should NOT be used to denote any kind of behavior that is the result of executing the program.

If it's actually volatile, what I personally would do is lock on the value of the pointer/address instead of the underlying object, e.g.:
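(A sketch of one way to read that suggestion: associate a mutex with the object's address and have every thread lock that same mutex before touching the volatile object. The register address and names are made up:)

#include <cstdint>
#include <pthread.h>

volatile std::uint32_t *const hw_reg =
    reinterpret_cast<volatile std::uint32_t *>(0x40001000); // illustrative hardware address
pthread_mutex_t hw_reg_mutex = PTHREAD_MUTEX_INITIALIZER;   // one mutex associated with that address

std::uint32_t read_hw_reg()
{
    pthread_mutex_lock(&hw_reg_mutex);   // every thread locks the mutex tied to this address
    std::uint32_t value = *hw_reg;       // the volatile access itself still goes to the device
    pthread_mutex_unlock(&hw_reg_mutex);
    return value;
}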
Please note that it only works if ALL the code ever using the object in a thread locks the same address. So be mindful of that when using threads with some variable that is part of an API.