Using critical sections to avoid cache coherency issues in Delphi?
I just read an MSDN article, "Synchronization and Multiprocessor Issues", that addresses memory cache coherency issues on multiprocessor machines. This was really eye-opening to me, because I would not have thought there could be a race condition in the example they provide. The article explains that writes to memory might not actually occur (from the perspective of another CPU) in the order written in my code. This is a new concept to me!
This article provides 2 solutions:
- Using the "volatile" keyword on variables that need cache consistency across multiple CPUs. This is a C/C++ keyword, and is not available to me in Delphi.
- Using InterlockedExchange() and InterlockedCompareExchange(). This is something I could do in Delphi if I had to. It just seems a little messy.
The article also mentions that "The following synchronization functions use the appropriate barriers to ensure memory ordering: •Functions that enter or leave critical sections".
This is the part I don't understand. Does this mean that any writes to memory that are confined to functions using critical sections are immune to cache coherency and memory-ordering issues? I have nothing against the Interlocked*() functions, but another tool in my tool belt would be good to have!
This MSDN article is just the first step of multi-thread application development: in short, it means "protect your shared variables with locks (aka critical sections), because you are not sure that the data you read/write is the same for all threads".
The CPU per-core cache is just one of the possible issues that will lead to reading wrong values. Another issue that may lead to a race condition is two threads writing to a resource at the same time: it's impossible to know which value will be stored afterward.
Since code expects the data to be coherent, some multi-threaded programs may behave wrongly. With multi-threading, you are not sure that the code you write, via individual instructions, is executed as expected when it deals with shared variables.
The InterlockedExchange/InterlockedIncrement functions are low-level asm opcodes with a LOCK prefix (or locked by design, like the XCHG EDX,[EAX] opcode), which will indeed force cache coherency across all CPU cores, and therefore make the asm opcode execution thread-safe.
For instance, here is how a string reference count is handled when you assign a string value (see _LStrAsg in System.pas - this is from our optimized version of the RTL for Delphi 7/2002, since the original Delphi code is copyrighted). There is a difference between the first INC ECX and LOCK INC [EDX-skew].StrRec.refCnt: not only does the first increment ECX rather than the reference-count variable, but the first is also not thread-safe, whereas the second is prefixed by LOCK and therefore is thread-safe.
By the way, this LOCK prefix is one of the problems of multi-thread scaling in the RTL - it's better with newer CPUs, but still not perfect.
So using critical sections is the easiest way of making code thread-safe.
Using a local variable makes the critical section shorter, so your application will scale better and make use of the full power of your CPU cores. Between EnterCriticalSection and LeaveCriticalSection, only one thread will be running: the other threads will wait in the EnterCriticalSection call... So the shorter the critical section is, the faster your application is. Some wrongly designed multi-threaded applications can actually be slower than mono-threaded apps!
And do not forget that if the code inside the critical section may raise an exception, you should always write an explicit try ... finally LeaveCriticalSection() end; block to protect the lock release and prevent any deadlock of your application.
Delphi is perfectly thread-safe if you protect your shared data with a lock, i.e. a critical section. Be aware that even reference-counted variables (like strings) should be protected, even though there is a LOCK inside their RTL functions: that LOCK is there to ensure correct reference counting and avoid memory leaks, but it won't make your own use of the variable thread-safe. To make it as fast as possible, see this SO question.
The purpose of InterlockedExchange and InterlockedCompareExchange is to change a shared pointer variable value. You can see them as a "light" version of a critical section for accessing a pointer value.
In all cases, writing working multi-threaded code is not easy - it's even hard, as a Delphi expert just wrote on his blog.
You should either write simple threads with no shared data at all (make a private copy of the data before the thread starts, or use read-only shared data, which is thread-safe in essence), or call some well-designed and proven library - like http://otl.17slon.com - which will save you a lot of debugging time.
First of all, according to the language standards, volatile doesn't do what the article says it does. The acquire and release semantics of volatile are MSVC specific. This can be a problem if you compile with other compilers or on other platforms. C++11 introduces language supported atomic variables which will hopefully, in due course, finally put an end to the (mis-)use of volatile as a threading construct.
Critical sections and mutexes are indeed implemented so that reads and writes of protected variables will be seen correctly from all threads.
I think the best way to think of critical sections and mutexes (locks) is as devices to bring about serialization. That is, blocks of code protected by such locks are executed serially, one after another without overlap. The serialization applies to memory access also. There can be no problems due to cache coherence or read/write reordering.
Interlocked functions are implemented using hardware-based locks on the memory bus. These functions are used by lock-free algorithms. What this means is that they don't use heavyweight locks like critical sections, but rather these lightweight hardware locks.
Lock-free algorithms can be more efficient than those based on locks, but lock-free algorithms are very much harder to write correctly. Prefer critical sections over lock-free code unless the performance difference is measurable.
Another article well worth reading is The "Double-Checked Locking is Broken" Declaration.