Does Interlocked.CompareExchange use a memory barrier?
I'm reading Joe Duffy's post about Volatile reads and writes, and timeliness, and I'm trying to understand something about the last code sample in the post:
while (Interlocked.CompareExchange(ref m_state, 1, 0) != 0) ;
m_state = 0;
while (Interlocked.CompareExchange(ref m_state, 1, 0) != 0) ;
m_state = 0;
…
When the second CMPXCHG operation is executed, does it use a memory barrier to ensure that the value of m_state is indeed the latest value written to it? Or will it just use some value that is already stored in the processor's cache? (assuming m_state isn't declared as volatile).
If I understand correctly, if CMPXCHG doesn't use a memory barrier, then the whole lock acquisition procedure won't be fair, since it's highly likely that the thread that was the first to acquire the lock will be the one that acquires all of the following locks. Did I understand correctly, or am I missing something here?
Edit: The main question is actually whether calling CompareExchange causes a memory barrier before attempting to read m_state's value. That is, whether the assignment of 0 will be visible to all of the threads when they try to call CompareExchange again.
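The loop in the sample is a spin-lock acquire. For context, here is a minimal sketch of the full pattern (the type and method names are mine, not Duffy's; his post releases with a plain write `m_state = 0`, while the sketch below uses Volatile.Write to make the release ordering explicit):

```csharp
using System.Threading;

// Minimal spinlock built around the pattern in the question.
// SpinGate, Enter and Exit are illustrative names, not from Duffy's post.
class SpinGate
{
    private int m_state; // 0 = free, 1 = held

    public void Enter()
    {
        // Atomically swap 0 -> 1; spin while another thread holds the gate.
        // The lock-prefixed CMPXCHG acts as a full fence, so a spinning
        // thread eventually observes the owner's release.
        while (Interlocked.CompareExchange(ref m_state, 1, 0) != 0) ;
    }

    public void Exit()
    {
        // Duffy's sample releases with a plain write (m_state = 0);
        // Volatile.Write makes the release ordering explicit.
        Volatile.Write(ref m_state, 0);
    }
}
```

Guarding a shared counter with Enter/Exit keeps concurrent increments from losing updates.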
Comments (6)
Any x86 instruction that has a lock prefix has a full memory barrier. As shown in Abel's answer, the Interlocked* APIs and CompareExchange use lock-prefixed instructions such as lock cmpxchg, so they imply a memory fence.

Yes, Interlocked.CompareExchange uses a memory barrier.

Why? Because x86 processors do so. From Intel's Volume 3A: System Programming Guide Part 1, Section 7.1.2.2:

volatile has nothing to do with this discussion. This is about atomic operations; to support atomic operations in the CPU, x86 guarantees that all previous loads and stores are completed.
ref doesn't respect the usual volatile rules, especially in things like:

Here, RunMethod is not guaranteed to spot external changes to isDone, even though the underlying field (myField) is volatile; RunMethod doesn't know about it, so doesn't have the right code.

However! This should be a non-issue:

- if you're using Interlocked, then use Interlocked for all access to the field
- if you're using lock, then use lock for all access to the field

Follow those rules and it should work OK.

Re the edit; yes, that behaviour is a critical part of Interlocked. To be honest, I don't know how it is implemented (memory barrier, etc - note they are "InternalCall" methods, so I can't check ;-p) - but yes: updates from one thread will be immediately visible to all others as long as they use the Interlocked methods (hence my point above).
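The code sample this answer refers to did not survive in this copy; the following is a reconstruction of the kind of code it describes (the names RunMethod, isDone and myField come from the answer's text; the method bodies are my guess):

```csharp
using System.Threading;

class Demo
{
    public static volatile bool myField;

    // Inside this method, isDone is an ordinary by-ref bool: its reads are
    // NOT volatile reads, even when the caller passes a volatile field
    // (the compiler warns about exactly this at the call site: CS0420).
    public static void RunMethod(ref bool isDone)
    {
        while (!isDone)
        {
            // The JIT may hoist the read of isDone out of this loop, so a
            // write from another thread might never be observed here.
        }
    }
}

// Hazardous use: ref strips the field's volatile semantics, so this call
// may spin forever even after another thread sets Demo.myField = true:
//   new Thread(() => Demo.RunMethod(ref Demo.myField)).Start();
```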
There seems to be some comparison with the Win32 API functions by the same name, but this thread is all about the C# Interlocked class. From its very description, it is guaranteed that its operations are atomic. I'm not sure how that translates to "full memory barriers" as mentioned in other answers here, but judge for yourself.

On uniprocessor systems, nothing special happens; there's just a single instruction:

cmpxchg

But on multiprocessor systems, a hardware lock is used to prevent other cores from accessing the data at the same time:

lock cmpxchg

An interesting read, with some wrong conclusions here and there, but all-in-all excellent on the subject, is this blog post on CompareExchange.

Update for ARM

As often, the answer is "it depends". It appears that prior to 2.1, ARM had a half-barrier. For the 2.1 release, this behavior was changed to a full barrier for the Interlocked operations.

The current code can be found here, and the actual implementation of CompareExchange here. Discussions of the generated ARM assembly, as well as examples of the generated code, can be seen in the aforementioned PR.
MSDN says about the Win32 API functions:

"Most of the interlocked functions provide full memory barriers on all Windows platforms"

(the exceptions are interlocked functions with explicit Acquire/Release semantics)

From that I would conclude that the C# runtime's Interlocked makes the same guarantees, as they are documented with otherwise identical behavior (and they resolve to intrinsic CPU statements on the platforms I know). Unfortunately, with MSDN's tendency to put up samples instead of documentation, it isn't spelled out explicitly.
According to ECMA-335 (section I.12.6.5):

So, these operations follow the principle of least astonishment.
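One practical consequence of that fence guarantee is the common publish-once pattern; a sketch (the Cache type is my example, not from the answer):

```csharp
using System.Threading;

class Cache
{
    private static object s_instance;

    // Publish an instance at most once. The full fence implied by
    // Interlocked.CompareExchange makes the winning thread's object
    // visible to every other thread.
    public static object GetInstance()
    {
        if (Volatile.Read(ref s_instance) == null)
        {
            var candidate = new object();
            // Only the first CAS succeeds; losing threads discard theirs.
            Interlocked.CompareExchange(ref s_instance, candidate, null);
        }
        return s_instance;
    }
}
```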
The interlocked functions are guaranteed to stall the bus and the CPU while resolving the operands. The immediate consequence is that no thread switch, on your CPU or another one, will interrupt the interlocked function in the middle of its execution.

Since you're passing a reference to the C# function, the underlying assembler code will work with the address of the actual integer, so the variable access won't be optimized away. It will work exactly as expected.

Edit: here's a link that explains the behaviour of the asm instruction better: http://faydoc.tripod.com/cpu/cmpxchg.htm

As you can see, the bus is stalled by forcing a write cycle, so any other "threads" (read: other CPU cores) that try to use the bus at the same time will be put in a waiting queue.
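The bus locking described above is what makes read-modify-write operations safe across cores; a small demonstration (my example, not from the answer):

```csharp
using System.Threading;
using System.Threading.Tasks;

class CounterDemo
{
    // Run many concurrent increments. With Interlocked, the lock-prefixed
    // instruction makes each increment atomic, so no update is lost.
    public static int Count(int iterations)
    {
        int counter = 0;
        Parallel.For(0, iterations, _ => Interlocked.Increment(ref counter));
        return counter; // a plain counter++ here could lose updates
    }
}
```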