Does Interlocked.CompareExchange use a memory barrier?
I'm reading Joe Duffy's post about Volatile reads and writes, and timeliness, and I'm trying to understand something about the last code sample in the post:
while (Interlocked.CompareExchange(ref m_state, 1, 0) != 0) ;
m_state = 0;
while (Interlocked.CompareExchange(ref m_state, 1, 0) != 0) ;
m_state = 0;
…
When the second CMPXCHG operation is executed, does it use a memory barrier to ensure that the value of m_state is indeed the latest value written to it? Or will it just use some value that is already stored in the processor's cache? (assuming m_state isn't declared as volatile).
If I understand correctly, if CMPXCHG doesn't use a memory barrier, then the whole lock acquisition procedure won't be fair, since it's highly likely that the thread that was the first to acquire the lock will be the one that acquires all of the following locks. Did I understand correctly, or am I missing something here?
Edit: The main question is actually whether calling CompareExchange causes a memory barrier before attempting to read m_state's value. That is, whether the assignment of 0 will be visible to all of the threads when they try to call CompareExchange again.
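The loop in the sample is a spin-lock acquire. For context, here is a minimal sketch of the full pattern (the type and method names are mine, not Duffy's; his post releases with a plain write `m_state = 0`, while the sketch below uses Volatile.Write to make the release ordering explicit):

```csharp
using System.Threading;

// Minimal spinlock built around the pattern in the question.
// SpinGate, Enter and Exit are illustrative names, not from Duffy's post.
class SpinGate
{
    private int m_state; // 0 = free, 1 = held

    public void Enter()
    {
        // Atomically swap 0 -> 1; spin while another thread holds the gate.
        // The lock-prefixed CMPXCHG acts as a full fence, so a spinning
        // thread eventually observes the owner's release.
        while (Interlocked.CompareExchange(ref m_state, 1, 0) != 0) ;
    }

    public void Exit()
    {
        // Duffy's sample releases with a plain write (m_state = 0);
        // Volatile.Write makes the release ordering explicit.
        Volatile.Write(ref m_state, 0);
    }
}
```

Guarding a shared counter with Enter/Exit keeps concurrent increments from losing updates.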
Comments (6)
Any x86 instruction that has a lock prefix has a full memory barrier. As shown in Abel's answer, the Interlocked* APIs and CompareExchange use lock-prefixed instructions such as lock cmpxchg, so they imply a memory fence.

Yes, Interlocked.CompareExchange uses a memory barrier.

Why? Because x86 processors do so. From Intel's Volume 3A: System Programming Guide Part 1, Section 7.1.2.2:

volatile has nothing to do with this discussion. This is about atomic operations; to support atomic operations in the CPU, x86 guarantees that all previous loads and stores are completed.
ref doesn't respect the usual volatile rules, especially in things like:

Here, RunMethod is not guaranteed to spot external changes to isDone, even though the underlying field (myField) is volatile; RunMethod doesn't know about it, so doesn't have the right code.

However! This should be a non-issue:

- if you're using Interlocked, then use Interlocked for all access to the field
- if you're using lock, then use lock for all access to the field

Follow those rules and it should work OK.

Re the edit; yes, that behaviour is a critical part of Interlocked. To be honest, I don't know how it is implemented (memory barrier, etc - note they are "InternalCall" methods, so I can't check ;-p) - but yes: updates from one thread will be immediately visible to all others as long as they use the Interlocked methods (hence my point above).
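The code sample this answer refers to did not survive in this copy; the following is a reconstruction of the kind of code it describes (the names RunMethod, isDone and myField come from the answer's text; the method bodies are my guess):

```csharp
using System.Threading;

class Demo
{
    public static volatile bool myField;

    // Inside this method, isDone is an ordinary by-ref bool: its reads are
    // NOT volatile reads, even when the caller passes a volatile field
    // (the compiler warns about exactly this at the call site: CS0420).
    public static void RunMethod(ref bool isDone)
    {
        while (!isDone)
        {
            // The JIT may hoist the read of isDone out of this loop, so a
            // write from another thread might never be observed here.
        }
    }
}

// Hazardous use: ref strips the field's volatile semantics, so this call
// may spin forever even after another thread sets Demo.myField = true:
//   new Thread(() => Demo.RunMethod(ref Demo.myField)).Start();
```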
There seems to be some comparison with the Win32 API functions by the same name, but this thread is all about the C# Interlocked class. From its very description, it is guaranteed that its operations are atomic. I'm not sure how that translates to "full memory barriers" as mentioned in other answers here, but judge for yourself.

On uniprocessor systems, nothing special happens; there's just a single instruction:

cmpxchg

But on multiprocessor systems, a hardware lock is used to prevent other cores from accessing the data at the same time:

lock cmpxchg

An interesting read, with some wrong conclusions here and there, but all-in-all excellent on the subject, is this blog post on CompareExchange.

Update for ARM

As often, the answer is "it depends". It appears that prior to 2.1, ARM had a half-barrier. For the 2.1 release, this behavior was changed to a full barrier for the Interlocked operations.

The current code can be found here, and the actual implementation of CompareExchange here. Discussions of the generated ARM assembly, as well as examples of the generated code, can be seen in the aforementioned PR.
MSDN says about the Win32 API functions:

"Most of the interlocked functions provide full memory barriers on all Windows platforms"

(the exceptions are interlocked functions with explicit Acquire/Release semantics)

From that I would conclude that the C# runtime's Interlocked makes the same guarantees, as they are documented with otherwise identical behavior (and they resolve to intrinsic CPU statements on the platforms I know). Unfortunately, with MSDN's tendency to put up samples instead of documentation, it isn't spelled out explicitly.
According to ECMA-335 (section I.12.6.5):

So, these operations follow the principle of least astonishment.
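One practical consequence of that fence guarantee is the common publish-once pattern; a sketch (the Cache type is my example, not from the answer):

```csharp
using System.Threading;

class Cache
{
    private static object s_instance;

    // Publish an instance at most once. The full fence implied by
    // Interlocked.CompareExchange makes the winning thread's object
    // visible to every other thread.
    public static object GetInstance()
    {
        if (Volatile.Read(ref s_instance) == null)
        {
            var candidate = new object();
            // Only the first CAS succeeds; losing threads discard theirs.
            Interlocked.CompareExchange(ref s_instance, candidate, null);
        }
        return s_instance;
    }
}
```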
The interlocked functions are guaranteed to stall the bus and the CPU while resolving the operands. The immediate consequence is that no thread switch, on your CPU or another one, will interrupt the interlocked function in the middle of its execution.

Since you're passing a reference to the C# function, the underlying assembler code will work with the address of the actual integer, so the variable access won't be optimized away. It will work exactly as expected.

Edit: here's a link that explains the behaviour of the asm instruction better: http://faydoc.tripod.com/cpu/cmpxchg.htm

As you can see, the bus is stalled by forcing a write cycle, so any other "threads" (read: other CPU cores) that try to use the bus at the same time will be put in a waiting queue.
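The bus locking described above is what makes read-modify-write operations safe across cores; a small demonstration (my example, not from the answer):

```csharp
using System.Threading;
using System.Threading.Tasks;

class CounterDemo
{
    // Run many concurrent increments. With Interlocked, the lock-prefixed
    // instruction makes each increment atomic, so no update is lost.
    public static int Count(int iterations)
    {
        int counter = 0;
        Parallel.For(0, iterations, _ => Interlocked.Increment(ref counter));
        return counter; // a plain counter++ here could lose updates
    }
}
```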