Does Interlocked.CompareExchange use a memory barrier?

Posted on 2024-08-07 23:45:15

I'm reading Joe Duffy's post about volatile reads and writes, and timeliness, and I'm trying to understand something about the last code sample in the post:

while (Interlocked.CompareExchange(ref m_state, 1, 0) != 0) ;
m_state = 0;
while (Interlocked.CompareExchange(ref m_state, 1, 0) != 0) ;
m_state = 0;
… 

When the second CMPXCHG operation executes, does it use a memory barrier to ensure that the value of m_state is indeed the latest value written to it? Or will it just use some value that is already stored in the processor's cache? (Assume m_state isn't declared as volatile.)
If I understand correctly, if CMPXCHG doesn't use a memory barrier, then the whole lock acquisition procedure won't be fair, since the thread that first acquired the lock is highly likely to acquire all of the following locks as well. Did I understand correctly, or am I missing something here?

Edit: The main question is actually whether calling CompareExchange causes a memory barrier before it attempts to read m_state's value, that is, whether the assignment of 0 will be visible to all of the threads when they call CompareExchange again.
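
For reference, the pattern in the question can be written out as a small runnable sketch. The SpinLockSketch and SpinLockDemo names and the counter workload are hypothetical, and the release uses Volatile.Write rather than the plain m_state = 0 from the post; that substitution is an assumption made here to keep the release store explicit, not something the post prescribes.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical wrapper around the acquire/release pattern from the question.
public sealed class SpinLockSketch
{
    private int m_state; // 0 = free, 1 = held; deliberately not declared volatile

    public void Enter()
    {
        // Atomically replace 0 with 1. The call returns the previous value,
        // so any result other than 0 means another thread holds the lock.
        while (Interlocked.CompareExchange(ref m_state, 1, 0) != 0)
        {
            Thread.SpinWait(1);
        }
    }

    public void Exit()
    {
        // Assumption: Volatile.Write instead of the plain assignment in the post.
        Volatile.Write(ref m_state, 0);
    }
}

public static class SpinLockDemo
{
    // Increments a shared counter under the lock and returns the final value.
    public static int Run(int workers, int iterationsPerWorker)
    {
        var gate = new SpinLockSketch();
        int counter = 0;
        Parallel.For(0, workers, _ =>
        {
            for (int i = 0; i < iterationsPerWorker; i++)
            {
                gate.Enter();
                counter++; // only touched while the lock is held
                gate.Exit();
            }
        });
        return counter;
    }

    public static void Main()
    {
        Console.WriteLine(Run(4, 100000));
    }
}
```

If no increments are lost, Run returns workers * iterationsPerWorker. That shows mutual exclusion only; it says nothing about fairness between the waiting threads.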

Comments (6)

不打扰别人 2024-08-14 23:45:15

Any x86 instruction with a lock prefix acts as a full memory barrier. As shown in Abel's answer, the Interlocked* APIs and CompareExchange use lock-prefixed instructions such as lock cmpxchg, so a memory fence is implied.

Yes, Interlocked.CompareExchange uses a memory barrier.

Why? Because that is what x86 processors do. From Intel's Volume 3A: System Programming Guide Part 1, Section 7.1.2.2:

For the P6 family processors, locked operations serialize all outstanding load and store operations (that is, wait for them to complete). This rule is also true for the Pentium 4 and Intel Xeon processors, with one exception. Load operations that reference weakly ordered memory types (such as the WC memory type) may not be serialized.

volatile has nothing to do with this discussion. This is about atomic operations; to support atomic operations in the CPU, x86 guarantees that all previous loads and stores have completed.
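
One way to observe what a full barrier buys you is the classic store-buffering test: two threads each store to one variable and then load the other. The following is a minimal sketch under stated assumptions; StoreLoadSketch, CountBothZero and the field names are hypothetical. Because Interlocked.Exchange performs the store with full-fence semantics, the load that follows it cannot be reordered ahead of the store, so the outcome where both threads read 0 should never be counted.

```csharp
using System;
using System.Threading;

// Hypothetical store-buffering test: each thread stores 1 to its own flag,
// then loads the other thread's flag.
public static class StoreLoadSketch
{
    private static int s_x, s_y, s_r1, s_r2;

    // Returns how many rounds ended with both threads having read 0.
    public static int CountBothZero(int iterations)
    {
        int bothZero = 0;
        using (var rendezvous = new Barrier(2))
        {
            void Worker(bool first)
            {
                for (int i = 0; i < iterations; i++)
                {
                    rendezvous.SignalAndWait(); // both threads start the round together
                    if (first)
                    {
                        Interlocked.Exchange(ref s_x, 1); // store with a full fence
                        s_r1 = Volatile.Read(ref s_y);
                    }
                    else
                    {
                        Interlocked.Exchange(ref s_y, 1); // store with a full fence
                        s_r2 = Volatile.Read(ref s_x);
                    }
                    rendezvous.SignalAndWait(); // both threads have finished the round
                    if (first)
                    {
                        if (s_r1 == 0 && s_r2 == 0) bothZero++;
                        // Reset before this thread reaches the next rendezvous.
                        s_x = 0;
                        s_y = 0;
                    }
                }
            }

            var a = new Thread(() => Worker(true));
            var b = new Thread(() => Worker(false));
            a.Start();
            b.Start();
            a.Join();
            b.Join();
        }
        return bothZero;
    }

    public static void Main()
    {
        Console.WriteLine(CountBothZero(100000));
    }
}
```

Replacing the two Interlocked.Exchange calls with plain assignments removes the fence, and on x86 hardware the both-zero outcome then becomes possible, although a short run is not guaranteed to hit it.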

空气里的味道 2024-08-14 23:45:15

ref doesn't respect the usual volatile rules, especially in things like:

volatile bool myField;
...
RunMethod(ref myField);
...
void RunMethod(ref bool isDone) {
    while(!isDone) {} // silly example
}

Here, RunMethod is not guaranteed to spot external changes to isDone even though the underlying field (myField) is volatile; RunMethod sees only a ref bool and doesn't know the field is volatile, so its reads are not compiled as volatile reads.

However! This should be a non-issue:

  • if you are using Interlocked, then use Interlocked for all access to the field
  • if you are using lock, then use lock for all access to the field

Follow those rules and it should work OK.
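
A minimal sketch of the first rule, assuming an int flag in place of the bool above so that Interlocked can write it; RefFlagSketch, WaitUntilDone and s_done are hypothetical names. The callee states the semantics it needs through Volatile.Read instead of relying on the field's declaration, and the writer goes through Interlocked.

```csharp
using System;
using System.Threading;

// Hypothetical example: every access to the flag goes through
// Interlocked or Volatile, so the field needs no volatile modifier.
public static class RefFlagSketch
{
    private static int s_done; // 0 = running, 1 = done

    // The callee receives only a ref int, so it requests a volatile read itself.
    private static void WaitUntilDone(ref int isDone)
    {
        while (Volatile.Read(ref isDone) == 0)
        {
            Thread.SpinWait(1);
        }
    }

    // Starts a writer thread, waits for the flag, and reports its final state.
    public static bool Run()
    {
        Interlocked.Exchange(ref s_done, 0);
        var setter = new Thread(() =>
        {
            Thread.Sleep(50);
            Interlocked.Exchange(ref s_done, 1); // atomic write with a full fence
        });
        setter.Start();
        WaitUntilDone(ref s_done);
        setter.Join();
        return Volatile.Read(ref s_done) == 1;
    }

    public static void Main()
    {
        Console.WriteLine(Run());
    }
}
```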


Re the edit; yes, that behaviour is a critical part of Interlocked. To be honest, I don't know how it is implemented (memory barrier, etc - note they are "InternalCall" methods, so I can't check ;-p) - but yes: updates from one thread will be immediately visible to all others as long as they use the Interlocked methods (hence my point above).

潇烟暮雨 2024-08-14 23:45:15

There seems to be some comparison with the Win32 API functions by the same name, but this thread is all about the C# Interlocked class. From its very description, it is guaranteed that its operations are atomic. I'm not sure how that translates to "full memory barriers" as mentioned in other answers here, but judge for yourself.

On uniprocessor systems, nothing special happens; there's just a single instruction:

FASTCALL_FUNC CompareExchangeUP,12
        _ASSERT_ALIGNED_4_X86 ecx
        mov     eax, [esp+4]    ; Comparand
        cmpxchg [ecx], edx
        retn    4               ; result in EAX
FASTCALL_ENDFUNC CompareExchangeUP

But on multiprocessor systems, a hardware lock is used to prevent other cores from accessing the data at the same time:

FASTCALL_FUNC CompareExchangeMP,12
        _ASSERT_ALIGNED_4_X86 ecx
        mov     eax, [esp+4]    ; Comparand
  lock  cmpxchg [ecx], edx
        retn    4               ; result in EAX
FASTCALL_ENDFUNC CompareExchangeMP

This blog post on CompareExchange is an interesting read: it draws a few wrong conclusions here and there, but all in all it is excellent on the subject.

Update for ARM

As often, the answer is "it depends". It appears that prior to 2.1, the ARM implementation had a half-barrier. For the 2.1 release, this behavior was changed to a full barrier for the Interlocked operations.

The current code can be found here and the actual implementation of CompareExchange here. Discussions of the generated ARM assembly, as well as examples of the generated code, can be seen in the aforementioned PR.

奶气 2024-08-14 23:45:15

MSDN says about the Win32 API functions:
"Most of the interlocked functions provide full memory barriers on all Windows platforms"

(The exceptions are the Interlocked functions with explicit Acquire / Release semantics.)

From that I would conclude that the C# runtime's Interlocked makes the same guarantees, as they are documented with otherwise identical behavior (and they resolve to intrinsic CPU instructions on the platforms I know). Unfortunately, with MSDN's tendency to put up samples instead of documentation, it isn't spelled out explicitly.

明天过后 2024-08-14 23:45:15

According to ECMA-335 (section I.12.6.5):

5. Explicit atomic operations. The class library provides a variety of atomic operations in the System.Threading.Interlocked class. These operations (e.g., Increment, Decrement, Exchange, and CompareExchange) perform implicit acquire/release operations.

So, these operations follow the principle of least astonishment.

壹場煙雨 2024-08-14 23:45:15

The interlocked functions are guaranteed to stall the bus and the CPU while they resolve their operands. The immediate consequence is that no thread switch, on your CPU or another one, will interrupt an interlocked function in the middle of its execution.

Since you're passing a reference to the C# function, the underlying assembler code will work with the address of the actual integer, so the variable access won't be optimized away. It will work exactly as expected.

Edit: Here's a link that explains the behaviour of the asm instruction better: http://faydoc.tripod.com/cpu/cmpxchg.htm
As you can see, the bus is stalled by forcing a write cycle, so any other "threads" (read: other CPU cores) that try to use the bus at the same time are put in a waiting queue.
