Compare and swap in C using machine code
How would you write a function in C that does an atomic compare-and-swap on an integer value, using embedded machine code (assuming, say, the x86 architecture)? Can it be any more specific if it's written only for the i7 processor?
Does the compiled code act as a memory fence, or does it only enforce ordering on the memory location involved in the compare-and-swap? How costly is it compared to a memory fence?
Thank you.
Comments (5)
The easiest way to do it is probably with a compiler intrinsic such as _InterlockedCompareExchange(). It looks like a function but is actually special-cased by the compiler and boils down to a single machine instruction. The MSVC x86 intrinsic also works as a read/write fence, but that's not necessarily true on other platforms. (On PowerPC, for example, you'd need to explicitly issue an lwsync to fence memory reordering.)
In general, on many common systems, a compare-and-swap operation only enforces an atomic transaction on the one address it touches. Other memory accesses can be reordered around it, and in multicore systems, memory addresses other than the one you've swapped may not be coherent between the cores.
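As a portable counterpart to the MSVC intrinsic mentioned above, here is a minimal sketch using C11 `<stdatomic.h>`. This is not the answerer's code: the helper name `cas_int` is hypothetical, and the choice of `memory_order_seq_cst` is an assumption made so that the operation requests fence-like ordering on every platform instead of relying on x86 behaviour.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical helper: atomically replace *target with `desired`, but only
 * if *target still holds `expected`. Returns true when the swap happened. */
static bool cas_int(atomic_int *target, int expected, int desired)
{
    /* On success, seq_cst requests full ordering on every platform, so the
     * compiler emits whatever barriers the target CPU needs. On x86 this is
     * typically a single LOCK CMPXCHG. On failure nothing is written, so a
     * relaxed order is enough for this sketch. */
    return atomic_compare_exchange_strong_explicit(
        target, &expected, desired,
        memory_order_seq_cst, memory_order_relaxed);
}
```

Passing a weaker success order (for example `memory_order_relaxed`) would keep the atomicity on the one address but drop the ordering guarantee for surrounding accesses, which is the distinction the answer draws.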
You can use the CMPXCHG instruction with the LOCK prefix for atomic execution. E.g.
lock cmpxchg DWORD PTR [ebx], edx
or, in AT&T syntax,
lock cmpxchgl %edx, (%ebx)
This compares the value in the EAX register with the value at the address stored in the EBX register. If they are the same, it stores the value in the EDX register to that location; otherwise it loads the value at the address stored in the EBX register into EAX.
You need a 486 or later for this instruction to be available.
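To connect the instruction back to the question, the following is a rough sketch of wrapping LOCK CMPXCHG in a C function with GCC/Clang extended inline assembly. The function name `cas_lock_cmpxchg` is hypothetical, the asm syntax is specific to GCC-compatible compilers (MSVC uses a different mechanism), and the non-x86 fallback to the `__sync_val_compare_and_swap` builtin is an assumption added only so the sketch compiles elsewhere.

```c
/* Hypothetical helper: returns the value that was in *ptr before the call.
 * The swap happened if and only if the return value equals `expected`. */
static int cas_lock_cmpxchg(volatile int *ptr, int expected, int desired)
{
#if defined(__i386__) || defined(__x86_64__)
    int previous = expected;          /* CMPXCHG compares against EAX */
    __asm__ __volatile__(
        "lock cmpxchgl %2, %1"        /* AT&T order: source reg, dest mem */
        : "+a"(previous),             /* EAX: expected in, old value out */
          "+m"(*ptr)                  /* the memory operand being swapped */
        : "r"(desired)                /* any general register */
        : "memory", "cc");            /* compiler barrier; flags change */
    return previous;
#else
    /* Assumed fallback for other targets: GCC/Clang builtin. */
    return __sync_val_compare_and_swap(ptr, expected, desired);
#endif
}
```

The "memory" clobber only stops the compiler from moving memory accesses across the statement; the ordering at the hardware level comes from the LOCK prefix itself.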
If your integer value is 64 bits wide, then use cmpxchg8b (8-byte compare and exchange) under IA-32 x86.
The variable must be 8-byte aligned.
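A minimal sketch of the 64-bit case, assuming a GCC-compatible compiler: the wrapper type `aligned_i64` and the helper `cas_i64` are hypothetical names, and the `__sync_bool_compare_and_swap` builtin stands in for hand-written cmpxchg8b. On 32-bit x86 targets that support it, compilers typically lower this to LOCK CMPXCHG8B; on x86-64 a plain 64-bit LOCK CMPXCHG is typical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical wrapper type. On 32-bit x86 a 64-bit struct member may
 * otherwise be only 4-byte aligned, so the alignment is requested
 * explicitly. */
typedef struct {
    _Alignas(8) volatile int64_t value;
} aligned_i64;

/* Hypothetical helper: returns true if *target held `expected` and was
 * atomically replaced by `desired`. */
static bool cas_i64(aligned_i64 *target, int64_t expected, int64_t desired)
{
    return __sync_bool_compare_and_swap(&target->value, expected, desired);
}
```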
If the LOCK prefix is omitted from an atomic processor instruction, atomic operation across a multiprocessor environment is not guaranteed.
Without the LOCK prefix, the operation is only guaranteed not to be interrupted by an event (such as an interrupt) on the current processor/core.
It's interesting to note that some processors don't provide a compare-exchange, but instead provide some other instructions ("Load Linked" and "Conditional Store") that can be used to synthesize the unfortunately named compare-and-swap. The name sounds like it should be similar to "compare-exchange", but the operation should really be called "compare-and-store", since it does the comparison, stores if the value matches, and indicates whether the value matched and the store was performed. The instructions cannot synthesize compare-exchange semantics (which provide the value that was read in case the compare failed), but may in some cases avoid the ABA problem that is present with compare-exchange. Many algorithms are described in terms of "CAS" operations because they can be used on both styles of CPU.
A "Load Linked" instruction tells the processor to read a memory location and watch in some way to see if it might be written. A "Conditional Store" instruction tells the processor to write a memory location only if nothing can have written it since the last "Load Linked" operation. Note that the determination may be pessimistic; processing an interrupt, for example, may invalidate a "Load Linked"/"Conditional Store" sequence. Likewise, in a multiprocessor system, an LL/CS sequence may be invalidated by another CPU accessing a location on the same cache line as the location being watched, even if the actual location being watched wasn't touched. In typical usage, LL/CS are used very close together, with a retry loop, so that spurious invalidations may slow things down a little but won't cause much trouble.
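The retry-loop pattern described above can be sketched in portable C11. `atomic_compare_exchange_weak` is allowed to fail spuriously, which is what lets compilers map it onto a load-linked/conditional-store pair on CPUs that have one. The helper name `atomic_store_max` and the "keep the maximum" task are hypothetical, chosen only to give the loop something to do.

```c
#include <stdatomic.h>

/* Hypothetical helper: atomically raise *target to at least `candidate`.
 * Returns the value that was in *target just before the update succeeded,
 * or the current value if no update was needed. */
static int atomic_store_max(atomic_int *target, int candidate)
{
    int observed = atomic_load_explicit(target, memory_order_relaxed);
    while (observed < candidate) {
        /* The weak form may fail even when *target == observed (for
         * instance after an invalidated LL/CS sequence), so it always
         * sits in a loop. */
        if (atomic_compare_exchange_weak(target, &observed, candidate))
            break;
        /* On failure, `observed` now holds the value currently in *target;
         * the loop condition re-checks whether an update is still needed. */
    }
    return observed;
}
```

A spurious failure just costs one more trip around the loop, matching the answer's point that such invalidations slow things down slightly without affecting correctness.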