8/16-bit atomics on 32/64-bit processors
In C++11 and C11 it is possible to use 8- and 16-bit atomics. Are there any pitfalls of using them on actual modern 32- and 64-bit CPUs? Are they lock-free? Are they slower than native-size atomics? I'm interested in both what the standard says about it and how it's actually implemented on common architectures.
Comments (1)
There are no common pitfalls or any reason to expect any.
The standard says nothing about it, but then it says basically nothing about performance guarantees in general. In practice, if `atomic<int>` is lock-free, it's almost certain that `atomic<int16_t>` and `atomic<int8_t>` are also lock-free. I'd be surprised if there are any mainstream implementations where that's not true.

x86 hardware supports them directly, at the same speed as other operand sizes: `mov` for load/store, and for atomic RMWs, `lock xadd byte [rdi], al` exists in byte operand-size as well as word/dword/qword. The same goes for all other atomic RMW instructions, including `xchg` and `cmpxchg`.

Other ISAs may have minor slowdowns for narrow stores (and maybe also loads), like a cycle of extra latency for a pure load or pure store. This is pretty much negligible compared to inter-core latency, and pretty minor even when a cache line is already hot. See "Are there any modern CPUs where a cached byte store is actually slower than a word store?" (it's not unique to atomic operations).
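Whether narrow atomics are lock-free on a particular toolchain can be checked directly rather than assumed. The following is a minimal sketch, not a definitive test: the helper names `narrow_atomics_lock_free`, `bump`, and `demo` are hypothetical, and the comment about the instructions emitted on x86 describes what compilers typically do, not something the standard promises.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper: reports whether this implementation's 8- and 16-bit
// atomics are lock-free. The answer is implementation-specific; the standard
// does not guarantee it.
inline bool narrow_atomics_lock_free() {
    std::atomic<std::int8_t> a8{0};
    std::atomic<std::int16_t> a16{0};
    return a8.is_lock_free() && a16.is_lock_free();
}

// Hypothetical helper: a byte-sized atomic read-modify-write that returns the
// previous value. On x86, compilers typically lower this to a lock-prefixed
// byte-sized instruction (assumption: inspect your own compiler's output).
inline std::uint8_t bump(std::atomic<std::uint8_t>& counter) {
    return counter.fetch_add(1, std::memory_order_relaxed);
}

// Small usage demo: fetch_add returns the old value and stores old + 1.
inline void demo() {
    std::atomic<std::uint8_t> c{41};
    const std::uint8_t old = bump(c);
    assert(old == 41);
    assert(c.load() == 42);
}
```

Compiling this with optimizations and reading the generated assembly is the most reliable way to see what a given target does with narrow atomics.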
Most non-x86 ISAs also have byte and 16-bit versions of the same instructions they provide for atomic RMWs, like ARM `ldrexb` / `strexb`.

Of course for an atomic RMW, it's also safe to do an RMW of the containing word, and that can be done "naturally" with minimal extra work for a `fetch_or` or other bitwise boolean, or a CAS. But I think most widely used ISAs have direct support for byte and 16-bit operations, so they don't need that trick.
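The containing-word trick can be sketched in portable C++ as follows. This is an illustration under stated assumptions, not what any particular compiler emits: `fetch_or_byte` and `fetch_add_byte` are hypothetical helpers, the byte is selected by value position (0 = least significant) rather than by memory address so endianness does not come into it, and the CAS-loop addition is a related example of an operation that has no direct word-sized equivalent.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper: atomically ORs `bits` into byte `index` of a 32-bit
// word (0 = least significant byte) and returns that byte's previous value.
inline std::uint8_t fetch_or_byte(std::atomic<std::uint32_t>& word,
                                  unsigned index, std::uint8_t bits) {
    assert(index < 4);
    const unsigned shift = index * 8;
    // Zero bits in the other byte positions leave the neighbouring bytes
    // unchanged, so a single word-sized fetch_or is enough.
    const std::uint32_t mask = static_cast<std::uint32_t>(bits) << shift;
    const std::uint32_t old = word.fetch_or(mask);
    return static_cast<std::uint8_t>(old >> shift);
}

// Hypothetical helper: adds `delta` to one byte (wrapping modulo 256) with a
// CAS loop on the containing word, so no carry leaks into the next byte.
inline std::uint8_t fetch_add_byte(std::atomic<std::uint32_t>& word,
                                   unsigned index, std::uint8_t delta) {
    assert(index < 4);
    const unsigned shift = index * 8;
    const std::uint32_t lane = std::uint32_t{0xFF} << shift;
    std::uint32_t old = word.load(std::memory_order_relaxed);
    std::uint32_t desired;
    do {
        const std::uint32_t sum = ((old >> shift) + delta) & 0xFFu;
        desired = (old & ~lane) | (sum << shift);
        // On failure, compare_exchange_weak reloads `old` and the loop retries.
    } while (!word.compare_exchange_weak(old, desired));
    return static_cast<std::uint8_t>(old >> shift);
}
```

For example, starting from `0x11223344`, `fetch_or_byte(w, 1, 0x0F)` returns `0x33` and leaves `0x11223F44` in the word. The bitwise case needs only one RMW because the mask cannot disturb the other bytes, while the addition needs the retry loop because a concurrent change to any byte of the word makes the CAS fail.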