8/16-bit atomics on 32/64-bit processors

Posted 2025-02-07 01:42:56

In C++11 and C11 it is possible to use 8- and 16-bit atomics. Are there any pitfalls of using them on actual modern 32- and 64-bit CPUs? Are they lock-free? Are they slower than native-size atomics? I'm interested in both what standard says about it and how it's actually implemented on common architectures.

Comments (1)

夏天碎花小短裙 2025-02-14 01:42:56

There are no common pitfalls or any reason to expect any.

The standard says nothing about it, but then it says basically nothing about performance guarantees in general. In practice, if atomic<int> is lock-free, it's almost certain that atomic<int16_t> and atomic<int8_t> are also lock-free. I'd be surprised if there are any mainstream implementations where that's not true.
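If you'd rather check than assume, the standard library lets you ask the implementation directly. The following is a minimal sketch; the struct LockFreeReport and the function query_lock_free are hypothetical names made up for illustration, while is_lock_free() and the ATOMIC_*_LOCK_FREE macros are standard C++11.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper type: collects the lock-free status of a few atomic widths.
struct LockFreeReport {
    bool i8;          // atomic<int8_t>
    bool i16;         // atomic<int16_t>
    bool native_int;  // atomic<int>
};

// Runtime query via the standard is_lock_free() member function.
inline LockFreeReport query_lock_free() {
    std::atomic<std::int8_t>  a8{0};
    std::atomic<std::int16_t> a16{0};
    std::atomic<int>          a32{0};
    return LockFreeReport{a8.is_lock_free(), a16.is_lock_free(), a32.is_lock_free()};
}

// Preprocessor-level query: the standard macros expand to 0 (never lock-free),
// 1 (sometimes) or 2 (always lock-free) for the corresponding type.
constexpr bool char_always_lock_free  = (ATOMIC_CHAR_LOCK_FREE == 2);
constexpr bool short_always_lock_free = (ATOMIC_SHORT_LOCK_FREE == 2);
```

On a mainstream target you would expect all three fields of query_lock_free() to be true, but that is the expectation described above, not something the standard promises.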

x86 hardware supports them directly, at the same speed as other operand-sizes. e.g. mov load/store, and for atomic RMWs, lock xadd byte [rdi], al exists in byte operand-size as well as word/dword/qword. Same for all other atomic RMW instructions, including xchg and cmpxchg.
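As a rough illustration, here is what byte-sized atomic RMWs look like from C++. The function names bump, swap_in and try_claim are hypothetical; the comments about generated instructions describe what x86-64 compilers typically emit and are an assumption, so check your own compiler's output.

```cpp
#include <atomic>
#include <cstdint>

// Atomic increment of a one-byte counter, returning the previous value.
// On x86-64 this typically becomes a byte-sized lock xadd
// (or lock add / lock inc if the result is unused).
inline std::uint8_t bump(std::atomic<std::uint8_t>& counter) {
    return counter.fetch_add(1, std::memory_order_relaxed);
}

// Atomic swap of a one-byte flag; typically a byte-sized xchg on x86-64.
inline std::uint8_t swap_in(std::atomic<std::uint8_t>& flag, std::uint8_t value) {
    return flag.exchange(value);
}

// Atomic compare-and-swap on a one-byte slot; typically lock cmpxchg.
// Returns true if the slot held 0 and now holds owner_id.
inline bool try_claim(std::atomic<std::uint8_t>& slot, std::uint8_t owner_id) {
    std::uint8_t expected = 0;
    return slot.compare_exchange_strong(expected, owner_id);
}
```

Unsigned atomic arithmetic wraps around, so calling bump on a counter holding 255 returns 255 and leaves 0 behind.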

Other ISAs may have minor slowdowns for narrow stores (and maybe also loads), like a cycle of extra latency for a pure-load or pure-store. This is pretty much negligible compared to inter-core latency, and pretty minor even when a cache line is already hot. See "Are there any modern CPUs where a cached byte store is actually slower than a word store?" (It's not unique to atomic operations.)

Most non-x86 ISAs also have byte and 16-bit versions of the same instructions they provide for atomic RMWs, like ARM ldrexb / strexb.

Of course, for an atomic RMW it's also safe to do an RMW of the containing word, and that can be done "naturally" with minimal extra work for a fetch_or or other bitwise boolean, or a CAS. But I think most widely used ISAs have direct support for byte and 16-bit operations, so they don't need that trick.
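For the curious, the containing-word trick can be sketched roughly like this. It is only an illustration: byte_fetch_or and byte_compare_exchange are hypothetical helper names, and the "byte" here is a logical byte selected by shift within a 32-bit atomic, not an aliased memory byte, which sidesteps endianness and aliasing questions.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// OR `bits` into logical byte `idx` (0 = least significant) of a 32-bit word.
// A single word-sized fetch_or suffices because OR-ing zeros into the other
// three bytes leaves them unchanged. Returns the previous value of that byte.
inline std::uint8_t byte_fetch_or(std::atomic<std::uint32_t>& word,
                                  unsigned idx, std::uint8_t bits) {
    assert(idx < 4);
    const unsigned shift = idx * 8;
    const std::uint32_t old = word.fetch_or(std::uint32_t{bits} << shift);
    return static_cast<std::uint8_t>(old >> shift);
}

// Byte-sized CAS emulated with a word-sized CAS loop. Fails (and updates
// `expected`) only if the target byte differs; changes to neighbouring
// bytes just cause a retry.
inline bool byte_compare_exchange(std::atomic<std::uint32_t>& word, unsigned idx,
                                  std::uint8_t& expected, std::uint8_t desired) {
    assert(idx < 4);
    const unsigned shift = idx * 8;
    const std::uint32_t mask = std::uint32_t{0xFF} << shift;
    std::uint32_t cur = word.load(std::memory_order_relaxed);
    for (;;) {
        const std::uint8_t cur_byte = static_cast<std::uint8_t>((cur & mask) >> shift);
        if (cur_byte != expected) {
            expected = cur_byte;
            return false;
        }
        const std::uint32_t next = (cur & ~mask) | (std::uint32_t{desired} << shift);
        // On failure compare_exchange_weak reloads `cur`, so the loop re-checks the byte.
        if (word.compare_exchange_weak(cur, next)) {
            return true;
        }
    }
}
```

The fetch_or case needs no loop at all, which is the "minimal extra work" mentioned above; the CAS case needs a retry loop because a neighbouring byte can change between the load and the exchange.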
