C++0x memory_order without fences: applications and supporting chips

Posted 2024-10-31 06:13:23

As a follow-up to my previous question: the atomic<T> class specifies most operations with a memory_order parameter. In contrast to a fence, this memory order affects only the atomic on which it operates. Presumably, by using several such atomics you can build a concurrent algorithm where the ordering of other memory is unimportant.

So I have two questions:

  1. Can somebody point me to an example of an algorithm/situation that would benefit from the ordering of individual atomic variables and not require fences?
  2. Which modern processors support this type of behavior? That is, where the compiler won't just translate the specific order into a normal fence.

Answer by 定格我的天空, 2024-11-07 06:13:23

The memory-ordering parameter on operations on std::atomic<T> variables does not affect the ordering of that operation per se; it affects the ordering relationships that the operation creates with other operations.
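To illustrate that the ordering argument does not change the atomicity of the operation itself, here is a minimal sketch. The function name `count_relaxed` and the thread and iteration counts are illustrative assumptions, not part of the original answer. Even with memory_order_relaxed, concurrent fetch_add calls on one atomic never lose an increment. "Relaxed" only means that the increments impose no ordering on any other memory.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: several threads bump one shared counter with relaxed ordering.
int count_relaxed(int num_threads, int increments_per_thread) {
    std::atomic<int> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i) {
                // Atomic read-modify-write: no increment is lost, but this
                // operation orders nothing else in memory.
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) {
        w.join();  // join() synchronizes, so all increments are visible here
    }
    const int total = counter.load(std::memory_order_relaxed);
    assert(total == num_threads * increments_per_thread);
    return total;
}
```

For example, `count_relaxed(4, 10000)` returns 40000.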

For example, a.store(value, std::memory_order_release) on its own tells you nothing about how operations on a are ordered with respect to anything else. When paired with a call to a.load(std::memory_order_acquire) from another thread, however, it orders other operations. If that load reads the value stored, then all writes to other variables (including non-atomic ones) made by the storing thread before the store are visible to the loading thread after the load.
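The release/acquire pairing described above is the classic message-passing pattern, and it is one answer to the first question: only a single flag is atomic, and no standalone fence is used. Below is a minimal sketch. The names `message_passing`, `payload` and `ready` are illustrative assumptions.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical example: one thread publishes a plain (non-atomic) int,
// another reads it after observing a flag.
int message_passing() {
    int payload = 0;                 // ordinary, non-atomic data
    std::atomic<bool> ready{false};  // the only atomic variable
    int observed = -1;

    std::thread producer([&] {
        payload = 42;                                  // plain write
        ready.store(true, std::memory_order_release);  // publishes the write above
    });
    std::thread consumer([&] {
        // Spin until this acquire load reads the value of the release store.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        observed = payload;  // guaranteed to see 42; no data race
    });

    producer.join();
    consumer.join();
    assert(observed == 42);
    return observed;
}
```

If both operations on `ready` were memory_order_relaxed instead, the read of `payload` would be a data race and the program would have undefined behavior.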

On modern processors, some memory orderings are no-ops at the hardware level. For example, on x86 ordinary load instructions already provide memory_order_acquire (and therefore memory_order_consume) semantics, and ordinary store instructions already provide memory_order_release semantics, so no separate fences are required. In these cases the orderings only restrict the instruction reordering the compiler can do.
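A spinlock is another algorithm in which all ordering is attached to operations on one atomic variable, with no standalone fences. Here is a minimal test-and-set sketch. The `SpinLock` class and the `locked_increments` helper are illustrative assumptions, not taken from the original answer. On x86 the release store in `unlock()` typically compiles to an ordinary store. The exchange in `lock()` typically compiles to XCHG, which is already a full barrier.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical test-and-set spinlock: all ordering lives on `locked_`.
class SpinLock {
public:
    void lock() {
        // Acquire: accesses in the critical section cannot move above a successful lock.
        while (locked_.exchange(true, std::memory_order_acquire)) {
            std::this_thread::yield();
        }
    }
    void unlock() {
        // Release: accesses in the critical section cannot move below the unlock.
        locked_.store(false, std::memory_order_release);
    }

private:
    std::atomic<bool> locked_{false};
};

// Several threads increment a plain (non-atomic) counter under the lock.
long locked_increments(int num_threads, int increments_per_thread) {
    SpinLock lock;
    long counter = 0;  // protected only by the spinlock
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < increments_per_thread; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) w.join();
    assert(counter == static_cast<long>(num_threads) * increments_per_thread);
    return counter;
}
```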

Clarification: because the fences are implicit in the instructions, the compiler may not need to issue any explicit fence instructions if all the memory-ordering constraints are attached to individual operations on atomic variables. If you use memory_order_relaxed for everything and add explicit fences, then the compiler may well have to issue those fences as separate instructions.
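For contrast, here is a sketch of the "relaxed everywhere plus explicit fences" style, using the same hypothetical message-passing setup as a per-operation release/acquire version would. It is correct because a release fence sequenced before a relaxed store synchronizes with an acquire fence sequenced after a relaxed load that reads that store. The fence form is also stronger than necessary: it orders the prior write against every later store, not just the store to `ready`. On x86, acquire and release fences emit no instruction. On weakly ordered chips, however, this can cost more. On ARMv8, for example, compilers commonly emit a `dmb` barrier for such a fence, but can use a single store-release (`stlr`) for a per-operation release store. The function name `message_passing_with_fences` is an assumption for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical variant of message passing: the atomic operations are
// relaxed, and the ordering comes from standalone fences instead.
int message_passing_with_fences() {
    int payload = 0;
    std::atomic<bool> ready{false};
    int observed = -1;

    std::thread producer([&] {
        payload = 42;
        std::atomic_thread_fence(std::memory_order_release);  // orders the write above...
        ready.store(true, std::memory_order_relaxed);         // ...before this store
    });
    std::thread consumer([&] {
        while (!ready.load(std::memory_order_relaxed)) {
            std::this_thread::yield();
        }
        std::atomic_thread_fence(std::memory_order_acquire);  // pairs with the release fence
        observed = payload;
    });

    producer.join();
    consumer.join();
    assert(observed == 42);
    return observed;
}
```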

For example, on x86 the XCHG instruction carries an implicit memory_order_seq_cst fence. There is thus no difference on x86 between the code generated for the two exchange operations below; both map to a single XCHG instruction:

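```cpp
#include <atomic>
std::atomic<int> ai;
void exchanges() {
    ai.exchange(3, std::memory_order_relaxed);  // a single XCHG on x86
    ai.exchange(3, std::memory_order_seq_cst);  // also a single XCHG on x86
}
```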

However, I'm not yet aware of any compiler that gets rid of the explicit fence instructions in the following code:

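```cpp
void fenced_exchange() {
    std::atomic_thread_fence(std::memory_order_seq_cst);  // separate fence instruction on x86
    ai.exchange(3, std::memory_order_relaxed);            // XCHG, already a full barrier
    std::atomic_thread_fence(std::memory_order_seq_cst);  // separate fence instruction on x86
}
```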

I expect compilers will handle that optimization eventually, but there are other similar cases where the implicit fences will allow better optimization.

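Also, the semantics of std::memory_order_consume are only available on direct operations on atomic variables: std::atomic_thread_fence(std::memory_order_consume) is simply treated as an acquire fence.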
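A minimal sketch of consume ordering applied directly to a load follows; it publishes a pointer to freshly initialized data. The `Config` struct and `publish_with_consume` function are hypothetical. The guarantee covers only accesses that carry a data dependency on the loaded pointer, such as `cfg->value`. Mainstream compilers such as GCC and Clang currently implement consume as acquire. The C++ committee has also discouraged the use of memory_order_consume since C++17, so treat this as an illustration of the concept rather than a recommended pattern.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical data structure published through an atomic pointer.
struct Config {
    int value;
};

int publish_with_consume() {
    std::atomic<Config*> shared{nullptr};
    int observed = -1;

    std::thread writer([&] {
        Config* cfg = new Config{7};                   // initialize first...
        shared.store(cfg, std::memory_order_release);  // ...then publish the pointer
    });
    std::thread reader([&] {
        Config* cfg = nullptr;
        // consume: accesses that depend on the loaded pointer are ordered
        // after this load (in practice compilers strengthen it to acquire).
        while ((cfg = shared.load(std::memory_order_consume)) == nullptr) {
            std::this_thread::yield();
        }
        observed = cfg->value;  // data-dependent on the consume load
    });

    writer.join();
    reader.join();
    delete shared.load(std::memory_order_relaxed);  // both threads are done with it
    assert(observed == 7);
    return observed;
}
```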
