C++0x memory_order without fences: supporting applications and chips
As a followup from my previous question, the atomic<T> class specifies most operations with a memory_order parameter. In contrast to a fence, this memory order affects only the atomic on which it operates. Presumably, by using several such atomics you can build a concurrent algorithm where the ordering of other memory is unimportant.
So I have two questions:
- Can somebody point me to an example of an algorithm/situation that would benefit from the ordering of individual atomic variables and not require fences?
- Which modern processors support this type of behavior? That is, where the compiler won't just translate the specific order into a normal fence.
1 Answer
The memory ordering parameter on operations on std::atomic<T> variables does not affect the ordering of that operation per se; it affects the ordering relationships that the operation creates with other operations. For example, a.store(std::memory_order_release) on its own tells you nothing about how operations on a are ordered with respect to anything else, but paired with a call to a.load(std::memory_order_acquire) from another thread it then orders other operations: all writes to other variables (including non-atomic ones) done by the thread that did the store to a are visible to the thread that did the load, if that load reads the value stored.
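As a concrete sketch of that pairing (the names payload and ready below are mine, added for illustration, not from the original answer): a release store publishes a plain write, and an acquire load that reads the stored value makes that write visible.

#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;                    // plain, non-atomic data
std::atomic<bool> ready{false};     // the atomic whose ordering matters

void producer() {
    payload = 42;                                   // ordinary write
    ready.store(true, std::memory_order_release);   // publish it
}

void consumer() {
    while (!ready.load(std::memory_order_acquire))  // pairs with the release store
        ;                                           // spin until the flag is seen
    assert(payload == 42);  // the acquire load that read true makes the earlier write visible
}

int main() {
    std::thread t1(producer), t2(consumer);
    t1.join();
    t2.join();
}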
On modern processors, some memory orderings on operations are no-ops. For example, on x86, memory_order_acquire, memory_order_consume and memory_order_release are implicit in the load and store instructions, and do not require separate fences. In these cases the orderings just affect the instruction reordering the compiler can do.
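For instance, on x86 both operations in the following sketch (mine, for illustration) can typically compile to a plain mov with no separate fence instruction; the ordering arguments then only constrain what the compiler may reorder.

#include <atomic>

std::atomic<int> x{0};

void example() {
    x.store(1, std::memory_order_release);      // a plain mov store on x86
    int r = x.load(std::memory_order_acquire);  // a plain mov load on x86
    (void)r;                                    // the orderings only restrict compiler reordering here
}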
Clarification: the implicit fences in the instructions can mean that the compiler does not need to issue any explicit fence instructions if all the memory ordering constraints are attached to individual operations on atomic variables. If you use memory_order_relaxed for everything and add explicit fences, then the compiler may well have to issue those fences explicitly as instructions.
For example, on x86 the XCHG instruction carries with it an implicit memory_order_seq_cst fence. There is thus no difference between the generated code for the two exchange operations below on x86: they both map to a single XCHG instruction.
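(The original code block did not survive here; the following is a plausible reconstruction, assuming a variable a of type std::atomic<int>.)

#include <atomic>

std::atomic<int> a{0};

void exchanges() {
    a.exchange(1, std::memory_order_seq_cst);  // explicitly sequentially consistent
    a.exchange(1, std::memory_order_acquire);  // weaker ordering requested,
                                               // but on x86 both become the same single XCHG,
                                               // which is already a full barrier
}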
However, I'm not yet aware of any compiler that gets rid of the explicit fence instructions in the following code:
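(Again, the original snippet is missing; the point can be illustrated with a sketch like the one below, where the standalone fence is typically still emitted, e.g. as an MFENCE on x86, even though the XCHG already acts as a full barrier.)

#include <atomic>

std::atomic<int> a{0};

void exchange_with_fence() {
    a.exchange(1, std::memory_order_relaxed);             // still a full-barrier XCHG on x86
    std::atomic_thread_fence(std::memory_order_seq_cst);  // compilers generally emit this fence anyway
}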
I expect compilers will handle that optimization eventually, but there are other similar cases where the implicit fences will allow better optimization.
Also, std::memory_order_consume can only be applied to direct operations on variables.
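For illustration (my own sketch, not from the original answer), consume attaches to the load of a pointer, and only reads that carry a data dependency on the loaded value are ordered after the matching release store:

#include <atomic>

struct Node { int value; };

std::atomic<Node*> head{nullptr};

void publish(Node* n) {
    n->value = 42;
    head.store(n, std::memory_order_release);        // release the new node
}

int try_read() {
    Node* n = head.load(std::memory_order_consume);  // consume applies to this load only
    if (n)
        return n->value;  // dependency-ordered after the store in publish()
    return -1;
}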