Out-of-order execution and memory fences

Posted on 2024-12-03 10:34:04

I know that modern CPUs can execute out of order. However, they always retire results in order, as described by Wikipedia:

"Out of Order processors fill these "slots" in time with other instructions that are ready, then re-order the results at the end to make it appear that the instructions were processed as normal."

Memory fences are said to be required on multicore platforms because, owing to out-of-order execution, the wrong value of x can be printed here:

Processor #1:
 while f == 0
  ;
 print x; // x might not be 42 here

Processor #2:
 x = 42;
 // Memory fence required here
 f = 1

My question is: since out-of-order processors (cores, in the case of multicore processors, I assume) always retire results in order, why are memory fences necessary? Do the cores of a multicore processor see only results retired by other cores, or do they also see results that are still in flight?

In the example above, when Processor 2 eventually retires its results, the result for x should come before f, right? I know that during out-of-order execution it might have modified f before x, but it cannot have retired f before x, right?

With in-order retirement and a cache coherence mechanism in place, why would you ever need memory fences on x86?

凉世弥音 2024-12-10 10:34:04

This tutorial explains the issues: http://www.hpl.hp.com/techreports/Compaq-DEC/WRL-95-7.pdf

FWIW, memory ordering issues do happen on modern x86 processors. The x86 memory consistency model offers quite strong consistency, but explicit barriers are still needed to handle read-after-write consistency, that is, to keep a load ordered after an earlier store. This is due to something called the "store buffer".
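To make the "quite strong consistency" part concrete, here is a minimal C++ sketch of the flag-and-data pattern from the question. The names `x`, `f`, and `run_message_passing` are hypothetical and chosen only to mirror the question. It assumes a C++11 compiler; on x86, mainstream compilers typically emit plain MOV instructions for a release store and an acquire load, so the ordering annotations mostly constrain the compiler rather than insert a hardware fence.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names mirroring the question's x and f.
int x = 0;               // plain data, published through the flag
std::atomic<int> f{0};   // flag

// Runs the question's two "processors" as threads and returns the value
// that the consumer read from x.
int run_message_passing() {
    x = 0;
    f.store(0, std::memory_order_relaxed);
    int seen = -1;

    std::thread producer([] {
        x = 42;
        // Release store: the write to x cannot be reordered after it.
        f.store(1, std::memory_order_release);
    });
    std::thread consumer([&seen] {
        // Acquire load: the read of x cannot be reordered before it.
        while (f.load(std::memory_order_acquire) == 0) {
            // spin until the flag is set
        }
        seen = x;
    });

    producer.join();
    consumer.join();
    return seen;
}
```

Calling `run_message_passing()` returns 42 under the C++ memory model. With a plain `int` flag instead of `std::atomic<int>`, the program would have a data race and the compiler would be free to reorder or hoist the accesses, regardless of what the hardware guarantees.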

That is, x86 is close to sequentially consistent (nice and easy to reason about), except that loads may be reordered with respect to earlier stores to different locations. That is, if the processor executes the sequence

store x
load y

then on the processor bus this may be seen as

load y
store x

The reason for this behavior is the aforementioned store buffer, a small buffer that holds writes before they go out on the system bus. Load latency, OTOH, is critical for performance, and hence loads are permitted to "jump the queue".
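The store/load reordering described above is usually demonstrated with a "store buffering" test, sketched below in C++. The names `sx`, `sy`, `Result`, and `run_store_buffering` are hypothetical. Each thread stores to one variable and then loads the other. Without a fence, both loads are allowed to return 0. With a sequentially consistent fence between the store and the load in both threads, that outcome is forbidden by the C++ memory model. On x86, compilers typically implement this fence with MFENCE or a locked instruction, though that is a compiler detail and an assumption here.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical shared variables for the store-buffering test.
std::atomic<int> sx{0};
std::atomic<int> sy{0};

struct Result {
    int r1;  // value of sy observed by thread 1
    int r2;  // value of sx observed by thread 2
};

// Runs one round of the test. With use_fence == false, the outcome
// r1 == 0 && r2 == 0 is permitted; with use_fence == true it is not.
Result run_store_buffering(bool use_fence) {
    sx.store(0, std::memory_order_relaxed);
    sy.store(0, std::memory_order_relaxed);
    int r1 = -1;
    int r2 = -1;

    std::thread t1([&] {
        sx.store(1, std::memory_order_relaxed);       // store x
        if (use_fence) {
            std::atomic_thread_fence(std::memory_order_seq_cst);
        }
        r1 = sy.load(std::memory_order_relaxed);      // load y
    });
    std::thread t2([&] {
        sy.store(1, std::memory_order_relaxed);       // store y
        if (use_fence) {
            std::atomic_thread_fence(std::memory_order_seq_cst);
        }
        r2 = sx.load(std::memory_order_relaxed);      // load x
    });

    t1.join();
    t2.join();
    return {r1, r2};
}
```

Running `run_store_buffering(false)` in a loop may occasionally produce `r1 == 0 && r2 == 0` on real hardware, although a short run is not guaranteed to show it. Running `run_store_buffering(true)` never should.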

See Section 8.2 in http://download.intel.com/design/processor/manuals/253668.pdf
