Out-of-order execution and memory fences
I know that modern CPUs can execute out of order; however, they always retire results in order, as described by Wikipedia.
"Out-of-order processors fill these "slots" in time with other instructions that are ready, then re-order the results at the end to make it appear that the instructions were processed as normal."
Memory fences are said to be required on multicore platforms because, owing to out-of-order execution, the wrong value of x can be printed here.
Processor #1:
while f == 0
;
print x; // x might not be 42 here
Processor #2:
x = 42;
// Memory fence required here
f = 1
Now my question is: since out-of-order processors (cores, in the case of multicore processors, I assume) always retire results in order, why are memory fences necessary? Do the cores of a multicore processor see only the results that other cores have retired, or do they also see results that are still in flight?
I mean, in the example above, when Processor 2 eventually retires the results, the result for x should come before the one for f, right? I know that during out-of-order execution it might have modified f before x, but it cannot have retired f before x, right?
With in-order retirement of results and a cache coherence mechanism in place, why would you ever need memory fences on x86?
This tutorial explains the issues: http://www.hpl.hp.com/techreports/Compaq-DEC/WRL-95-7.pdf
FWIW, where memory ordering issues happen on modern x86 processors, the reason is that while the x86 memory consistency model offers quite strong consistency, explicit barriers are needed to handle read-after-write consistency. This is due to something called the "store buffer".
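That is, x86 is sequentially consistent (nice and easy to reason about) except that loads may be reordered with respect to earlier stores to a different location. So if the processor executes a store followed by a load from another address (say, store x, then load y),
then on the processor bus this may be seen as the load followed by the store (load y, then store x).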
The reason for this behavior is the aforementioned store buffer, which is a small buffer that holds writes before they go out on the system bus. Load latency, on the other hand, is critical for performance, and hence loads are permitted to "jump the queue".
See Section 8.2 in http://download.intel.com/design/processor/manuals/253668.pdf
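To make the store/load reordering concrete, here is a minimal C++ sketch of the classic "store buffering" litmus test. It is only an illustration under my own assumptions: `count_both_zero` and its parameters are hypothetical names, and the relaxed atomics stand in for plain x86 stores and loads. Each thread stores to one variable and then loads the other. If both loads return 0, each load was effectively performed before the other thread's store became visible. With a full fence between the store and the load, that outcome is ruled out.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: runs the litmus test `iterations` times and counts
// how often BOTH threads read 0, i.e. both loads passed the earlier stores.
int count_both_zero(int iterations, bool use_fence) {
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;

        std::thread t1([&] {
            x.store(1, std::memory_order_relaxed);   // store x
            if (use_fence) {
                // Full fence: the store must be visible before the load runs.
                std::atomic_thread_fence(std::memory_order_seq_cst);
            }
            r1 = y.load(std::memory_order_relaxed);  // load y
        });
        std::thread t2([&] {
            y.store(1, std::memory_order_relaxed);   // store y
            if (use_fence) {
                std::atomic_thread_fence(std::memory_order_seq_cst);
            }
            r2 = x.load(std::memory_order_relaxed);  // load x
        });

        t1.join();
        t2.join();
        if (r1 == 0 && r2 == 0) {
            ++both_zero;
        }
    }
    return both_zero;
}
```

Calling `count_both_zero(n, true)` should always return 0. Without the fence, a nonzero count is permitted but not guaranteed; since this sketch starts fresh threads on every iteration, the window is small and you may well never observe it.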
The memory fence ensures that all changes made to variables before the fence are visible to all other cores, so that all cores have an up-to-date view of the data.
If you don't put in a memory fence, the cores might be working with wrong data. This shows up especially in scenarios where multiple cores work on the same dataset. With a fence, you can ensure that once CPU 0 has done some action, all changes made to the dataset are visible to all other cores, which can then work with up-to-date information.
If a core were to start working with outdated data from the dataset, how could it ever get the correct results? It couldn't, even if the end result were presented as if everything had been done in the right order.
The key is in the store buffer, which sits between the cache and the CPU, and does this:
That means that things are written to this buffer first, and at some later point the buffer is written to the cache. So the cache could contain a view of the data that is not the most recent, and therefore another CPU, through cache coherency, will not have the latest data either. A store buffer flush is necessary for the latest data to become visible; this, I think, is essentially what the memory fence causes to happen at the hardware level.
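As a rough sketch of where the fences go in the two-processor example from the question, here is one way it could be written in C++. This is my own illustration, not code from the question: `run_message_passing` is a hypothetical helper, and the flag is made a `std::atomic` because spinning on a plain variable would be a data race in C++. The release fence sits between the write to x and the write to f, and the matching acquire fence sits between the read of f and the read of x.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: runs the two-"processor" example once and returns
// the value of x that the reader observed after seeing the flag.
int run_message_passing() {
    int x = 0;               // ordinary (non-atomic) payload
    std::atomic<int> f{0};   // flag polled by the reader
    int seen = -1;

    std::thread reader([&] {                      // "Processor #1"
        while (f.load(std::memory_order_relaxed) == 0) {
            // spin until the flag is set
        }
        // Acquire fence: the read of x below cannot move above the flag read.
        std::atomic_thread_fence(std::memory_order_acquire);
        seen = x;
    });

    std::thread writer([&] {                      // "Processor #2"
        x = 42;
        // Release fence: the write to x is ordered before the flag write.
        std::atomic_thread_fence(std::memory_order_release);
        f.store(1, std::memory_order_relaxed);
    });

    writer.join();
    reader.join();
    return seen;
}
```

With both fences in place the reader is guaranteed to see 42. On x86, compilers typically do not need to emit any fence instruction for these two fences, because the hardware already keeps stores in order with other stores and loads in order with other loads; the fences then only stop the compiler from reordering the accesses.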
EDIT:
For the code you used as an example, Wikipedia says this:
Just to make explicit what is implicit in the previous answers: in-order retirement is correct, but it is distinct from memory accesses:
Retirement of an instruction is separate from performing its memory access; the memory access may complete at a different time from instruction retirement.
Each core will act as if its own memory accesses occur at retirement, but other cores may see those accesses at different times.
(On x86 and ARM, I think only stores are observably subject to this, but Alpha, for example, may load an old value from memory. x86 SSE2 has instructions with weaker guarantees than normal x86 behaviour.)
PS. From memory, the abandoned Sparc ROCK could in fact retire out of order; it spent power and transistors determining when this was harmless. It was abandoned because of power consumption and transistor count. I don't believe any general-purpose CPU has been brought to market with out-of-order retirement.