Race condition on x86
Could someone explain this statement:
shared variables
x = 0, y = 0

Core 1       Core 2
x = 1;       y = 1;
r1 = y;      r2 = x;
How is it possible to have r1 == 0 and r2 == 0 on x86 processors?
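A minimal C++11 sketch of the same litmus test, assuming relaxed atomics stand in for the plain shared variables; whether the r1 == 0 && r2 == 0 result shows up in any given run depends on timing, but over many iterations it is routinely observed on x86:

#include <atomic>
#include <cstdio>
#include <thread>

int main() {
    for (int i = 0; i < 1000000; ++i) {
        std::atomic<int> x{0}, y{0};      // shared variables, both start at 0
        int r1 = -1, r2 = -1;
        std::thread t1([&] {
            x.store(1, std::memory_order_relaxed);   // Core 1: x = 1;
            r1 = y.load(std::memory_order_relaxed);  // Core 1: r1 = y;
        });
        std::thread t2([&] {
            y.store(1, std::memory_order_relaxed);   // Core 2: y = 1;
            r2 = x.load(std::memory_order_relaxed);  // Core 2: r2 = x;
        });
        t1.join();
        t2.join();
        if (r1 == 0 && r2 == 0) {                    // the "impossible" outcome
            std::printf("r1 == 0 && r2 == 0 observed on iteration %d\n", i);
            return 0;
        }
    }
    std::printf("not observed in this run\n");
    return 0;
}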
3 Answers
The problem can arise due to optimizations involving reordering of instructions. In other words, both processors can assign r1 and r2 before assigning the variables x and y, if they find that this would yield better performance. This can be solved by adding a memory barrier, which would enforce the ordering constraint; the slideshow you mentioned in your post makes this same point.
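As a hedged sketch of that fix (my own, not quoted from the answer): with C++11 sequentially consistent atomics the compiler inserts the required barrier for us, typically an mfence or a locked instruction on x86, and the r1 == 0 && r2 == 0 outcome is then forbidden:

#include <atomic>

std::atomic<int> x{0}, y{0};
int r1, r2;

void core1() {
    x.store(1, std::memory_order_seq_cst);   // x = 1;  (fully ordered)
    r1 = y.load(std::memory_order_seq_cst);  // r1 = y;
}

void core2() {
    y.store(1, std::memory_order_seq_cst);   // y = 1;
    r2 = x.load(std::memory_order_seq_cst);  // r2 = x;
}

// An alternative with the same effect: keep the accesses relaxed and place an
// explicit std::atomic_thread_fence(std::memory_order_seq_cst) between the
// store and the load in each function.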
Regarding the x86 architecture, the best resource to read is the Intel® 64 and IA-32 Architectures Software Developer's Manual (Chapter 8.2, Memory Ordering). Sections 8.2.1 and 8.2.2 describe the memory ordering implemented by the Intel486, Pentium, Intel Core 2 Duo, Intel Atom, Intel Core Duo, Pentium 4, Intel Xeon, and P6 family processors: a memory model called processor ordering, as opposed to the program ordering (strong ordering) of the older Intel386 architecture (where read and write instructions were always issued in the order they appeared in the instruction stream).
The manual describes many ordering guarantees of the processor ordering memory model (such as loads are not reordered with other loads, stores are not reordered with other stores, stores are not reordered with older loads, etc.), but it also describes the allowed reordering that causes the race condition in the OP's post: 8.2.3.4, Loads May Be Reordered with Earlier Stores to Different Locations.
On the other hand, if the original order of the instructions was switched:
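Presumably the switched order referred to is the load-then-store version of the original snippet:

Core 1       Core 2
r1 = y;      r2 = x;
x = 1;       y = 1;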
In this case, the processor guarantees that the r1 = 1 and r2 = 1 outcome is not allowed (due to the 8.2.3.3 Stores Are Not Reordered With Earlier Loads guarantee), meaning that those instructions would never be reordered within the individual cores. To compare this with different architectures, check out this article: Memory Ordering in Modern Microprocessors. You can see that Itanium (IA-64) does even more reordering than the IA-32 architecture.
On processors with a weaker memory consistency model (such as SPARC, PowerPC, Itanium, ARM, etc.), the above condition can take place because of a lack of enforced cache coherency on writes without an explicit memory barrier instruction. So basically Core1 sees the write on x before y, while Core2 sees the write on y before x. A full fence instruction wouldn't be required in this case ... basically you would only need to enforce write or release semantics in this scenario, so that all writes are committed and visible to all processors before reads take place on the variables that have been written to. Processor architectures with strong memory consistency models like x86 typically make this unnecessary, but as Groo points out, the compiler itself could re-order the operations. You can use the volatile keyword in C and C++ to prevent the re-ordering of operations by the compiler within a given thread. That is not to say that volatile will create thread-safe code that manages the visibility of reads and writes between threads ... a memory barrier would be required for that. So while the use of volatile can still create unsafe threaded code, within a given thread it will enforce sequential consistency at the compiled machine-code level.
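A rough sketch of that volatile point (my own, reusing the names from the question): volatile constrains only the compiler, so hardware that allows a store to be reordered with a later load, as x86 does via its store buffer, can still produce the surprising result:

volatile int x = 0, y = 0;
int r1 = 0, r2 = 0;

void core1() {
    x = 1;    // volatile store: the compiler will not move it below the load
    r1 = y;   // volatile load: the CPU may still satisfy it before the store
              // above becomes visible to core2 (store buffer), even on x86
}

void core2() {
    y = 1;
    r2 = x;
}

// Ruling out r1 == 0 && r2 == 0 takes a real barrier between the store and
// the load in each function (e.g. mfence on x86), or std::atomic variables
// with sequentially consistent operations as sketched earlier.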
This is why some say: Threads Considered Harmful
The problem is that neither thread enforces any ordering between its two statements, because they are not inter-dependent.
The compiler knows that x and y are not aliased, and so it is not required to order the operations.
The CPU knows that x and y are not aliased, so it may reorder them for speed. A good example of when this happens is when the CPU detects an opportunity for write combining. It may merge one write with another if it can do so without violating its coherency model.
The mutual dependency looks odd but it's really no different than any other race condition. Directly writing shared-memory-threaded code is quite difficult, and that's why parallel languages and message-passing parallel frameworks have been developed, in order to isolate the parallel hazards to a small kernel and remove the hazards from the applications themselves.