Are indivisible operations still indivisible on multiprocessor and multicore systems?
As per the title, plus: what are the limitations and gotchas?
For example, on x86 processors, alignment for most data types is optional - an optimisation rather than a requirement. That means that a pointer may be stored at an unaligned address, which in turn means that pointer might be split over a cache page boundary.
Obviously this could be done if you work hard enough on any processor (picking out particular bytes etc), but not in a way where you'd still expect the write operation to be indivisible.
I seriously doubt that a multicore processor can ensure that other cores can guarantee a consistent all-before or all-after view of a written pointer in this unaligned-write-crossing-a-page-boundary situation.
Am I right? And are there any similar gotchas I haven't thought of?
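To make the scenario in the question concrete, here is a minimal sketch of how one might check whether a value at a given address spans two cache lines (or two pages). The 64-byte line size and the helper name `straddles` are assumptions for illustration only; real line sizes vary by processor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed cache-line size. 64 bytes is common on x86 but is not guaranteed.
constexpr std::size_t kAssumedLineSize = 64;

// Hypothetical helper: true when an object of `size` bytes starting at
// `addr` occupies bytes in more than one block of `block` bytes.
constexpr bool straddles(std::uintptr_t addr, std::size_t size,
                         std::size_t block = kAssumedLineSize) {
    return size != 0 && (addr / block) != ((addr + size - 1) / block);
}

// An 8-byte pointer at an 8-byte-aligned address stays inside one line.
static_assert(!straddles(0x1000, 8), "aligned pointer fits in one line");
// The same pointer at 0x103C covers 0x103C..0x1043 and crosses 0x1040.
static_assert(straddles(0x103C, 8), "unaligned pointer crosses a line");
```

Passing 4096 as `block` performs the same check against a page boundary, assuming 4 KiB pages.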
The very notion of a single memory visible to all threads ceases to work with several cores having individual caches. StackOverflow questions on memory barriers may be of interest.
I think an example to illustrate the problem with the "single memory" model is this one:
Initially, x = y = 0.
Thread 1:
Thread 2:
Of course, there is a race condition. The secondary problem, besides the obvious race condition, is that one possible outcome is X=1, Y=1, even without compiler optimizations (that is, even if you write the two threads in assembly).
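The thread bodies did not survive in this copy of the answer. As a guess that is consistent with the outcome described (X=1, Y=1 being the surprising result), the sketch below assumes each thread first reads one shared variable and then writes 1 to the other. The function name `run_once` and the use of relaxed `std::atomic` operations are my own choices, made so the program has no undefined behaviour in C++.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <utility>

// Hypothetical reconstruction of the two threads:
//   Thread 1: X = x; y = 1;
//   Thread 2: Y = y; x = 1;
// Returns the pair (X, Y) observed in one run.
std::pair<int, int> run_once() {
    std::atomic<int> x{0}, y{0};
    int X = -1, Y = -1;

    std::thread t1([&] {
        X = x.load(std::memory_order_relaxed);
        y.store(1, std::memory_order_relaxed);
    });
    std::thread t2([&] {
        Y = y.load(std::memory_order_relaxed);
        x.store(1, std::memory_order_relaxed);
    });

    t1.join();
    t2.join();
    return {X, Y};
}
```

Under a "single memory" model only (0,0), (0,1) and (1,0) can come back. With relaxed ordering the C++ memory model also permits (1,1); whether you ever observe it depends on the compiler and the hardware.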
On x86 the answer is yes, provided the instruction carries the lock prefix: the processor then asserts a hardware signal that ensures the prefixed instruction is atomic (on some processors the caches coordinate to ensure the operation is atomic instead).
Making operations atomic is not something compilers do on their own. On multiprocessor systems atomic assembly language operations are very expensive, and they are generally used to implement the locking primitives offered by the OS / C library.
No purely high level language memory operations should be regarded as atomic. If you have multiple threads writing to the same shared memory location then you need to use some mutex/lock mechanism to avoid races.
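For comparison, since C++11 you can ask the compiler for an atomic read-modify-write explicitly through `std::atomic`. The sketch below is only an illustration: the counter name `hits` and the function `record_hit` are hypothetical, and the remark about the generated instruction describes what mainstream x86 compilers typically emit, not a guarantee.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical shared counter, for illustration only.
std::atomic<long> hits{0};

// Atomic read-modify-write: no other thread can observe a half-done
// increment. Returns the value the counter held before the increment.
// On x86 this typically compiles to a lock-prefixed instruction.
long record_hit() {
    return hits.fetch_add(1, std::memory_order_seq_cst);
}
```

A plain `long` incremented with `++` from several threads carries no such guarantee, which is the point the answer makes about ordinary high-level memory operations.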
Maybe I misunderstand the example but the "unaligned pointer" problem
is the same as on a single-core execution. If a datum can be partially
written to memory then different threads can see partial updates (if
there's no appropriate locking) on any machine with preemptive
multitasking (even on a single-CPU system).
You don't have to worry about the cache unless you are writing drivers
for DMA-capable peripherals. Modern multi-processors are cache
coherent so the hardware guarantees that a thread on processor A will
have the same view of memory as a thread on processor B. If the thread
on A reads a memory location that is cached on B then the thread on A
will get the correct value from B's cache.
You do have to worry about values held in registers, and from a
programming standpoint that difference may not be a visible one, but
in my opinion involving the cache in a concurrency discussion often
just introduces unnecessary confusion.
Any operation that is labeled "indivisible" by the programming manual
for an ISA must reasonably keep being indivisible in a multiprocessing
system built with processors using that ISA or backwards compatibility
would break. However, this does not mean that operations that were
never promised to be indivisible, but happened to be in a particular
processor implementation, will be indivisible in future
implementations (such as in a multiprocessor system).
[Edit] Completion to the comment below
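An atomic write is seen as either completed or not started by all
threads, regardless of the number of cores (in a cache coherent
system).
A non-atomic write may be seen partially completed when
read by unsynchronized threads in the presence of preemption (even
on a single-core system).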
If the pointer is written to an unaligned address in a single, atomic
write then the cache coherence hardware will make sure that all
threads see it completed, or not at all. If the pointer is written
non-atomically (such as with two separate write operations) then any
threads may see the partial update even on a single-core system
with true preemption.
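One way to get the "completed, or not at all" behaviour for a pointer without relying on how a particular processor treats a plain store is to make the pointer itself atomic. This is a minimal sketch under my own assumptions: `Config`, `current`, `publish` and `snapshot` are hypothetical names, and release/acquire ordering is one reasonable choice, not the only one.

```cpp
#include <atomic>
#include <cassert>

struct Config { int version; };   // hypothetical payload

// The shared pointer. std::atomic gives it suitable alignment and
// makes every store and load of the pointer indivisible.
std::atomic<const Config*> current{nullptr};

static_assert(alignof(std::atomic<const Config*>) >= alignof(const Config*),
              "the atomic wrapper is at least as aligned as a raw pointer");

// Writer: readers see either the old pointer or the new one, never a mix.
void publish(const Config* c) {
    current.store(c, std::memory_order_release);
}

// Reader: acquire pairs with the release above, so the pointed-to
// object's contents written before publish() are visible as well.
const Config* snapshot() {
    return current.load(std::memory_order_acquire);
}
```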
Is it possible for this class to output “False”?
Yes, on multicore machines. Value types, such as bools, can be stored in machine registers, and the order in which registers are synchronized is machine-specific. underwearOn could be synchronized before trousersOn. You could lock the assignments and the while loop, but this will harm performance. A better solution is to declare the bool variables volatile. Such variables are not stored in registers.
Edit:
This is a simplified example from a presentation available at Threading Complete.
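The class this answer refers to is missing from this copy, so the following is only a hypothetical stand-in that reuses the two flag names. It is written in C++ rather than the original language, and it uses `std::atomic` with release/acquire ordering, because in C++ the `volatile` keyword does not provide ordering between threads. The struct name `Dresser` and both member functions are my own.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the missing class.
struct Dresser {
    std::atomic<bool> underwearOn{false};
    std::atomic<bool> trousersOn{false};

    // Writer: the release store to trousersOn publishes the earlier
    // store to underwearOn along with it.
    void get_dressed() {
        underwearOn.store(true, std::memory_order_relaxed);
        trousersOn.store(true, std::memory_order_release);
    }

    // Reader: returns false only if trousers are seen on while
    // underwear is seen off. With the acquire load below, that
    // combination cannot be observed.
    bool looks_consistent() const {
        if (!trousersOn.load(std::memory_order_acquire)) return true;
        return underwearOn.load(std::memory_order_relaxed);
    }
};
```

If both flags were plain `bool` fields with no synchronization, nothing would stop a reader from seeing the second write without the first.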
The notion of being "indivisible" (or "atomic") makes little sense in a sequential (single-core single-thread) system. In order to know if something is indivisible or not, you need an external observer, and this external observer can only be another thread, whether scheduled on the same core or on a different core. Indivisible means that no external observer can observe an intermediate state. Let me recommend the book "The Art of Multiprocessor Programming" for more insight into these concepts.
What you're probably asking is whether seemingly indivisible operations (such as the one-liner statement x = 3) are actually indivisible. The answer is no, and there exists a well known example: the handling of doubles in Java. A double is 64 bits wide and may be stored as two 32-bit words, and the JVM spec does not guarantee that operations on non-volatile doubles are atomic (although they are in practice on all mainstream JVMs). Another thread may observe a state where only one word out of two has been updated. Once again, this is independent of whether the two threads are scheduled on the same core or on different cores.
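The "one word out of two" state can be shown without any threads by modelling a 64-bit value as two 32-bit halves and looking at it between the two half-writes. This is a sketch written in C++ rather than Java, and the type `SplitWord` with its member functions is hypothetical, intended only to make the intermediate state visible.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a 64-bit value kept as two 32-bit words.
struct SplitWord {
    std::uint32_t lo = 0;
    std::uint32_t hi = 0;

    // What an observer would read at this instant.
    std::uint64_t read() const {
        return (static_cast<std::uint64_t>(hi) << 32) | lo;
    }

    // A 64-bit write performed as two separate 32-bit steps.
    void write_lo(std::uint64_t v) { lo = static_cast<std::uint32_t>(v); }
    void write_hi(std::uint64_t v) { hi = static_cast<std::uint32_t>(v >> 32); }
};
```

Going from 0x00000000FFFFFFFF to 0x0000000100000000, an observer who reads after `write_lo` but before `write_hi` sees 0, which is neither the old value nor the new one.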
In any case, you should always rely on synchronization events (such as reads or writes to volatile variables, barriers, locks, etc) whenever you want to observe shared data in a consistent state. Another option for avoiding these problems is to avoid shared state altogether. This is possible with purely functional or message-passing languages.