What's the point of cache coherency?
On CPUs like x86, which provide cache coherency, how is this useful in practice? I understand that the idea is to make memory updates done on one core immediately visible on all other cores. This is a useful property. However, one can't rely too heavily on it when not writing in assembly language, because the compiler can keep variable assignments in registers and never write them to memory. This means one must still take explicit steps to make sure that work done in other threads is visible in the current thread. So, from a practical perspective, what does cache coherency actually achieve?
The short story is, non-cache-coherent systems are exceptionally difficult to program, especially if you want to maintain efficiency - which is also the main reason even most NUMA systems today are cache-coherent.

If the caches weren't coherent, the "explicit steps" would have to enforce the coherency - explicit steps are usually things like critical sections/mutexes (e.g. volatile in C/C++ is rarely enough). It's quite hard, if not impossible, for services such as mutexes to keep track of only the memory that has changed and needs to be updated in all the caches - they would probably have to update all the memory, and that is if they could even track which cores have what pieces of that memory in their caches.

Presumably the hardware can do a much better and more efficient job of tracking which memory addresses/ranges have changed, and keeping them in sync.

Also, imagine a process running on core 1 that gets preempted. When it is scheduled again, it may be scheduled on core 2.

This would be pretty fatal if the caches weren't coherent, as there might be remnants of the process's data in core 1's cache that don't exist in core 2's cache. For a system working that way, the OS would have to enforce cache coherency as threads are scheduled - which would probably mean an "update all the memory in the caches across all the cores" operation, or perhaps it could track dirty pages with the help of the MMU and sync only the memory pages that have changed. Again, the hardware likely keeps the caches coherent in a more fine-grained and efficient way.
There are some nuances not covered by the great responses from the other authors.
First off, consider that a CPU doesn't deal with memory byte-by-byte, but with cache lines. A line might hold 64 bytes. Now, if I allocate a 2-byte piece of memory at location P, and another CPU allocates an 8-byte piece of memory at location P + 8, and both P and P + 8 live on the same cache line, observe that without cache coherence the two CPUs can't concurrently update P and P + 8 without clobbering each other's changes! Because each CPU does a read-modify-write on the whole cache line, they might both write out a copy of the line that doesn't include the other CPU's change. The last writer would win, and one of your modifications to memory would have "disappeared"!
The other thing to bear in mind is the distinction between coherency and consistency. Because even x86-derived CPUs use store buffers, there is no guarantee that instructions which have already finished have modified memory in such a way that other CPUs can see those modifications, even if the compiler has decided to write the value back to memory (maybe because of volatile?). Instead, the modifications may be sitting in store buffers. Pretty much all CPUs in general use are cache-coherent, but very few CPUs have a consistency model as forgiving as x86's. Check out, for example, http://www.cs.nmsu.edu/~pfeiffer/classes/573/notes/consistency.html for more information on this topic. Hope this helps. BTW, I work at Corensic, a company that's building a concurrency debugger that you may want to check out. It helps pick up the pieces when assumptions about concurrency, coherence, and consistency prove unfounded :)
Imagine you do this:
If there were no cache coherence, that last unlock() would have to ensure that globalint is now visible everywhere; with cache coherence, all you need to do is write it to memory and let the hardware do the magic. A software solution would have to keep track of which memory exists in which caches, on which cores, and somehow make sure they stay atomically in sync. You'd win an award if you could find a software solution that keeps track of all the pieces of memory in the caches that need to be kept in sync, and that is more efficient than the current hardware solutions.
Cache coherency becomes extremely important when you are dealing with multiple threads and are accessing the same variable from multiple threads. In that particular case, you have to ensure that all processors/cores do see the same value if they access the variable at the same time, otherwise you'll have wonderfully non-deterministic behaviour.
It's not needed for locking. The locking code would include cache flushing if that was needed. It's mainly needed to ensure that concurrent updates by different processors to different variables in the same cache line aren't lost.
Cache coherency is implemented in hardware so that the programmer doesn't have to worry about making sure all threads see the latest value of a memory location when operating in a multicore/multiprocessor environment. Cache coherence gives the abstraction that all cores/processors are operating on a single unified cache, even though every core/processor has its own individual cache.

It also makes sure that legacy multi-threaded code works as-is on new processor models/multiprocessor systems, without any code changes to ensure data consistency.