Do caches have limitations when dealing with devices?
In my computer architecture class, the prof posed the following question to us:
"Caches can have serious limitations when dealing with devices. Why is this?"
I have no idea why there are limitations or what they could be. After a search on Google, I can't seem to find an accurate answer to this. Can someone explain why there are limitations and what those limitations may be, or point me in a direction that may help me in answering the question?
In other words, you have no control over what is actually sent to the device, or when.
There are two problems with caches and devices. The first is one of basic functional correctness: the system must generally place the device's memory-mapped registers in an address range that bypasses the cache entirely. Imagine this did not happen and the cache was "in the way". In that case, software trying to read a status register on the device would instead read the stale value held in the cache! Good luck getting your device drivers to work. Some CPUs provide special instructions for uncacheable accesses, but the basic result is the same: the cache offers no benefit and only complicates matters when dealing with device memory.
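To make the status-register case concrete, here is a minimal sketch in C of polling a device register. The names `mmio_read32`, `wait_until_ready`, and `STATUS_READY` are made up for illustration, not taken from any real driver. Note that `volatile` only stops the compiler from reusing an old value; it does nothing about the hardware cache, so the sketch assumes the register address has already been mapped uncached by the platform.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "device ready" bit; real devices define their own layout. */
#define STATUS_READY 0x1u

/* Read a 32-bit register. volatile forces the compiler to emit a real load
 * on every call instead of reusing a value it read earlier. It does NOT
 * bypass the hardware cache; that is assumed to be done by the mapping. */
static inline uint32_t mmio_read32(const volatile uint32_t *reg)
{
    return *reg;
}

/* Poll the status register at most max_polls times.
 * Returns true as soon as the READY bit is observed, false otherwise. */
static bool wait_until_ready(const volatile uint32_t *status, unsigned max_polls)
{
    assert(status != NULL);
    for (unsigned i = 0; i < max_polls; i++) {
        if (mmio_read32(status) & STATUS_READY)
            return true;
    }
    return false;
}
```

If the register were cached, every iteration of the loop could be served from the same stale cache line and the READY bit set by the device would never be seen, no matter how long the loop ran.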
The second problem is a performance issue with smart devices capable of direct memory access (DMA) transactions with memory. When a device performs a DMA write, hardware bus logic in the system snoops the CPU's caches and invalidates the affected cache lines, following the MESI protocol. Cores depend strongly on keeping data in nearby caches for efficiency. Now the device has just yanked those cache lines away, which forces the core into a high-latency reload of the lines on the next software access. The same snooping usually happens even for DMA reads, since CPUs often avoid leaving data lines in the shared state.
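The effect on a single cache line can be sketched with a toy MESI model in C. This is a simplification under stated assumptions, not a description of any particular CPU: one line, one cache, and invented names (`cache_line`, `snoop_dma_write`, `snoop_dma_read`). The `keep_shared` flag is an assumption added to contrast the "invalidate on DMA read" behaviour described above with the alternative of keeping the line in the shared state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* MESI states for a single cache line. */
typedef enum { MODIFIED, EXCLUSIVE, SHARED, INVALID } mesi_state;

typedef struct {
    mesi_state state;
    unsigned   writebacks; /* times dirty data was pushed back to memory */
    unsigned   misses;     /* CPU accesses that had to reload the line */
} cache_line;

/* CPU load: a miss reloads the line (EXCLUSIVE, assuming no other sharer). */
static void cpu_read(cache_line *l)
{
    assert(l != NULL);
    if (l->state == INVALID) { l->misses++; l->state = EXCLUSIVE; }
}

/* CPU store: a miss reloads the line first; either way it becomes dirty. */
static void cpu_write(cache_line *l)
{
    assert(l != NULL);
    if (l->state == INVALID) l->misses++;
    l->state = MODIFIED;
}

/* Snooped DMA write: write back dirty data, then drop the line. */
static void snoop_dma_write(cache_line *l)
{
    assert(l != NULL);
    if (l->state == MODIFIED) l->writebacks++;
    l->state = INVALID;
}

/* Snooped DMA read: dirty data must reach the device first. The line is
 * then kept SHARED or dropped, depending on the (assumed) policy flag. */
static void snoop_dma_read(cache_line *l, bool keep_shared)
{
    assert(l != NULL);
    if (l->state == INVALID) return;
    if (l->state == MODIFIED) l->writebacks++;
    l->state = keep_shared ? SHARED : INVALID;
}
```

Every call to `snoop_dma_write` leaves the line `INVALID`, so the next `cpu_read` or `cpu_write` increments `misses`. That counter stands in for the high-latency reload the core pays after the device touches the buffer.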
Devices integrated into the CPU itself may be able to leave the cache lines resident in the CPU's last level cache, which can be a significant performance boost vs. devices outside the CPU package.
Basically, you need a way to know exactly when data is flushed out of the cache and into the device you're talking to. RAM is a device like any other, and it remembers the last data that was written to a location. If a write never makes it out to RAM, then a device won't see that write either. Linux uses functions called read/write barriers, which are architecture dependent.
How Linux deals with it
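As a rough user-space analogue of the barrier idea, the sketch below uses C11 `atomic_thread_fence` to make sure a payload is written before the "ready" flag that announces it. This is only an illustration: inside the Linux kernel the corresponding primitives are macros such as `wmb()` and `rmb()`, the `mailbox` type and function names here are hypothetical, and a fence orders memory accesses but does not by itself flush a cache line out to a device.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mailbox shared between a producer and a consumer. */
typedef struct {
    uint32_t   payload;
    atomic_int ready;
} mailbox;

/* Producer: write the payload, then a release fence (playing the role of
 * a write barrier), then set the flag. The fence keeps the payload store
 * from being reordered after the flag store. */
static void mailbox_publish(mailbox *m, uint32_t value)
{
    assert(m != NULL);
    m->payload = value;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&m->ready, 1, memory_order_relaxed);
}

/* Consumer: check the flag, then an acquire fence (playing the role of a
 * read barrier), then read the payload. Returns false if nothing is ready. */
static bool mailbox_consume(mailbox *m, uint32_t *out)
{
    assert(m != NULL && out != NULL);
    if (!atomic_load_explicit(&m->ready, memory_order_relaxed))
        return false;
    atomic_thread_fence(memory_order_acquire);
    *out = m->payload;
    return true;
}
```

Without the two fences, the compiler or the CPU would be free to make the flag visible before the payload, which is the same "the other side sees things in the wrong order" hazard that barriers guard against when the other side is a device.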