Cache miss penalty in deep RISC pipelines
Why is the cache miss penalty greater in a deeply pipelined processor?
Is it because the stall lasts longer if the miss occurs at a late stage of the pipeline? Or because there are simply too many instructions in the pipeline?
Comments (1)
Usually you implement a deeper pipeline to reduce the cycle time of each pipe stage.
Consider two in-order single-issue pipelined processor microarchitectures.
uA1 has a 5 stage pipeline and a 2 ns cycle time.
uA2 has a 10 stage pipeline and a 1 ns cycle time.
A full cache miss must (at least) load an entire cache line from DRAM.
Assume that takes 100 ns, including row activation, burst reads of the line words, and row precharge.
When uA1 takes a cache miss, it stalls for 100 ns, i.e. 50 clock cycles, or 50 issue slots.
When uA2 takes a cache miss, it stalls for 100 ns, i.e. 100 clock cycles, or 100 issue slots.
Here the cache miss penalty, expressed in instruction issue slots missed, is twice as large in the more deeply pipelined processor, even though the miss latency in nanoseconds is the same for both.
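The arithmetic above can be reproduced with a short Python sketch. It assumes the simple model used in this answer: an in-order core that stalls completely for the whole miss, and a DRAM latency that is fixed in nanoseconds. The function name `stall_slots` and the `issue_width` parameter are made up for illustration and do not come from any real tool.

```python
import math

def stall_slots(miss_latency_ns, cycle_time_ns, issue_width=1):
    """Issue slots lost while an in-order core waits on a cache miss.

    Assumes the core stalls for the full miss latency (no overlap with
    other work), which matches the simple model in the answer above.
    """
    # A stall always occupies a whole number of clock cycles.
    stall_cycles = math.ceil(miss_latency_ns / cycle_time_ns)
    # A single-issue core loses one issue slot per stalled cycle.
    return stall_cycles * issue_width

MISS_LATENCY_NS = 100  # DRAM access time assumed in the answer

ua1 = stall_slots(MISS_LATENCY_NS, cycle_time_ns=2)  # 5-stage pipeline
ua2 = stall_slots(MISS_LATENCY_NS, cycle_time_ns=1)  # 10-stage pipeline

print(f"uA1 loses {ua1} issue slots per miss")
print(f"uA2 loses {ua2} issue slots per miss")
print(f"ratio uA2/uA1 = {ua2 / ua1}")
```

The miss latency is set by DRAM, not by the core clock, so halving the cycle time doubles the number of cycles (and issue slots) that each miss costs.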