Do memory instructions pass through the load-store queue and the issue queue in a microarchitecture?

Posted on 2025-01-31 22:20:25

What is the difference between the issue queue and the LSQ (load-store queue) for memory instructions? Do memory instructions pass through both queues, or only through the LSQ?
If they pass through both queues, in what order?

山有枢 2025-02-07 22:20:25

I'm assuming you use the ARM-like nomenclature here, so the issue queue is what Intel calls the RS (reservation station), and by "issue" you mean sending a uop that is ready for execution.

The answer is that memory instructions need to pass through both. All instructions need to be issued (except the ones that can be eliminated without execution, for example register moves, zero idioms, NOPs, etc.). To rephrase: all instructions that need to go through an ALU need to go through the issue process first. Memory instructions simply use that step to calculate their addresses.
This is true for loads. For stores there is usually an internal split into store-address and store-data uops, so the store-address uop behaves like a load in that sense and calculates its address during that step.
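To make the address step concrete, here is a minimal Python sketch assuming an x86-style `base + index*scale + disp` addressing mode. The names `MemOperand`, `agu` and `crack_store` are hypothetical, and registers are looked up by architectural name for simplicity (a real core would read renamed physical registers or take a bypass). It only illustrates that the arithmetic a load or store-address uop does once issued is the address calculation, and that a store is cracked into a store-address (STA) and a store-data (STD) uop.

```python
from dataclasses import dataclass
from typing import Optional, Union

MASK64 = (1 << 64) - 1

@dataclass
class MemOperand:
    """base + index*scale + disp, an x86-style addressing mode (assumed for illustration)."""
    base: Optional[str] = None
    index: Optional[str] = None
    scale: int = 1
    disp: int = 0

def agu(op: MemOperand, regs: dict[str, int]) -> int:
    """Address generation: the arithmetic a memory uop performs once it is issued."""
    addr = op.disp
    if op.base is not None:
        addr += regs[op.base]              # source read from the register file (or bypass)
    if op.index is not None:
        addr += regs[op.index] * op.scale
    return addr & MASK64                   # wrap around like 64-bit hardware arithmetic

def crack_store(dst: MemOperand, data_reg: str) -> list[tuple[str, Union[MemOperand, str]]]:
    """Split a store into a store-address (STA) uop and a store-data (STD) uop."""
    return [("STA", dst), ("STD", data_reg)]

# Usage: a load from [rbx + rcx*8 + 16] and a store of rax to [rbx].
regs = {"rbx": 0x1000, "rcx": 4, "rax": 99}
load_addr = agu(MemOperand(base="rbx", index="rcx", scale=8, disp=16), regs)
uops = crack_store(MemOperand(base="rbx"), "rax")
sta_addr = agu(uops[0][1], regs)           # the STA uop computes its address like a load
```

The STD uop only needs its data register to be ready, so in this model it could issue independently of the STA uop.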

There is usually a dedicated execution port and dedicated execution units for this, because address calculation usually follows one of a few specific addressing modes (each architecture has its own set). Aside from that, the execution needs to follow the same rules as any other operation in the CPU: it needs its sources to be ready and read from the register file or bypassed from an in-flight operation, and it needs to win arbitration when the execution port is free, prioritized by the same aging rules. So it makes sense for it to use the common path.
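The wakeup and select rules described above (sources must be ready, one uop per free port, older uops first) can be sketched as a toy issue queue. This is a simplified sequential model under assumed names (`Uop`, `IssueQueue`, ports `"ALU"` and `"AGU"`, physical tags such as `"p7"`); real schedulers do this in parallel hardware every cycle, and their age-tracking schemes vary.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(eq=False)  # compare by identity so removal picks the exact entry
class Uop:
    seq: int              # program-order age: smaller means older
    port: str             # execution port the uop needs (hypothetical names)
    srcs: set[str]        # source tags that are not ready yet
    dst: Optional[str] = None

class IssueQueue:
    def __init__(self) -> None:
        self.entries: list[Uop] = []

    def insert(self, uop: Uop) -> None:
        self.entries.append(uop)

    def wakeup(self, tag: str) -> None:
        """Broadcast a result tag; any source waiting on it becomes ready."""
        for u in self.entries:
            u.srcs.discard(tag)

    def select(self, free_ports: set[str]) -> list[Uop]:
        """Grant at most one ready uop per free port, oldest first (aging)."""
        granted: dict[str, Uop] = {}
        for u in sorted(self.entries, key=lambda e: e.seq):
            if not u.srcs and u.port in free_ports and u.port not in granted:
                granted[u.port] = u
        for u in granted.values():
            self.entries.remove(u)
        return sorted(granted.values(), key=lambda e: e.seq)

# Usage: a load (AGU) that depends on an ALU result must wait for its wakeup.
iq = IssueQueue()
iq.insert(Uop(seq=3, port="AGU", srcs={"p7"}, dst="p9"))  # load using p7 as its base
iq.insert(Uop(seq=1, port="ALU", srcs=set(), dst="p7"))   # producer of p7
iq.insert(Uop(seq=2, port="AGU", srcs=set(), dst="p8"))   # independent load
first = iq.select({"ALU", "AGU"})    # grants seq 1 (ALU) and seq 2 (AGU)
iq.wakeup("p7")                      # ALU result broadcast
second = iq.select({"ALU", "AGU"})   # now the dependent load (seq 3) issues
```

The point of the sketch is that the memory uop goes through exactly the same readiness check and age-ordered arbitration as the ALU uop; only its port differs.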

Once the memory operation has finished execution, it is sent to the LSU (load-store unit, or the DCU, data-cache unit, on Intel), which performs the actual memory access using the generated address. The LSU pipe takes care of address translation, TLB lookups, the page walk if needed (though this is sometimes done in a dedicated unit), address range and property checks, the cache lookup (if cacheable), and sending a miss to the next cache level or memory if needed. It may also trigger prefetches as part of the process.
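A rough sketch of those LSU pipe steps, assuming 4 KiB pages and 64-byte lines. Everything here is hypothetical: `LSUPipe` and `PageFault` are made-up names, the "page walk" is just a dictionary lookup, the only property checked is a per-page cacheable bit, and prefetching is not modeled.

```python
PAGE_SHIFT = 12   # 4 KiB pages (assumed)
LINE_SHIFT = 6    # 64-byte cache lines (assumed)

class PageFault(Exception):
    pass

class LSUPipe:
    """Toy LSU pipe: translate, check properties, look up the L1, send misses on."""

    def __init__(self, page_table: dict[int, tuple[int, bool]]) -> None:
        self.page_table = page_table                 # VPN -> (PPN, cacheable)
        self.tlb: dict[int, tuple[int, bool]] = {}   # cached translations
        self.l1: set[int] = set()                    # physical line numbers held in L1
        self.outgoing: list[int] = []                # requests to next level / memory
        self.page_walks = 0

    def translate(self, vaddr: int) -> tuple[int, bool]:
        vpn = vaddr >> PAGE_SHIFT
        offset = vaddr & ((1 << PAGE_SHIFT) - 1)
        if vpn not in self.tlb:                      # TLB miss: walk the page table
            self.page_walks += 1
            if vpn not in self.page_table:
                raise PageFault(hex(vaddr))
            self.tlb[vpn] = self.page_table[vpn]
        ppn, cacheable = self.tlb[vpn]
        return (ppn << PAGE_SHIFT) | offset, cacheable

    def access(self, vaddr: int) -> str:
        paddr, cacheable = self.translate(vaddr)
        if not cacheable:                            # property check: bypass the cache
            self.outgoing.append(paddr)
            return "uncached"
        line = paddr >> LINE_SHIFT
        if line in self.l1:
            return "hit"
        self.outgoing.append(line << LINE_SHIFT)     # miss: request the whole line
        return "miss"

# Usage: VPN 1 is ordinary cacheable memory, VPN 2 is an uncacheable page.
lsu = LSUPipe({0x1: (0x80, True), 0x2: (0x90, False)})
r1 = lsu.access(0x1040)            # TLB miss + page walk, then L1 miss
lsu.l1.add(0x80040 >> LINE_SHIFT)  # pretend the line fill has returned
r2 = lsu.access(0x1050)            # same line, TLB hit, L1 hit
r3 = lsu.access(0x2000)            # goes straight out, not cached
```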

For a load, when the LSU pipe has completed (which may require multiple passes and wakeups if the data was not available in the L1), the LSU signals the issue queue again in order to wake up any uops that depend on the result.
For a store, the store-address uop may fetch the line into the cache in advance as an optimization, but the actual next step is usually to wake up after retirement (since stores may not be dispatched to memory while speculative, unless you have some tricks to handle that).
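The store half of this can be illustrated with a minimal store-queue sketch, assuming in-order retirement driven by a reorder buffer. `StoreQueue`, `retire_up_to` and `flush_younger_than` are hypothetical names; real store buffers also deal with memory ordering, coalescing and partial writes, which are left out. The model only shows that a store writes memory after it retires, and that a squash can still discard it before then.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class StoreEntry:
    seq: int              # program-order age
    addr: int
    data: int
    retired: bool = False

class StoreQueue:
    """Stores wait here and may write memory only once they are retired (non-speculative)."""

    def __init__(self, memory: dict[int, int]) -> None:
        self.memory = memory
        self.entries: deque[StoreEntry] = deque()   # kept in program order

    def allocate(self, seq: int, addr: int, data: int) -> None:
        self.entries.append(StoreEntry(seq, addr, data))

    def retire_up_to(self, seq: int) -> None:
        """Called as the (hypothetical) ROB retires instructions in order up to `seq`."""
        for e in self.entries:
            if e.seq <= seq:
                e.retired = True

    def flush_younger_than(self, seq: int) -> None:
        """Squash speculative stores younger than a mispredicted branch `seq`."""
        while self.entries and self.entries[-1].seq > seq:
            self.entries.pop()

    def drain(self, max_stores: int = 1) -> int:
        """Write retired stores to memory in program order; return how many were written."""
        written = 0
        while self.entries and self.entries[0].retired and written < max_stores:
            e = self.entries.popleft()
            self.memory[e.addr] = e.data
            written += 1
        return written

# Usage: two stores before a branch (seq 3) and one store after it.
mem: dict[int, int] = {}
sq = StoreQueue(mem)
sq.allocate(1, 0x100, 11)
sq.allocate(2, 0x108, 22)
sq.allocate(4, 0x110, 44)
early = sq.drain(max_stores=4)     # nothing retired yet, so nothing is written
sq.retire_up_to(2)
sq.flush_younger_than(3)           # branch 3 was mispredicted: store 4 is squashed
late = sq.drain(max_stores=4)
```

Because the squashed store never reached `mem`, nothing has to be undone in memory, which is why holding stores until retirement is the simple default.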

It's also worth mentioning that some CPUs try to optimize loads that can get their data directly from prior stores instead of fetching it from the cache/memory. This can be done through store-to-load forwarding (very common) or memory renaming (less common). The former is usually handled internally by the LSU, but the latter can be done much earlier and without the LSU (though the LSU pipe is usually still activated to validate the result).
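The common forwarding case can be sketched as a search of the store queue for the youngest store that is older than the load and writes the same address. This is a simplified model with hypothetical names (`PendingStore`, `load_lookup`): it assumes exact-address, same-size accesses, treats untouched memory as zero, and ignores partial overlaps and older stores whose addresses are still unknown, all of which real LSUs must handle. Memory renaming is not modeled.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PendingStore:
    seq: int               # program-order age
    addr: int
    data: Optional[int]    # None until the store-data uop has executed

def load_lookup(load_seq: int, load_addr: int,
                stores: list[PendingStore],
                cache: dict[int, int]) -> tuple[str, Optional[int]]:
    """Return (source, value): forward from the youngest older matching store, else read the cache."""
    matches = [s for s in stores if s.seq < load_seq and s.addr == load_addr]
    if matches:
        youngest = max(matches, key=lambda s: s.seq)
        if youngest.data is None:
            return ("wait", None)              # address matched but data not ready: replay later
        return ("forwarded", youngest.data)
    return ("cache", cache.get(load_addr, 0))  # untouched memory reads as zero in this toy model

# Usage: a load with seq 5 checks the stores that are older than it.
stores = [PendingStore(1, 0x40, 5), PendingStore(3, 0x40, 7),
          PendingStore(4, 0x80, None), PendingStore(6, 0x40, 9)]
cache = {0x40: 1, 0xC0: 3}
a = load_lookup(5, 0x40, stores, cache)   # store 3 is the youngest older match
b = load_lookup(5, 0x80, stores, cache)   # store 4 matches but its data is pending
c = load_lookup(5, 0xC0, stores, cache)   # no older store matches: read the cache
```

Store 6 is ignored for the first load because it is younger, and forwarding from it would break program order.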
