Does the fetch phase in an x86 CPU increment eip (PC) to the next instruction?
During the fetch phase of the instruction cycle in an x86 CPU, does the eip (PC) register get incremented to point at the next instruction at the end of that phase (the fetch phase), or only after the execute phase?
I know that MIPS CPUs increment the PC by the end of the fetch phase, but do x86 CPUs do the same?
I assume they do, because after looking at the compiled code of some program, I noticed that the address in the encoding of a "relative call" instruction is relative to the next instruction, not to the current instruction.
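To make that observation concrete, here is a minimal Python sketch (the byte sequence and load address are hypothetical) of how the target of an `E8 rel32` near call is computed: the signed 32-bit displacement is added to the address of the next instruction, i.e. the end of the 5-byte call.

```python
import struct

code = bytes.fromhex("e8 0b 00 00 00")       # call rel32 (hypothetical bytes)
call_addr = 0x401000                         # assumed address of the call
rel32, = struct.unpack_from("<i", code, 1)   # signed little-endian disp32
next_insn = call_addr + len(code)            # end of the 5-byte instruction
target = next_insn + rel32                   # relative to the NEXT instruction
print(hex(next_insn), hex(target))           # 0x401005 0x401010
```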
语义。)"fetch phase?" What kind of chip you got in there, a Dorito? e.g. a 386? Even 486 was pipelined and P5 Pentium was dual-issue superscalar. So 386 was the only non-pipelined x86 with an EIP, not just an IP (at least from Intel). Of course, all commercial MIPS CPUs were pipelined as well, that was literally the whole point of the RISC ISA design and name (Microprocessor without Interlocked Pipelines Stages).
x86 machine code is a byte-stream of variable-length x86 instructions, so you definitely can't know the end of an instruction until after decoding it.
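As a toy illustration (the opcode table below covers only a handful of fixed-length instructions and ignores prefixes, ModRM, SIB, and variable displacements, so it is nothing like a real decoder), instruction boundaries can only be found sequentially: you cannot locate instruction N+1 until instruction N has been length-decoded.

```python
# Lengths for a few fixed-length x86 opcodes; purely illustrative.
LENGTHS = {0x90: 1,   # nop
           0xC3: 1,   # ret
           0xE8: 5,   # call rel32
           0xB8: 5}   # mov eax, imm32

def boundaries(code, base=0):
    addr = 0
    while addr < len(code):
        length = LENGTHS[code[addr]]   # must decode before advancing
        yield base + addr, length
        addr += length                 # the "IP" moves by the decoded length

stream = bytes.fromhex("b8 01 00 00 00 e8 0b 00 00 00 90 c3")
for start, n in boundaries(stream, base=0x401000):
    print(hex(start), n)
```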
For pipelined fetch/decode, x86 CPUs have to just fetch a stream of blocks and decode a window from a fetch buffer. So the fetch address increments in the fetch stage (not phase), in parallel with decode and later stage(s) working on the results of previous fetches. (Modern x86 CPUs have up to 4-wide legacy decode, e.g. in Zen 2, or Skylake decoding 4 instructions per clock into up-to-5 uops, up from 4 insns -> 4 uops in Sandybridge. Perhaps even wider in Alder Lake. Usually they depend on the uop cache of already-decoded instructions to feed pipelines that are 5 or 6 uops wide; legacy decode is too hard to scale up.)
As part of decode, any x86 CPU takes note of the end address (of each instruction decoded in parallel), because that's what relative jumps/calls are relative to, and the same goes for x86-64 RIP-relative addressing modes. It's also the return address a call has to push.
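For instance (a sketch with an assumed instruction address), the effective address of an x86-64 RIP-relative load is exactly that recorded end address plus the disp32 from the encoding:

```python
import struct

insn = bytes.fromhex("8b 05 f6 0f 00 00")    # mov eax, [rip + 0xff6]
insn_addr = 0x401000                         # assumed address of this insn
disp32, = struct.unpack_from("<i", insn, 2)  # signed disp32 after opcode+ModRM
effective = insn_addr + len(insn) + disp32   # end of the instruction + disp32
print(hex(effective))                        # 0x401ffc
```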
The start address is only needed for some kinds of exceptions, where the address of the faulting instruction is pushed. (So the OS can repair the situation, e.g. for a `#PF` page fault, and return to user-space to re-run the instruction and hopefully have it succeed.) But given speculative execution, a modern x86 does also have to note the start address of every instruction and track it throughout the pipeline, along with the end. (Or a start+length or end-length, since the length is at most 4 bits instead of 64 bits.)
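A sketch of that bookkeeping idea (the record type and field names are invented for illustration): carry the 64-bit start plus a 4-bit length per in-flight instruction and rederive the end on demand.

```python
from dataclasses import dataclass

@dataclass
class InsnTrack:             # hypothetical per-instruction pipeline record
    start: int               # 64-bit address of the instruction's first byte
    length: int              # fits in 4 bits: x86 insns are at most 15 bytes

    @property
    def end(self) -> int:    # rederived when needed: return address,
        return self.start + self.length   # rel32 base, RIP-relative base

t = InsnTrack(start=0x401000, length=5)
print(hex(t.end))            # 0x401005
```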
Even the original 8086 had pipelined prefetch separate from decode, but yes, decode would increment IP as it decoded, so it had the end (but not the start) of the instruction.
The 8086 did not remember the start address of the instruction at all during decode (which could iterate over an arbitrary number of prefixes; the 15-byte max insn length limit wasn't instituted until later). It didn't have many of the exceptions that modern x86 has (not even a `#UD` illegal-instruction trap: every byte-sequence executed as something). Even the 8086 `#DE` divide exception pushed the final address, unlike later x86. (And even handling interrupts during interruptible instructions like `rep cs movsb` only pushed the address of the last prefix, not the first, so it would resume as `cs movsb`! Later x86 CPUs fixed that design flaw along with changing `#DE` semantics.)