How Intel processors handle dummy operations (NOPs)
Admittedly, this is a somewhat silly question. Basically, I am wondering whether
Intel processors provide any special mechanism to efficiently
execute a series of dummy instructions, i.e., NOPs. For instance, I could imagine
some kind of prefetch mechanism that identifies NOPs, discards them,
and tries to fetch some useful instructions instead. Or are NOPs dispatched
to the execution units as normal instructions, meaning that I can process roughly
5 NOPs per cycle (assuming there are 5 execution units)?
Thanks,
Reinhard
Comments (3)
Discarding them would be a pretty bad idea: they are often used for busy-waiting. If you discard NOPs, you make your wait loop much tighter than it should be and potentially introduce considerable communication overhead.

If you feel that NOPs are inefficient, you could try HLT, which saves some energy. Or you could even send the CPU into a sleep state. However, these only make sense if you want to "do nothing" for a considerable amount of time, and they usually require supervisor privileges.

No. They are decoded and executed as normal instructions; there is hardware support to remove the false dependency that would otherwise be introduced on the EAX register for the single-byte NOP, 0x90 (which is really xchg eax, eax), but that's all.

Reference: Intel(R) 64 and IA-32 Architectures Optimization Reference Manual, section 3.5.1.8, "Using NOPs".
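
To illustrate why 0x90 is "really" xchg eax, eax: the short form of XCHG between EAX and another 32-bit register is encoded as 0x90 plus the register number, and EAX itself is register 0. The following is only a small sketch of that arithmetic; the helper name xchg_eax_opcode and the REG32 list are made up for this example and are not part of any assembler API.

```python
# Standard x86 numbering of the 32-bit general-purpose registers.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def xchg_eax_opcode(reg):
    """Return the one-byte opcode for 'xchg eax, <reg>' (short form, 0x90 + reg number)."""
    return 0x90 + REG32.index(reg)


if __name__ == "__main__":
    for reg in REG32:
        print(f"xchg eax, {reg}: {xchg_eax_opcode(reg):#04x}")
```

With reg set to "eax" the formula yields 0x90, the same byte as the single-byte NOP, which is why the processor needs special handling to avoid treating it as a read and write of EAX.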

There's very little need to optimize sequences of no-ops on the x86 architecture because it has no-op encodings of varying lengths. Instead of many one-byte no-ops, one can just use a single multi-byte no-op. That means somewhat more work for the decoder, but the actual execution units only see a single instruction to execute.
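
As a rough sketch of that idea, the snippet below pads a gap of a given size with as few no-op instructions as possible. The byte sequences are the recommended 1- to 9-byte NOP encodings listed under the NOP instruction in the Intel Software Developer's Manual; the function name nop_padding and the longest-first strategy are assumptions made for this example, not something taken from a particular assembler.

```python
# Recommended multi-byte NOP encodings, keyed by instruction length in bytes.
RECOMMENDED_NOPS = {
    1: bytes([0x90]),
    2: bytes([0x66, 0x90]),
    3: bytes([0x0F, 0x1F, 0x00]),
    4: bytes([0x0F, 0x1F, 0x40, 0x00]),
    5: bytes([0x0F, 0x1F, 0x44, 0x00, 0x00]),
    6: bytes([0x66, 0x0F, 0x1F, 0x44, 0x00, 0x00]),
    7: bytes([0x0F, 0x1F, 0x80, 0x00, 0x00, 0x00, 0x00]),
    8: bytes([0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00]),
    9: bytes([0x66, 0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00]),
}
MAX_NOP_LEN = max(RECOMMENDED_NOPS)


def nop_padding(length):
    """Return a list of NOP encodings whose total size is `length` bytes.

    Hypothetical helper: picks the longest available encoding first so the
    padding consists of as few instructions as possible.
    """
    if length < 0:
        raise ValueError("length must be non-negative")
    chunks = []
    remaining = length
    while remaining > 0:
        size = min(remaining, MAX_NOP_LEN)
        chunks.append(RECOMMENDED_NOPS[size])
        remaining -= size
    return chunks


if __name__ == "__main__":
    for insn in nop_padding(20):
        print(len(insn), insn.hex(" "))
```

Padding 20 bytes this way takes three instructions (9 + 9 + 2 bytes) instead of twenty single-byte NOPs, so the decoder handles longer instructions but far fewer of them reach the rest of the pipeline.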