EMMS instruction execution time?
I'm reading "The Art of Assembly: The MMX Instruction Set". After executing some MMX instructions, the EMMS instruction must be executed to reset the FPU (that is, to mark the shared x87/MMX registers as empty so that floating-point code can use them again), and the book states that EMMS is quite slow.
However, when I profiled EMMS to see just how slow it is (using RDTSC to count clock cycles), it appears to execute in 0 cycles.
What's going on? Have I made a mistake somewhere, or is The Art of Assembly out of date?
Comments (2)
It was slow on the ancient Pentium MMX, but on more modern processors it is very fast.
Still, MMX is mostly obsolete today. Use SSE2 instead: its XMM registers are separate from the x87 FPU registers, so you'll have no problems interleaving it with FPU code.
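As a minimal sketch of what that means in practice, assuming a C compiler that provides the standard SSE2 intrinsics header `<emmintrin.h>`: the function below does packed 16-bit integer addition in XMM registers and then goes straight on to floating-point arithmetic with no `_mm_empty()` (EMMS) call. The function name `sse2_add_then_scale` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical example: add eight pairs of 16-bit integers with SSE2,
   then use the sums in floating-point math immediately.
   No _mm_empty() is needed: XMM registers do not alias the x87 stack.
   (The MMX version, _mm_add_pi16 on __m64, would need _mm_empty()
   before any x87 code.) */
static double sse2_add_then_scale(const int16_t a[8], const int16_t b[8],
                                  double scale)
{
    assert(a && b);
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    int16_t sum[8];
    /* eight 16-bit lanes, wraparound addition (PADDW) */
    _mm_storeu_si128((__m128i *)sum, _mm_add_epi16(va, vb));

    double total = 0.0;
    for (int i = 0; i < 8; i++)
        total += sum[i] * scale;  /* FP code right after SIMD integer code */
    return total;
}
```

For example, with `a = {1, 2, 3, 4, 5, 6, 7, 8}` and `b = {8, 7, 6, 5, 4, 3, 2, 1}` every lane sums to 9, so a scale of 0.5 gives 36.0.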
Also, RDTSC is not a serializing instruction and can execute in parallel with other instructions, which explains your measurement: the CPU simply started executing both RDTSCs and the EMMS in the same clock cycle. If you want to measure the time a piece of code takes, you must serialize both RDTSCs with respect to that code, usually with the CPUID instruction. Because the serializing instructions take CPU cycles themselves, you also have to measure how many cycles your measurement rig takes with no code between the two readings, and subtract that.
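A minimal sketch of such a rig, assuming x86-64 with GCC or Clang (AT&T-syntax extended inline assembly). `serialized_tsc` runs CPUID immediately before RDTSC, so every earlier instruction has completed before the counter is read; calling `min_cycles` with `with_emms` set to 0 times the empty rig, which is the overhead to subtract. Both function names are hypothetical, and keeping the minimum over many runs is just one common way of filtering out interrupts and cache effects.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: CPUID (serializing) followed by RDTSC.
   Assumes x86-64: CPUID overwrites EAX, EBX, ECX and EDX. */
static inline uint64_t serialized_tsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__(
        "xorl %%eax, %%eax\n\t"
        "cpuid\n\t"       /* wait until all earlier instructions finish */
        "rdtsc\n\t"       /* EDX:EAX <- time-stamp counter */
        : "=a"(lo), "=d"(hi)
        :
        : "rbx", "rcx", "memory");
    return ((uint64_t)hi << 32) | lo;
}

/* Smallest observed tick count over `runs` repetitions.
   with_emms == 0 times the empty rig, i.e. the overhead to subtract. */
static uint64_t min_cycles(int with_emms, int runs)
{
    assert(runs > 0);
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < runs; i++) {
        uint64_t start = serialized_tsc();
        if (with_emms)
            __asm__ __volatile__("emms");
        uint64_t end = serialized_tsc();
        if (end - start < best)
            best = end - start;
    }
    return best;
}
```

The estimate for EMMS itself is then `min_cycles(1, 10000)` minus `min_cycles(0, 10000)`, computed as a signed value, since noise can make it zero or slightly negative. On processors with an invariant TSC, the counter ticks at a fixed reference rate rather than at the current core clock, so the result is in TSC ticks, not necessarily core cycles.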
The last point is that even on the Pentium MMX, the EMMS instruction itself completed quickly; it was the first FPU instruction after it that incurred a nasty delay...
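To probe that claim, one could time only the first x87 instruction after an MMX sequence ended by EMMS, and compare it with the same instruction when no MMX code ran just before it. This is a minimal sketch assuming x86-64 and GCC or Clang inline assembly, with CPUID used to serialize each RDTSC; `first_fpu_cycles` and `serialized_tsc` are hypothetical names, and `fld1`/`fstp %st(0)` is just an arbitrary cheap x87 pair. The CPUID placed between EMMS and the timed instruction may itself absorb part of any transition cost, and on a current processor the two figures may well be indistinguishable from noise.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: CPUID (serializing) followed by RDTSC (x86-64). */
static inline uint64_t serialized_tsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("xorl %%eax, %%eax\n\t"
                         "cpuid\n\t"
                         "rdtsc\n\t"
                         : "=a"(lo), "=d"(hi)
                         :
                         : "rbx", "rcx", "memory");
    return ((uint64_t)hi << 32) | lo;
}

/* Smallest tick count for one x87 load/pop pair over `runs` repetitions.
   after_mmx != 0: an MMX instruction and EMMS run just before timing starts. */
static uint64_t first_fpu_cycles(int after_mmx, int runs)
{
    assert(runs > 0);
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < runs; i++) {
        if (after_mmx)
            __asm__ __volatile__("pxor %%mm0, %%mm0\n\t"  /* enter MMX state */
                                 "emms"                   /* leave it again */
                                 ::: "mm0");
        uint64_t start = serialized_tsc();
        __asm__ __volatile__("fld1\n\t"      /* push 1.0: first x87 instruction */
                             "fstp %%st(0)"  /* pop it to keep the stack balanced */
                             ::: "memory");
        uint64_t end = serialized_tsc();
        if (end - start < best)
            best = end - start;
    }
    return best;
}
```

Compare `first_fpu_cycles(1, 10000)` with `first_fpu_cycles(0, 10000)`; both include the same rig overhead, so only their difference is of interest.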
You need a serializing instruction, such as CPUID, to ensure that RDTSC is not executed out of order. You can read more here.