Difference between the MOVDQA and MOVAPS x86 instructions?
I'm looking at the Intel datasheet (Intel® 64 and IA-32 Architectures
Software Developer’s Manual) and I can't find the difference between
- MOVDQA: Move Aligned Double Quadword
- MOVAPS: Move Aligned Packed Single-Precision
In the Intel datasheet, I find the following text for both instructions:
This instruction can be used to load an XMM register from a 128-bit
memory location, to store the contents of an XMM register into a
128-bit memory location, or to move data between two XMM registers.
The only difference is:
To move a double quadword to or from unaligned memory locations, use
the MOVDQU instruction.
and
To move packed single-precision floating-point values to or from
unaligned memory locations, use the MOVUPS instruction.
But I can't find the reason why there are two different instructions.
Can anybody explain the difference?
Comments (1)
In functionality, they are identical.
On some (but not all) micro-architectures, there are timing differences due to "domain crossing penalties". For this reason, one should generally use movdqa when the data is being used with integer SSE instructions, and movaps when the data is being used with floating-point instructions. For more information on this subject, consult the Intel Optimization Manual, or Agner Fog's excellent microarchitecture guide. Note that these delays are most often associated with register-register moves rather than with loads or stores.
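As a rough illustration of the rule of thumb above, here is a minimal C sketch using SSE2 intrinsics. The function names add_i32x4 and add_f32x4 are made up for this example. The Intel intrinsics documentation lists _mm_load_si128 / _mm_store_si128 as corresponding to MOVDQA and _mm_load_ps / _mm_store_ps as corresponding to MOVAPS, but a compiler is free to emit a different move than the intrinsic suggests, so check the generated assembly if the exact instruction matters.

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 intrinsics (also provides the SSE ones) */

/* Integer domain: aligned integer loads/stores (MOVDQA) feeding PADDD.
 * add_i32x4 is a hypothetical helper name used only in this sketch. */
static void add_i32x4(const int32_t *a, const int32_t *b, int32_t *out)
{
    /* Aligned moves fault on addresses that are not 16-byte aligned. */
    assert((uintptr_t)a % 16 == 0);
    assert((uintptr_t)b % 16 == 0);
    assert((uintptr_t)out % 16 == 0);

    __m128i va = _mm_load_si128((const __m128i *)a);
    __m128i vb = _mm_load_si128((const __m128i *)b);
    _mm_store_si128((__m128i *)out, _mm_add_epi32(va, vb));
}

/* Floating-point domain: aligned packed-single loads/stores (MOVAPS)
 * feeding ADDPS. add_f32x4 is likewise a hypothetical helper name. */
static void add_f32x4(const float *a, const float *b, float *out)
{
    assert((uintptr_t)a % 16 == 0);
    assert((uintptr_t)b % 16 == 0);
    assert((uintptr_t)out % 16 == 0);

    __m128 va = _mm_load_ps(a);
    __m128 vb = _mm_load_ps(b);
    _mm_store_ps(out, _mm_add_ps(va, vb));
}
```

Both functions move the same 128 bits in the same way; the only thing that changes is which execution domain the move is tagged for, which is why the choice follows the instructions that consume the data.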