Invalid instruction operands from PUNPCKLWD when using an MMWORD PTR 64-bit memory operand
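I'm currently working on some old assembly code and get a MASM error on this line:
punpcklwd MM3, MMWORD PTR [8+EBP+ECX*2]
It gives me: error A2070: invalid instruction operands
But this should work, right? Code disassembled from a compiled copy is basically identical to this.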
Also, according to an Intel PDF hosted on www.intel.com (file name ending in mpeg1-audio-kernels-140701.pdf), this is how it should be written.
Comments (1)
The memory source operand is 32-bit DWORD, not MMWORD or QWORD.
See Intel's asm manual entry for PUNPCKLWD: the MMX form is listed as taking an mm/m32 source operand.
Unfortunately, the same is not true for the XMM version: it does count as a 128-bit load, faulting if it extends into an unmapped page or is misaligned.
The Description section of that manual entry backs this up.
The 128-bit behaviour is one of many dumb design decisions in SSE1/SSE2. I wonder if Pentium 4 had limitations on store-forwarding or something that would have somehow made it less efficient in that first-gen implementation to be like a movq load. There is movhps xmm3, qword ptr [ecx] to load into the upper half to replace punpcklqdq, but you just need a separate movq for narrower interleaves.
The MMX behaviour of only taking an operand of the width it uses is the sensible one. I don't know why the Intel doc you linked uses MMWORD with it; maybe some assemblers accepted that at the time. It does make sense that current MASM rejects it, but that could have gone either way.
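As a rough illustration of why a 32-bit source is enough for the MMX form, here is a portable C model of the interleave. It is only a sketch of the semantics, not real MMX code; the function names punpcklwd_model, load_le32 and punpcklwd_from_mem are made up for this example.

```c
#include <stdint.h>

/* Hypothetical model of MMX "punpcklwd mm, mm/m32" (not real MMX code).
   dst is the 64-bit register value, src32 is the 32-bit source operand.
   Result words, low to high: dst word 0, src word 0, dst word 1, src word 1. */
uint64_t punpcklwd_model(uint64_t dst, uint32_t src32)
{
    uint64_t d0 = dst & 0xFFFF;
    uint64_t d1 = (dst >> 16) & 0xFFFF;
    uint64_t s0 = src32 & 0xFFFF;
    uint64_t s1 = (src32 >> 16) & 0xFFFF;
    return d0 | (s0 << 16) | (d1 << 32) | (s1 << 48);
}

/* Little-endian 4-byte load, standing in for the DWORD memory access. */
uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Memory-source form: only p[0..3] is read, never p[4..7]. */
uint64_t punpcklwd_from_mem(uint64_t dst, const uint8_t *p)
{
    return punpcklwd_model(dst, load_le32(p));
}
```

For instance, punpcklwd_model(0x4444333322221111, 0xBBBBAAAA) evaluates to 0xBBBB2222AAAA1111: the upper two words of the destination never reach the result, and no source bytes beyond the first four are needed.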
Do note that punpckHwd and so on want a register-width memory operand, I guess so it more closely matches the register source version, e.g. punpckhwd mm3, mm0 could be replaced with movq [esi], mm0 / punpckhwd mm3, [esi] and run the same, rather than needing [esi+4].
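The register/memory equivalence for the high-half form can be sketched the same way. This is again a hypothetical C model (punpckhwd_model, store_le64, load_le64 and same_as_register_source are names invented here), assuming the documented behaviour that the full 64-bit source is read and only its upper two words are used.

```c
#include <stdint.h>

/* Hypothetical model of MMX "punpckhwd mm, mm/m64" (not real MMX code).
   Result words, low to high: dst word 2, src word 2, dst word 3, src word 3. */
uint64_t punpckhwd_model(uint64_t dst, uint64_t src64)
{
    uint64_t d2 = (dst >> 32) & 0xFFFF;
    uint64_t d3 = (dst >> 48) & 0xFFFF;
    uint64_t s2 = (src64 >> 32) & 0xFFFF;
    uint64_t s3 = (src64 >> 48) & 0xFFFF;
    return d2 | (s2 << 16) | (d3 << 32) | (s3 << 48);
}

/* Little-endian 8-byte store and load, standing in for movq [esi], mm0
   and the 64-bit memory source operand. */
void store_le64(uint8_t *mem, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        mem[i] = (uint8_t)(v >> (8 * i));
}

uint64_t load_le64(const uint8_t *mem)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v |= (uint64_t)mem[i] << (8 * i);
    return v;
}

/* Compares "punpckhwd mm3, mm0" with "movq [esi], mm0 / punpckhwd mm3, [esi]".
   The memory form uses the same base address as the store, not base+4. */
int same_as_register_source(uint64_t mm3, uint64_t mm0)
{
    uint8_t mem[8];
    store_le64(mem, mm0);
    return punpckhwd_model(mm3, mm0) == punpckhwd_model(mm3, load_le64(mem));
}
```

In this model, punpckhwd_model(0x4444333322221111, 0xDDDDCCCCBBBBAAAA) gives 0xDDDD4444CCCC3333, whether the source value comes straight from a register or from the 8 bytes stored at the same address.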
That also let them build HW that just feeds a 64-bit load to the shuffle unit, without needing a broadcast or shifted load to get the data at the right place for input to the ALU. Modern Intel load ports can do broadcast loads (e.g. movddup or vbroadcastss with a memory source run as a single uop for a load port, no ALU involved), but that's something much more recent than P5 Pentium.
Omit the DWORD / MMWORD PTR entirely
And BTW, punpcklwd MM3, [8+EBP+ECX*2] should assemble just fine with most Intel-syntax assemblers, including MASM as well as NASM and GAS with .intel_syntax noprefix. The register destination (along with the mnemonic) implies the size of the memory operand.
GNU Binutils objdump -drwC -Mintel agrees with Intel's manual that it's a 32-bit memory operand. I assume MASM would want the same syntax.