Combining prefixes in SSE
In SSE, the prefixes 066h (operand-size override), 0F2h (REPNE) and 0F3h (REPE) are part of the opcode.
In non-SSE code, 066h switches between 32-bit (or 64-bit) and 16-bit operation, and 0F2h and 0F3h are used for string operations. They can be combined, so that 066h and 0F2h (or 0F3h) can be used in the same instruction, because this is meaningful. What is the behavior in an SSE instruction? For instance, we have (ignoring mod/rm for now):
0f 58 addps
66 0f 58 addpd
f2 0f 58 addsd
f3 0f 58 addss
But what is this?
66 f2 0f 58
And how about this?
f2 66 0f 58
Not to mention the following, which has two conflicting REP prefixes:
f2 f3 0f 58
What is the spec for these?
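To make the question concrete, here is a minimal Python sketch that splits each byte sequence into its leading prefix bytes and the two-byte opcode, then looks the pair up in the four encodings listed above. The names `split_prefixes`, `describe` and `DOCUMENTED` are hypothetical, and the table covers only the four forms from the question; anything else is reported as not listed rather than guessed.

```python
# Prefix bytes relevant to the question; other legacy prefixes are ignored here.
PREFIX_BYTES = {0x66, 0xF2, 0xF3}

# The four encodings listed in the question, keyed by (prefixes, opcode).
DOCUMENTED = {
    ((), (0x0F, 0x58)): "addps",
    ((0x66,), (0x0F, 0x58)): "addpd",
    ((0xF2,), (0x0F, 0x58)): "addsd",
    ((0xF3,), (0x0F, 0x58)): "addss",
}


def split_prefixes(code):
    """Split leading 66/F2/F3 bytes from the two opcode bytes that follow."""
    i = 0
    while i < len(code) and code[i] in PREFIX_BYTES:
        i += 1
    return tuple(code[:i]), tuple(code[i:i + 2])


def describe(hex_string):
    """Return the mnemonic if the encoding is listed, else a note that it is not."""
    code = bytes.fromhex(hex_string)
    prefixes, opcode = split_prefixes(code)
    return DOCUMENTED.get((prefixes, opcode), "not one of the four listed encodings")


if __name__ == "__main__":
    for s in ["0f 58", "66 0f 58", "f2 0f 58", "f3 0f 58",
              "66 f2 0f 58", "f2 66 0f 58", "f2 f3 0f 58"]:
        print(f"{s:<12} -> {describe(s)}")
```

The last three sequences fall outside the table, which is exactly the gap the question is about.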
Comments (2)
I do not remember having seen any specification on what you should expect in the case of wildly combining random prefixes, so I guess CPU behaviour may be "undefined" and possibly CPU-specific. (Clearly, some things are specified in e.g. Intel's docs, but many cases aren't covered). And some combinations may be reserved for future use.
My naive assumption would generally have been that additional prefixes are no-ops, but there's no guarantee. That seems reasonable given that, e.g., some optimisation manuals recommend forming a multi-byte NOP (canonically 90h) by prefixing it with 66h.
However, I also know that the CS and DS segment override prefixes have acquired novel functions as SSE2 branch hint prefixes (predict branch taken = 3Eh = DS override; predict branch not taken = 2Eh = CS override) when applied to conditional jump instructions.
Anyway, I looked at your examples above, always setting XMM1 to all 0 and XMM7 to all 0FFh, and then running the code in question with xmm1, xmm7 arguments. What I observed (32-bit code on a Win64 system and an Intel T7300 Core 2 Duo) was:
1) no change observed for addsd by adding a 66h prefix
2) no change observed for addss by adding a 0F2h prefix
3) However, I observed a change by prefixing addpd with 0F2h: in this case, the result in XMM1 was 0000000000000000FFFFFFFFFFFFFFFFh instead of FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFh.
So my conclusion is that one shouldn't make any assumptions and should expect "undefined" behaviour. I wouldn't be surprised, however, if you could find some clues in Agner Fog's manuals.
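The three observations above are consistent with a simple rule: an F2 or F3 prefix takes precedence over 66, and among F2/F3 the one closest to the opcode wins. The Python sketch below encodes that rule purely as a model of what was seen on that one CPU. The names `effective_mandatory_prefix` and `model_decode`, and the rule itself, are assumptions, not something a specification guarantees.

```python
# Mnemonics for opcode 0F 58, keyed by the prefix that selects the form.
MNEMONIC_0F58 = {None: "addps", 0x66: "addpd", 0xF2: "addsd", 0xF3: "addss"}


def effective_mandatory_prefix(prefixes):
    """Pick the prefix that selects the SSE form, under the assumed rule.

    Assumed rule (a model of the reported observations, not a spec):
    the last F2/F3 wins; 66 only counts if no F2/F3 is present.
    """
    chosen = None
    for p in prefixes:
        if p in (0xF2, 0xF3):
            chosen = p          # a later F2/F3 replaces an earlier one, or 66
        elif p == 0x66 and chosen is None:
            chosen = 0x66       # 66 applies only while no F2/F3 has been seen
    return chosen


def model_decode(hex_string):
    """Map a prefixed 0F 58 byte sequence to a mnemonic under the model."""
    code = bytes.fromhex(hex_string)
    prefixes, opcode = code[:-2], code[-2:]
    if opcode != bytes([0x0F, 0x58]):
        raise ValueError("sketch only handles opcode 0F 58")
    return MNEMONIC_0F58[effective_mandatory_prefix(prefixes)]


if __name__ == "__main__":
    print(model_decode("66 f2 0f 58"))  # observation 1: addsd
    print(model_decode("f2 f3 0f 58"))  # observation 2: addss
    print(model_decode("f2 66 0f 58"))  # observation 3: addsd
```

Under this model, f2 66 0f 58 comes out as addsd, which would leave the upper 64 bits of XMM1 untouched and so would be consistent with the result reported in observation 3. Other CPUs may behave differently.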
Intel's SDM vol.2 manual (the instruction set reference) refers to these as mandatory prefixes. Think of them as part of the opcode.
But yes, they are prefixes and can be mixed with other prefixes ahead of the actual escape-byte+opcode. In fact a REX prefix must go after other prefixes.
As usual, using multiple conflicting prefixes from the same group happens to decode with the last one taking priority on current Intel hardware. I think Intel's manuals say that doing this can give unpredictable behaviour, so it's not guaranteed or future-proof. It's not a meaningful thing to do; if you want to pad an instruction to make it longer for alignment reasons, I think repeating the same prefix a couple of times is safe.
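As a small illustration of the "same group" point, the sketch below flags byte sequences whose legacy prefixes include two different members of one group, while allowing a repeated identical prefix. The group table follows the usual four-group classification of legacy prefixes in Intel's SDM. The function name `conflicting_groups` is hypothetical, and the check says nothing about what a given CPU actually does with a flagged sequence.

```python
# Legacy prefix bytes mapped to their prefix group.
PREFIX_GROUP = {
    0xF0: 1, 0xF2: 1, 0xF3: 1,                             # LOCK, REPNE, REP/REPE
    0x2E: 2, 0x36: 2, 0x3E: 2, 0x26: 2, 0x64: 2, 0x65: 2,  # segment overrides
    0x66: 3,                                               # operand-size override
    0x67: 4,                                               # address-size override
}


def conflicting_groups(code):
    """Return the set of groups with two *different* prefixes in `code`.

    Scanning stops at the first byte that is not a legacy prefix.
    """
    seen = {}  # group -> set of distinct prefix bytes seen so far
    for b in code:
        group = PREFIX_GROUP.get(b)
        if group is None:
            break
        seen.setdefault(group, set()).add(b)
    return {g for g, members in seen.items() if len(members) > 1}


if __name__ == "__main__":
    for s in ["f2 f3 0f 58", "66 66 0f 58", "66 f2 0f 58"]:
        print(s, "->", conflicting_groups(bytes.fromhex(s)) or "no same-group conflict")
```

Here f2 f3 0f 58 is flagged for group 1, while 66 66 0f 58 (a repeated prefix) and 66 f2 0f 58 (prefixes from different groups) are not.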
And also