Combining prefixes in SSE

Posted on 2024-08-24 07:08:08

In SSE, the prefixes 066h (operand size override), 0F2h (REPNE), and 0F3h (REPE) are part of the opcode.

In non-SSE code, 066h switches between 32-bit (or 64-bit) and 16-bit operation, while 0F2h and 0F3h are used for string operations. They can be combined, so 066h and 0F2h (or 0F3h) can appear in the same instruction, because that combination is meaningful. What is the behavior in an SSE instruction? For instance, we have (ignoring mod/rm for now):

0f 58      addps
66 0f 58   addpd
f2 0f 58   addsd
f3 0f 58   addss

But what is this?

66 f2 0f 58

And how about?

f2 66 0f 58

Not to mention the following which has two conflicting REP prefixes:

f2 f3 0f 58

What is the spec for these?

寄居者 2024-08-31 07:08:08

I do not remember having seen any specification of what to expect when random prefixes are combined arbitrarily, so I guess CPU behaviour may be "undefined" and possibly CPU-specific. (Clearly, some things are specified in e.g. Intel's docs, but many cases aren't covered.) Some combinations may also be reserved for future use.

My naive assumption would generally have been that additional prefixes are no-ops, but there's no guarantee. The assumption seems reasonable given that e.g. some optimisation manuals recommend building multi-byte NOPs (canonically 90h) by prefixing with 66h, e.g.:

db 66h, 90h; 2-byte NOP
db 66h, 66h, 90h; 3-byte NOP
db 66h, 66h, 66h, 90h; 4-byte NOP
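
The padding pattern above can be generated mechanically. The following is a minimal Python sketch of that pattern only; the helper name `prefixed_nop` is made up for illustration and does not come from any optimisation manual.

```python
def prefixed_nop(length: int) -> bytes:
    """Return a NOP of `length` bytes: (length - 1) 66h prefixes, then 90h."""
    if length < 1:
        raise ValueError("a NOP needs at least the 90h opcode byte")
    return bytes([0x66] * (length - 1) + [0x90])


if __name__ == "__main__":
    # Print the 1- to 4-byte forms as space-separated hex bytes.
    for n in range(1, 5):
        print(n, prefixed_nop(n).hex(" "))
```

How many 66h prefixes a given CPU decodes efficiently is a separate question that this sketch does not address.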

However, I also know that the CS and DS segment override prefixes have acquired novel functions as SSE2 branch hint prefixes (predict branch taken = 3Eh = DS override; predict branch not taken = 2Eh = CS override) when applied to conditional jump instructions.

Anyway, I looked at your examples above, always setting XMM1 to all 0 and XMM7 to all 0FFh with

pxor xmm1, xmm1    ; xmm1 <- 0s
pcmpeqw xmm7, xmm7 ; xmm7 <- FFs

and then running the code in question with xmm1, xmm7 as arguments. What I observed (32-bit code on a Win64 system with an Intel T7300 Core 2 Duo) was:

1) no change observed for addsd by adding 66h prefix

db 66h 
addsd xmm1, xmm7 ;total sequence = 66 F2 0F 58 CF

2) no change observed for addss by adding 0F2h prefix

db 0f2h     
addss xmm1,xmm7 ;total sequence = F2 F3 0F 58 CF

3) However, I observed a change by prefixing addpd by 0F2h:

db 0f2h    
addpd xmm1, xmm7 ;total sequence = F2 66 0F 58 CF

In this case, the result in XMM1 was 0000000000000000FFFFFFFFFFFFFFFFh instead of FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFh.
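
The three observations fit a simple rule: F2h or F3h takes precedence over 66h in either order, and when both F2h and F3h are present the later one wins. In case 3, the value left in XMM1 is what addsd would produce, since only the low 64 bits were written. The Python sketch below encodes just that rule. It is inferred from this single Core 2 Duo test and is not documented behaviour; `observed_mnemonic` and `ADD_VARIANTS` are names invented here.

```python
# Mandatory-prefix selection for opcode 0F 58, as listed in the question.
ADD_VARIANTS = {None: "addps", 0x66: "addpd", 0xF2: "addsd", 0xF3: "addss"}


def observed_mnemonic(code: bytes) -> str:
    """Model which 0F 58 variant the tested CPU appeared to execute."""
    effective = None
    i = 0
    while i < len(code) and code[i] in (0x66, 0xF2, 0xF3):
        byte = code[i]
        if byte in (0xF2, 0xF3):
            effective = byte      # a later F2/F3 replaces any earlier prefix
        elif effective is None:
            effective = 0x66      # 66 only counts when no F2/F3 was seen before it
        i += 1
    if code[i:i + 2] != b"\x0f\x58":
        raise ValueError("this sketch only handles opcode 0F 58")
    return ADD_VARIANTS[effective]


if __name__ == "__main__":
    for seq in ("66 f2 0f 58 cf", "f2 f3 0f 58 cf", "f2 66 0f 58 cf"):
        print(seq, "->", observed_mnemonic(bytes.fromhex(seq)))
```

Other CPUs, and other opcodes, may not follow this rule.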

So my conclusion is that one shouldn't make any assumptions and should expect "undefined" behaviour. I wouldn't be surprised, however, if you could find some clues in Agner Fog's manuals.

坦然微笑 2024-08-31 07:08:08

Intel's SDM vol.2 manual (the instruction set reference) refers to these as mandatory prefixes. Think of them as part of the opcode.

But yes, they are prefixes and can be mixed with other prefixes ahead of the actual escape-byte+opcode. In fact, a REX prefix must go after all other prefixes, immediately before the escape byte.

As usual, using multiple conflicting prefixes from the same group happens to decode with the last one taking priority on current Intel hardware. I think Intel manuals say that doing this can give unpredictable behaviour so it's not guaranteed or future proof. It's not a meaningful thing to do; if you want to pad an instruction to make it longer for alignment reasons, I think repeating the same prefix a couple times is safe.
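
To make the byte order concrete, here is a small Python sketch that encodes the register-to-register forms of the 0F 58 instructions from the question, placing the mandatory prefix first and the REX prefix (needed only for xmm8 to xmm15 in 64-bit mode) directly before the 0F escape byte. The function name `encode_sse_add` and its parameters are hypothetical, and memory operands are not handled.

```python
def encode_sse_add(mandatory_prefix, dst: int, src: int) -> bytes:
    """Encode add{ps,pd,sd,ss} xmm<dst>, xmm<src> (register form, opcode 0F 58).

    mandatory_prefix is None (addps), 0x66 (addpd), 0xF2 (addsd) or 0xF3 (addss).
    """
    if mandatory_prefix not in (None, 0x66, 0xF2, 0xF3):
        raise ValueError("unknown mandatory prefix")
    if not (0 <= dst <= 15 and 0 <= src <= 15):
        raise ValueError("register numbers must be 0..15")
    out = bytearray()
    if mandatory_prefix is not None:
        out.append(mandatory_prefix)              # mandatory prefix comes first
    # REX.R extends the ModR/M reg field, REX.B extends the r/m field.
    rex = 0x40 | ((dst >> 3) << 2) | (src >> 3)
    if rex != 0x40:
        out.append(rex)                           # REX sits right before 0F
    out += b"\x0f\x58"                            # escape byte + opcode
    out.append(0xC0 | ((dst & 7) << 3) | (src & 7))  # ModR/M with mod=11
    return bytes(out)


if __name__ == "__main__":
    print(encode_sse_add(0xF2, 1, 7).hex(" "))    # addsd xmm1, xmm7
    print(encode_sse_add(0xF2, 8, 15).hex(" "))   # addsd xmm8, xmm15
```

For xmm1, xmm7 this reproduces the F2 0F 58 CF sequence quoted in the other answer.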

B.8 SSE INSTRUCTION FORMATS AND ENCODINGS
The SSE instructions use the ModR/M format and are preceded by the 0FH
prefix byte. In general, operations are not duplicated to provide two
directions (that is, separate load and store variants).
The following three tables (Tables B-22, B-23, and B-24) show the
formats and encodings for the SSE SIMD floating-point, SIMD integer,
and cacheability and memory ordering instructions, respectively. Some
SSE instructions require a mandatory prefix (66H, F2H, F3H) as part of
the two-byte opcode. Mandatory prefixes are included in the tables.

And also

2.1.2 Opcodes
A primary opcode can be 1, 2, or 3 bytes in length. An additional 3-bit opcode field is sometimes encoded in the ModR/M byte. Smaller fields can be defined within the primary opcode. Such fields define the direction of operation, size of displacements, register encoding, condition codes, or sign extension. Encoding fields used by an opcode vary depending on the class of operation.
Two-byte opcode formats for general-purpose and SIMD instructions consist of one of the following:
- An escape opcode byte 0FH as the primary opcode and a second opcode byte.
- A mandatory prefix (66H, F2H, or F3H), an escape opcode byte, and a second opcode byte (same as previous bullet).
For example, CVTDQ2PD consists of the following sequence: F3 0F E6. The first byte is a mandatory prefix (it is not considered as a repeat prefix).
Three-byte opcode formats for general-purpose and SIMD instructions consist of one of the following:
- An escape opcode byte 0FH as the primary opcode, plus two additional opcode bytes.
- A mandatory prefix (66H, F2H, or F3H), an escape opcode byte, plus two additional opcode bytes (same as previous bullet).
For example, PHADDW for XMM registers consists of the following sequence: 66 0F 38 01. The first byte is the mandatory prefix.
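
The two formats quoted above can be told apart by looking at the byte after the 0FH escape: the three-byte forms use 0F 38 or 0F 3A. The Python sketch below splits the SDM's two example sequences accordingly. It is a deliberately simplified illustration: `split_opcode` is a hypothetical name, it accepts at most one leading prefix, ignores REX, and assumes that a leading 66H, F2H, or F3H is a mandatory prefix, which is not true of every 0F-escaped instruction.

```python
MANDATORY_PREFIXES = (0x66, 0xF2, 0xF3)


def split_opcode(code: bytes):
    """Return (mandatory prefix or None, opcode bytes) for a 0F-escaped sequence."""
    prefix = None
    i = 0
    if code and code[0] in MANDATORY_PREFIXES:
        prefix = code[0]
        i = 1
    if code[i:i + 1] != b"\x0f":
        raise ValueError("expected the 0FH escape byte")
    # 0F 38 and 0F 3A introduce the three-byte opcode forms.
    length = 3 if code[i + 1:i + 2] in (b"\x38", b"\x3a") else 2
    opcode = code[i:i + length]
    if len(opcode) != length:
        raise ValueError("truncated opcode")
    return prefix, opcode


if __name__ == "__main__":
    for name, seq in (("CVTDQ2PD", "f3 0f e6"), ("PHADDW", "66 0f 38 01")):
        prefix, opcode = split_opcode(bytes.fromhex(seq))
        print(name, hex(prefix), opcode.hex(" "))
```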