ADD instruction forms: the Intel manual doesn't describe the operand-size overrides "byte" or "word"

Posted 2025-01-21 09:49:59

I'm looking at an example of Intel assembly being assembled by NASM. It has the instruction:

add byte [ebx], 32

How would I know from the documentation what "byte" does?

The book I'm reading explains in the text how "byte" tells the assembler that we're only writing a single byte to the memory that ebx points to. It's not clear to me how I'd know this from looking at the documentation.

From examples in the book and elsewhere, it looks like the ADD instruction has two forms:

ADD <dest> <src>
ADD <size> <dest> <src>

However, when I look at the Intel documentation[1], I don't see anything that looks like either of my forms. Each of the instructions given in the table has only a single comma, which, to me, makes it seem like all the corresponding opcodes take only two inputs. There is a table that gives "Instruction Operand Encoding", and Operands 3 and 4 are NA. Looking around the web, most sites don't mention anything about the size parameter (let alone whether it applies to my processor).

I'm assembling on an Intel(R) Core(TM) i7-6700HQ CPU in 386 mode:

nasm -f elf -g -F stabs -o $OBJECTFILE $1
ld -m elf_i386 -o $BUILDNAME $OBJECTFILE

Maybe the instruction takes an extra operand for 386 but not for newer architectures?

[1] "Intel® 64 and IA-32 Architectures Software Developer’s Manual Combined Volumes: 1, 2A, 2B, 2C, 2D, 3A, 3B, 3C, 3D, and 4", Vol. 2A, page 3-31 (page 605 in the PDF).

https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html

瑶笙 2025-01-28 09:49:59

The byte keyword in the asm source sets the operand-size attribute of the instruction. In machine code, that would be implied by the numeric opcode, or for 16/32/64-bit operand-size, by the current CPU mode and prefixes for the non-byte opcode. Intel's manual documents the machine-code forms, not asm source syntax.
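To make that concrete, here is a minimal Python sketch of where the size ends up in the bytes. The helper `encode_add_mem_ebx_imm8` is hypothetical and models only `add <size> [ebx], imm` in 32-bit mode with an immediate that fits in a signed byte. The byte form gets its own opcode (80), while the word and dword forms share opcode 83 and differ only by the 66 operand-size prefix.

```python
# Hypothetical helper modeling one case only: add <size> [ebx], imm
# in 32-bit mode, with an immediate that fits in a signed byte.

MODRM_EBX_SLASH0 = 0x03  # mod=00 (memory), reg=/0 (selects ADD), rm=011 ([ebx])


def encode_add_mem_ebx_imm8(size: str, imm: int) -> bytes:
    """Return the machine code for `add <size> [ebx], imm` in 32-bit mode."""
    if not -128 <= imm <= 127:
        raise ValueError("this sketch only handles immediates that fit in imm8")
    imm8 = imm & 0xFF  # two's-complement byte
    if size == "byte":
        # ADD r/m8, imm8 (80 /0 ib): the opcode itself means "byte operand"
        return bytes([0x80, MODRM_EBX_SLASH0, imm8])
    if size == "dword":
        # ADD r/m32, imm8 (83 /0 ib): 32-bit is the default size in 32-bit mode
        return bytes([0x83, MODRM_EBX_SLASH0, imm8])
    if size == "word":
        # Same opcode as the dword form; the 66 prefix switches it to 16-bit
        return bytes([0x66, 0x83, MODRM_EBX_SLASH0, imm8])
    raise ValueError(f"unsupported size keyword: {size!r}")


for size in ("byte", "word", "dword"):
    print(f"add {size} [ebx], 32  ->  {encode_add_mem_ebx_imm8(size, 32).hex(' ')}")
```

Running it prints `80 03 20`, `66 83 03 20` and `83 03 20`: nothing in the machine code corresponds to a separate "size operand", which is why the Intel tables never show one.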

See the following re: how that gets encoded into machine code.

This is why assemblers have manuals, too, separate from the ISA manual.

For example, in NASM's manual, Chapter 3: The NASM Language, section 3.1 Layout of a NASM Source Line describes the syntax layout of a mnemonic and operands. Unfortunately, that section neglects to mention the overrides you can put in front of operands, and only mentions prefixes like o16 that you can put in front of the mnemonic (a clunkier, manual way to specify the operand size).

The manual does have examples of operand-size overrides in multiple places. For example, 2.2 Quick Start for MASM Users points out that NASM needs mov word [var], 2 even if var is declared as var dw 0, which in MASM would magically imply an operand size for that instruction. The manual also mentions the same specifiers used with strict, which forces the encoding of the immediate, not just the operand size. For example, add ecx, strict dword 123 forces the add r/m32, imm32 form, while add ecx, dword 123 still allows the add r/m32, imm8 form. (https://www.felixcloutier.com/x86/add)
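The difference between `dword` and `strict dword` shows up only in the immediate's encoding. A sketch in Python, assuming a hypothetical `encode_add_ecx_imm` helper that models just `add ecx, imm` in 32-bit mode and mirrors the behavior described above: the short imm8 form is chosen whenever the value fits in a signed byte, unless `strict` is requested.

```python
import struct

# Hypothetical helper modeling only "add ecx, imm" in 32-bit mode.
MODRM_ECX_SLASH0 = 0xC1  # mod=11 (register operand), reg=/0 (ADD), rm=001 (ecx)


def encode_add_ecx_imm(imm: int, strict_dword: bool = False) -> bytes:
    """Encode `add ecx, dword imm` (or `add ecx, strict dword imm` if strict_dword)."""
    if -128 <= imm <= 127 and not strict_dword:
        # ADD r/m32, imm8 (83 /0 ib): the CPU sign-extends the byte to 32 bits
        return bytes([0x83, MODRM_ECX_SLASH0]) + struct.pack("<b", imm)
    # ADD r/m32, imm32 (81 /0 id): full little-endian 32-bit immediate
    return bytes([0x81, MODRM_ECX_SLASH0]) + struct.pack("<I", imm & 0xFFFFFFFF)


print("add ecx, dword 123        ->", encode_add_ecx_imm(123).hex(" "))
print("add ecx, strict dword 123 ->", encode_add_ecx_imm(123, strict_dword=True).hex(" "))
```

Both encodings have a 32-bit operand size (the opcode says so); `strict` only changes how many bytes the immediate takes, three bytes total versus six.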

Some other x86 assemblers, like GAS and clang/LLVM, use AT&T syntax by default, which is very different from the syntax Intel's manuals use to talk about instructions. In AT&T syntax, the operand size is specified (if needed) by a suffix on the instruction mnemonic, like movb $'a', (%rdi), instead of MASM's mov byte ptr [rdi], 'a' (note the extra ptr keyword) or NASM's mov byte [rdi], 'a'.
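As a quick illustration that the same operand size is just spelled differently per tool, here is a small Python sketch. The `spellings` helper is hypothetical and only handles the `op size [reg], imm` shape; it uses the AT&T suffixes b/w/l/q (note that dword is `l` in AT&T) and AT&T's reversed, source-first operand order.

```python
# Hypothetical helper: spell "op size [reg], imm" in three syntaxes.
# NASM and MASM attach a size keyword to the memory operand; AT&T appends a
# suffix to the mnemonic and reverses the operand order (source first).

ATT_SUFFIX = {"byte": "b", "word": "w", "dword": "l", "qword": "q"}


def spellings(op: str, size: str, base_reg: str, imm: str) -> dict:
    return {
        "NASM": f"{op} {size} [{base_reg}], {imm}",
        "MASM": f"{op} {size} ptr [{base_reg}], {imm}",
        "AT&T": f"{op}{ATT_SUFFIX[size]} ${imm}, (%{base_reg})",
    }


for syntax, text in spellings("mov", "byte", "rdi", "'a'").items():
    print(f"{syntax:5} {text}")
```

All three lines assemble to the same machine instruction; only the source text differs.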

Assembly syntax depends on the tool, not the ISA. Intel's manuals, especially vol. 2 (the part that lists every available instruction), do not specify how to write the operand size in asm source when it would otherwise be ambiguous.

In asm source, a register can imply operand-size

For instructions where both operands must be the same size, a register operand implies the operand-size in asm source syntax, so you don't need to specify it. e.g. add eax, [rdi] doesn't need to be add eax, dword [rdi].

But mov-immediate to memory (or any other op mem, imm instruction) is ambiguous, as are one-operand memory instructions like inc [mem] and the rare instructions whose operands don't have to be the same size, like shl [rdi], cl (the destination size could be b/w/d/q) or movzx eax, [rdi] (the source size could be byte or word).

See When do I need to specify the size of the operand in Assembly?

Good assemblers like NASM will error on that ambiguity. Less-good assemblers will sometimes just pick a default. For example, GAS picks dword for instructions other than MOV, such as add $1, (%rdi), and only recently added a warning about that!
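The rules above can be modeled roughly as follows. This is a hypothetical, simplified Python sketch, not how NASM or GAS are actually implemented: it covers only instructions whose operands must all be the same size (so not shl or movzx), and it treats operands as plain strings. Passing `default=None` errors on ambiguity like NASM does; passing an int picks a default, the way a more lenient assembler would.

```python
# Hypothetical, simplified model of operand-size inference for instructions whose
# operands must all be the same size (add, mov, sub, ...). Not for shl/movzx.

REG_SIZES = {
    "al": 8, "bl": 8, "cl": 8,
    "ax": 16, "bx": 16, "cx": 16,
    "eax": 32, "ebx": 32, "ecx": 32,
    "rax": 64, "rbx": 64, "rdi": 64,
}
KEYWORD_SIZES = {"byte": 8, "word": 16, "dword": 32, "qword": 64}


class OperandSizeError(ValueError):
    pass


def infer_operand_size(operands, default=None):
    """Return the operand size in bits, or raise if it can't be determined.

    Operands are strings: a register ("eax"), an optionally size-prefixed memory
    reference ("[rdi]", "byte [ebx]"), or an immediate ("32").
    """
    sizes = set()
    for op in operands:
        first_word = op.split()[0]
        if first_word in KEYWORD_SIZES:   # explicit override, e.g. "byte [ebx]"
            sizes.add(KEYWORD_SIZES[first_word])
        elif op in REG_SIZES:             # a register implies its own width
            sizes.add(REG_SIZES[op])
        # bare memory operands and immediates carry no size information
    if len(sizes) > 1:
        raise OperandSizeError(f"mismatched operand sizes: {sorted(sizes)}")
    if sizes:
        return sizes.pop()
    if default is None:                   # strict: refuse to guess
        raise OperandSizeError("operation size not specified")
    return default                        # lenient: silently pick a default


print(infer_operand_size(["eax", "[rdi]"]))            # 32: the register decides
print(infer_operand_size(["byte [ebx]", "32"]))        # 8: the keyword decides
print(infer_operand_size(["[rdi]", "1"], default=32))  # 32: a silently chosen default
```

Calling `infer_operand_size(["[rdi]", "1"])` with no default raises, which corresponds to the `add [rdi], 1` case that needs an explicit size keyword.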

Similarly, [rdi + rax] specifies 64-bit address-size, while [edi + eax] would be 32-bit address-size. The default address-size (in asm source) for something like [1234] is the bitness of the current mode, i.e. not using a 67 address-size prefix in the machine code.
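The same idea applies to address size. Below is a hypothetical Python sketch that only looks at the width of the registers inside the brackets and the current mode, and reports whether a 67 prefix would be needed. It deliberately ignores which register combinations are legal in 16-bit addressing and other encoding details.

```python
# Hypothetical, simplified model: does a memory operand need the 67
# address-size prefix in a given mode? Only register widths are modeled.

ADDR_REG_BITS = {
    "bx": 16, "bp": 16, "si": 16, "di": 16,
    "eax": 32, "ebx": 32, "esi": 32, "edi": 32,
    "rax": 64, "rbx": 64, "rsi": 64, "rdi": 64,
}

# mode bits -> {address size: needs 67 prefix?}; missing sizes aren't encodable
ADDR_SIZES_BY_MODE = {
    16: {16: False, 32: True},
    32: {32: False, 16: True},
    64: {64: False, 32: True},
}


def needs_67_prefix(mode_bits: int, regs: list) -> bool:
    widths = {ADDR_REG_BITS[r] for r in regs}
    if len(widths) > 1:
        raise ValueError("can't mix register widths in one address")
    # No registers (e.g. [1234]): the address size defaults to the mode's bitness
    addr_bits = widths.pop() if widths else mode_bits
    allowed = ADDR_SIZES_BY_MODE[mode_bits]
    if addr_bits not in allowed:
        raise ValueError(f"{addr_bits}-bit addressing isn't encodable in {mode_bits}-bit mode")
    return allowed[addr_bits]


print(needs_67_prefix(64, ["rdi", "rax"]))  # [rdi + rax] in 64-bit mode: False
print(needs_67_prefix(64, ["edi", "eax"]))  # [edi + eax] in 64-bit mode: True
print(needs_67_prefix(64, []))              # [1234] in 64-bit mode: False
```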

Again, this is 100% about asm source-level syntax. Encoding an instruction into machine code for a certain mode necessarily implies an operand-size.

That's why you need to tell the assembler what mode the CPU will be decoding in: for example, with NASM's bits 32 if you're making a flat binary or switching modes in a bootloader, or, more normally, by assembling with nasm -felf64 to make a 64-bit object file. In that case, bits 32 would let you put mismatched machine code into the wrong object file, instead of causing an error at assemble time because push ebx isn't encodable in 64-bit mode.
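The push ebx case can be sketched in Python. The hypothetical `encode_push` helper covers only the eight legacy registers (no r8 to r15, which would need a REX prefix). Every size uses opcode 50+reg, so the mode and the 66 prefix are the only things that determine the operand size, and in 64-bit mode there is simply no way to ask for a 32-bit push.

```python
# Hypothetical helper: encode "push <reg>" for the eight legacy registers,
# showing that the CPU mode decides which operand sizes are encodable.
# All sizes share opcode 50+reg; only the 66 prefix and the mode differ.

REG_INFO = {}
for num, name in enumerate(["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]):
    REG_INFO[name] = (num, 16)
    REG_INFO["e" + name] = (num, 32)
    REG_INFO["r" + name] = (num, 64)


def encode_push(reg: str, mode_bits: int) -> bytes:
    num, size = REG_INFO[reg]
    opcode = 0x50 + num
    if mode_bits == 64:
        # 64-bit mode: push defaults to 64-bit, 66 gives 16-bit, 32-bit is impossible
        if size == 64:
            return bytes([opcode])
        if size == 16:
            return bytes([0x66, opcode])
    elif size in (16, 32):
        # 16/32-bit modes: the default is the mode's size, 66 selects the other one
        return bytes([opcode]) if size == mode_bits else bytes([0x66, opcode])
    raise ValueError(f"push {reg} is not encodable in {mode_bits}-bit mode")


for mode in (16, 32, 64):
    try:
        print(f"bits {mode}: push ebx -> {encode_push('ebx', mode).hex(' ')}")
    except ValueError as err:
        print(f"bits {mode}: error: {err}")
```

Note that `encode_push("rbx", 64)` and `encode_push("ebx", 32)` both return the single byte 53: the same machine code means a different operand size depending on the mode the CPU decodes it in.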