64 bit assembly, when to use smaller size registers

Posted on 2024-11-18 14:17:26

I understand in x86_64 assembly there is for example the (64 bit) rax register, but it can also be accessed as a 32 bit register, eax, 16 bit, ax, and 8 bit, al. In what situation would I not just use the full 64 bits, and why, what advantage would there be?

As an example, with this simple hello world program:

section .data
msg: db "Hello World!", 0x0a, 0x00
len: equ $-msg

section .text
global start

start:
mov rax, 0x2000004      ; System call write = 4
mov rdi, 1              ; Write to standard out = 1
mov rsi, msg            ; The address of hello_world string
mov rdx, len            ; The size to write
syscall                 ; Invoke the kernel
mov rax, 0x2000001      ; System call number for exit = 1
mov rdi, 0              ; Exit success = 0
syscall                 ; Invoke the kernel

rdi and rdx, at least, only need 8 bits and not 64, right? But if I change them to dil and dl, respectively (their lower 8-bit equivalents), the program assembles and links but doesn't output anything.

However, it still works if I use eax, edi and edx, so should I use those rather than the full 64-bits? Why or why not?

Comments (5)

笑,眼淚并存 2024-11-25 14:17:26

You are asking several questions here.

If you just load the low 8 bits of a register, the rest of the register will keep its previous value. That can explain why your system call got the wrong parameters.
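
The difference between an 8-bit write and a 32-bit write can be modelled in a few lines of portable C. This is only a sketch of the architectural rule, not real register access, and the function names `write_low8` and `write_low32` are made up for illustration.

```c
#include <stdint.h>

/* Model of `mov dil, imm8`: only bits 0..7 change,
   bits 8..63 keep whatever was in the register before. */
uint64_t write_low8(uint64_t reg, uint8_t value) {
    return (reg & ~UINT64_C(0xFF)) | value;
}

/* Model of `mov edi, imm32`: the 32-bit result is zero-extended,
   so the old contents of the register are discarded entirely. */
uint64_t write_low32(uint64_t reg, uint32_t value) {
    (void)reg; /* the previous value plays no part in the result */
    return (uint64_t)value;
}
```

With stale data in the register, `write_low8(0x1122334455667788, 1)` yields `0x1122334455667701`, which is not a valid file descriptor, while `write_low32` yields exactly `1`.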

One reason for using 32 bits when that is all you need is that many instructions using EAX or EBX are one byte shorter than those using RAX or RBX. It might also mean that constants loaded into the register are shorter.
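
To make the size difference concrete, here are some valid machine-code encodings written out as C byte arrays. The array names are made up for illustration, and an assembler is free to pick a different (often shorter) encoding for the same source line, so treat this as a sketch rather than what any particular tool emits.

```c
#include <stddef.h>

/* mov eax, 1 : opcode B8 followed by a 32-bit immediate */
const unsigned char mov_eax_1[] = {0xB8, 0x01, 0x00, 0x00, 0x00};

/* mov rax, 1 : REX.W prefix (48), opcode C7 /0, sign-extended 32-bit immediate */
const unsigned char mov_rax_1[] = {0x48, 0xC7, 0xC0, 0x01, 0x00, 0x00, 0x00};

/* mov rax, imm64 : REX.W prefix (48), opcode B8, full 64-bit immediate */
const unsigned char mov_rax_imm64[] = {0x48, 0xB8, 0x01, 0x00, 0x00, 0x00,
                                       0x00, 0x00, 0x00, 0x00};

/* add eax, ebx : opcode 01, ModRM D8 */
const unsigned char add_eax_ebx[] = {0x01, 0xD8};

/* add rax, rbx : the same instruction with a REX.W prefix in front */
const unsigned char add_rax_rbx[] = {0x48, 0x01, 0xD8};
```

The 64-bit forms carry an extra REX.W prefix byte, and a full 64-bit immediate doubles the length of the load.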

The instruction set has evolved over a long time and has quite a few quirks!

在风中等你 2024-11-25 14:17:26

First and foremost would be when loading a smaller (e.g. 8-bit) value from memory (reading a char, working on a data structure, deserialising a network packet, etc.) into a register.

MOV AL, [0x1234]

versus

MOV RAX, [0x1234]
AND RAX, 0xFF
; assuming there are actually 8 accessible bytes at 0x1234.
; x86 is little-endian, so the byte at 0x1234 lands in the low
; 8 bits of RAX; SHR RAX, 56 would instead extract the byte at 0x123B.

Or, of course, writing said value back to memory.


(Edit, like 6 years later):

Since this keeps coming up:

MOV AL, [0x1234]
  • only reads a single byte of memory at 0x1234 (the inverse would only overwrite a single byte of memory)
  • keeps whatever was in the other 56 bits of RAX
    • This creates a dependency between the past and future values of RAX, so the CPU can't optimise the instruction using register renaming.

By contrast:

MOV RAX, [0x1234]
  • reads 8 bytes of memory starting at 0x1234 (the inverse would overwrite 8 bytes of memory)
  • overwrites all of RAX
  • interprets the 8 bytes in the CPU's little-endian byte order, so the byte at 0x1234 ends up in AL (multi-byte fields in network packets are usually big-endian and would need a byte swap)
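
The byte-order point can be checked with a small C model of a little-endian 64-bit load. The helper names `load_le64`, `first_byte` and `last_byte` are hypothetical, and the code builds the value byte by byte so that it behaves the same on any host.

```c
#include <stdint.h>

/* Assemble 8 bytes the way a little-endian `mov rax, [addr]` does:
   the byte at the lowest address becomes the least significant byte. */
uint64_t load_le64(const uint8_t *p) {
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* What AL holds after the load: the byte that was at addr + 0. */
uint8_t first_byte(uint64_t rax) {
    return (uint8_t)(rax & 0xFF);
}

/* What `shr rax, 56` leaves in AL: the byte that was at addr + 7. */
uint8_t last_byte(uint64_t rax) {
    return (uint8_t)(rax >> 56);
}
```

For memory holding `11 22 33 44 55 66 77 88`, the loaded value is `0x8877665544332211`, so masking with `0xFF` recovers the first byte and shifting right by 56 recovers the last one.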

Also important to note:

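MOV EAX, [0x1234]
  • reads 4 bytes of memory starting at 0x1234 (the inverse would overwrite 4 bytes of memory)
  • overwrites all of RAX: the upper 32 bits are set to zero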

Then, as mentioned in the comments, there is:

MOVZX EAX, byte [0x1234]
  • only reads a single byte of memory at 0x1234
  • extends the value to fill all of EAX (and thus RAX) with zeroes (eliminating the dependency and allowing register renaming optimisations).
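
As a rough illustration, the extension rules for byte loads can be modelled in C. The function names are invented for this sketch; each returns the full 64-bit RAX value the corresponding instruction would leave behind.

```c
#include <stdint.h>

/* movzx eax, byte [mem]: zero-extend to 32 bits; the 32-bit write
   then zeroes bits 32..63 of RAX as well. */
uint64_t movzx_eax_byte(uint8_t b) {
    return (uint64_t)b;
}

/* movsx eax, byte [mem]: sign-extend to 32 bits; bits 32..63 of RAX
   are still zeroed because the destination is a 32-bit register. */
uint64_t movsx_eax_byte(uint8_t b) {
    uint32_t ext = (b & 0x80) ? 0xFFFFFF00u : 0u;
    return (uint64_t)(ext | b);
}

/* movsx rax, byte [mem]: sign-extend all the way to 64 bits. */
uint64_t movsx_rax_byte(uint8_t b) {
    uint64_t ext = (b & 0x80) ? ~UINT64_C(0xFF) : 0;
    return ext | b;
}
```

None of the three results depends on the previous contents of RAX, which is the property that removes the dependency.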

In all of these cases, if you want to write from the 'A' register into memory you'd have to pick your width:

MOV [0x1234], AL   ; write a byte (8 bits)
MOV [0x1234], AX   ; write a word (16 bits)
MOV [0x1234], EAX  ; write a dword (32 bits)
MOV [0x1234], RAX  ; write a qword (64 bits)
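
The effect of the chosen width on memory can be sketched in C. `store_le` is a hypothetical helper that writes the low `width` bytes of a 64-bit value in little-endian order, which is what the four stores above do for widths 1, 2, 4 and 8.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of `mov [addr], reg` where reg is AL, AX, EAX or RAX.
   Only `width` bytes of memory change; neighbouring bytes are untouched. */
void store_le(uint8_t *mem, uint64_t rax, size_t width) {
    for (size_t i = 0; i < width; i++) {
        mem[i] = (uint8_t)(rax >> (8 * i));
    }
}
```

Storing `0x1122334455667788` with a width of 2 writes `88 77` and leaves the following bytes as they were.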

思念绕指尖 2024-11-25 14:17:26

If 32-bit registers are all you need, you can safely work with them; this is fine in 64-bit mode. But if you only need 16-bit or 8-bit values, try to avoid the partial registers, or always use movzx/movsx to fill the remaining bits. Under x86-64, writing a 32-bit register zeroes the upper 32 bits of the corresponding 64-bit register. The main purpose of this is to avoid false dependency chains.

Please refer to the relevant section - 3.4.1.1 - of The Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 1:

32-bit operands generate a 32-bit result, zero-extended to a 64-bit result in the destination general-purpose register

Breaking dependency chains allows instructions to execute in parallel and out of program order, using the out-of-order execution that CPUs have implemented internally since the Pentium Pro in 1995.

A quote from the Intel® 64 and IA-32 Architectures Optimization Reference Manual, Section 3.5.1.8:

Code sequences that modifies partial register can experience some delay in its dependency chain, but can be avoided by using dependency breaking idioms. In processors based on Intel Core micro-architecture, a number of instructions can help clear execution dependency when software uses these instruction to clear register content to zero. Break dependencies on portions of registers between instructions by operating on 32-bit registers instead of partial registers. For moves, this can be accomplished with 32-bit moves or by using MOVZX.

Assembly/Compiler Coding Rule 37. (M impact, MH generality): Break dependencies on portions of registers between instructions by operating on 32-bit registers instead of partial registers. For moves, this can be accomplished with 32-bit moves or by using MOVZX.

For x64, MOVZX and MOV with 32-bit operands are equivalent in this respect: both break dependency chains.

That's why your code will execute faster if you always clear the upper bits of the full register when working with smaller registers. When those bits are always cleared, there is no dependency on the register's previous value, so the CPU can rename the register internally.

Register renaming is a technique used internally by a CPU that eliminates the false data dependencies arising from the reuse of registers by successive instructions that do not have any real data dependencies between them.

路还长,别太狂 2024-11-25 14:17:26

If you want to work with only an 8-bit quantity, then you'd work with the AL register. Same for AX and EAX.

For example, you could have a 64-bit value that contains two 32-bit values. You can work on the low 32-bits by accessing the EAX register. When you want to work on the high 32-bits, you can swap the two 32-bit quantities (reverse the DWORDs in the register) so that the high bits are now in EAX.
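
The swap described here amounts to a rotate by 32 bits (on x86-64, an instruction such as `ror rax, 32` would do it). A minimal C sketch, with function names chosen only for illustration:

```c
#include <stdint.h>

/* Exchange the high and low 32-bit halves of a 64-bit value,
   i.e. rotate it by 32 bits. */
uint64_t swap_dwords(uint64_t x) {
    return (x >> 32) | (x << 32);
}

/* The EAX view of RAX: just the low 32 bits. */
uint32_t low_dword(uint64_t x) {
    return (uint32_t)x;
}
```

After the swap, `low_dword` returns what used to be the high half, so both halves can be processed with 32-bit operations in turn.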

櫻之舞 2024-11-25 14:17:26

64 bits is the largest operand a general-purpose register lets you work with as a single unit. That doesn't mean that's how much you need to use.

If you need 8 bits, use 8. If you need 16, use 16. If it doesn't matter how many bits, then it doesn't matter how many you use.

Admittedly, when on a 64-bit processor, there's very little overhead to use the full 64 bits. But if, for example, you are calculating a byte value, working with a byte will mean the result will already be the correct size.
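
A byte-wide checksum is one case where this shows up. In the C sketch below (function names are illustrative), the 8-bit accumulator wraps on its own, while the 64-bit accumulator needs an explicit mask before the result fits in a byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Accumulate in a byte: the sum wraps modulo 256 by itself. */
uint8_t checksum8(const uint8_t *data, size_t n) {
    uint8_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum = (uint8_t)(sum + data[i]);
    }
    return sum;
}

/* Accumulate at full width: the caller must mask with 0xFF afterwards. */
uint64_t checksum64(const uint8_t *data, size_t n) {
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += data[i];
    }
    return sum;
}
```

For the bytes `F0 20 05`, the wide sum is `0x115` and the byte-sized sum is `0x15`.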
