64-bit assembly: when to use smaller-size registers
I understand that in x86_64 assembly there is, for example, the 64-bit rax register, which can also be accessed as a 32-bit register (eax), a 16-bit register (ax) and an 8-bit register (al). In what situation would I not just use the full 64 bits, and what advantage would there be?
As an example, with this simple hello world program:
section .data
msg:    db "Hello World!", 0x0a, 0x00
len:    equ $-msg

section .text
global start
start:
    mov rax, 0x2000004  ; System call write = 4
    mov rdi, 1          ; Write to standard out = 1
    mov rsi, msg        ; The address of hello_world string
    mov rdx, len        ; The size to write
    syscall             ; Invoke the kernel
    mov rax, 0x2000001  ; System call number for exit = 1
    mov rdi, 0          ; Exit success = 0
    syscall             ; Invoke the kernel
rdi and rdx, at least, only need 8 bits and not 64, right? But if I change them to dil and dl, respectively (their lower 8-bit equivalents), the program assembles and links but doesn't output anything.
However, it still works if I use eax, edi and edx, so should I use those rather than the full 64 bits? Why or why not?
Comments (5)
You are asking several questions here.
If you just load the low 8 bits of a register, the rest of the register will keep its previous value. That can explain why your system call got the wrong parameters.
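A minimal NASM sketch of that effect. The stale value 0x1122334455667788 is made up for illustration; what rdi actually holds when the program starts is not something the question tells us.

```nasm
section .text
        mov     rdi, 0x1122334455667788 ; pretend this is leftover data in rdi
        mov     dil, 1                  ; 8-bit write: only bits 0-7 change
                                        ; rdi is now 0x1122334455667701, not 1

        mov     edi, 1                  ; 32-bit write: the upper 32 bits are zeroed
                                        ; rdi is now 0x0000000000000001
```

The kernel reads the whole 64-bit register, so after the 8-bit write it would see a file descriptor that is nowhere near 1.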
One reason for using 32 bits when that is all you need is that many instructions using EAX or EBX are one byte shorter than those using RAX or RBX, because the 64-bit forms need an extra REX prefix byte. It can also mean that constants loaded into the register are encoded in fewer bytes.
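To make the size difference concrete, here is a small sketch with one possible encoding of each instruction in the comments. The byte sequences are the plain encodings from the instruction set reference; an assembler is free to pick an equivalent form, so treat them as illustrative rather than as guaranteed output.

```nasm
section .text
        add     eax, ebx        ; 01 D8                  (2 bytes)
        add     rax, rbx        ; 48 01 D8               (3 bytes; 48 is the REX.W prefix)

        mov     eax, 1          ; B8 01 00 00 00         (5 bytes)
        mov     rax, 1          ; 48 C7 C0 01 00 00 00   (7 bytes in the plain 64-bit form)
```

The saving applies to the original eight registers. Instructions that use r8 to r15 need a REX prefix at any operand size, so r8d is not shorter than r8.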
The instruction set has evolved over a long time and has quite a few quirks!
First and foremost would be when loading a smaller (e.g. 8-bit) value from memory (reading a char, working on a data structure, deserialising a network packet, etc.) into a register, versus loading a full 64 bits and then shifting or masking off the bytes you don't need.
Or, of course, when writing said value back to memory.
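(Edit, like 6 years later):
Since this keeps coming up: a byte-sized MOV into AL reads only a single byte of memory and leaves the other 56 bits of RAX as they were, so the new value of RAX depends on the old one.
By contrast, a 64-bit MOV into RAX reads eight bytes and overwrites the whole register. It also assumes the bytes in memory are in the CPU's byte order, which is often not true of network packets (hence my SHR instruction years ago).
Also important to note: a 32-bit MOV into EAX reads four bytes and zeroes the upper 32 bits of RAX.
Then, as mentioned in the comments, there is MOVZX, which reads a narrow value and zero-extends it to fill the whole register.
In all of these cases, if you want to write from the 'A' register into memory you have to pick your width: AL, AX, EAX or RAX.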
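The cases above can be sketched in NASM as follows. The label names `buf` and `load_store_widths` and the test value are hypothetical, and the register values in the comments assume a little-endian x86-64 CPU.

```nasm
default rel                             ; use RIP-relative addressing for [buf]

section .data
buf:    dq 0x1122334455667788           ; in memory the first byte is 0x88

section .text
global load_store_widths                ; hypothetical label
load_store_widths:
        mov     rax, -1                 ; rax = 0xFFFFFFFFFFFFFFFF
        mov     al, [buf]               ; 1-byte load, merged: rax = 0xFFFFFFFFFFFFFF88
        mov     rax, [buf]              ; 8-byte load:         rax = 0x1122334455667788
        mov     eax, [buf]              ; 4-byte load, zero-extended: rax = 0x0000000055667788
        movzx   eax, byte [buf]         ; 1-byte load, zero-extended: rax = 0x0000000000000088

        mov     [buf], al               ; stores 1 byte
        mov     [buf], ax               ; stores 2 bytes
        mov     [buf], eax              ; stores 4 bytes
        mov     [buf], rax              ; stores 8 bytes
        ret
```

For the loads, the register name alone tells the assembler how many bytes to read; the `byte` keyword is only needed with MOVZX, where the source is narrower than the destination.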
If you just need 32-bit registers, you can safely work with them in 64-bit code. But if you just need 16-bit or 8-bit registers, try to avoid them, or always use movzx/movsx so that the remaining bits are overwritten (zero-extended or sign-extended) rather than left stale. Under x86-64, an instruction with a 32-bit destination operand clears the upper 32 bits of the 64-bit register. The main purpose of this is to avoid false dependency chains.
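Applied to the program in the question, that advice might look like the sketch below. It keeps the asker's macOS system call numbers and `start` entry point unchanged and only narrows the register writes; the address in rsi stays 64-bit, as in the original.

```nasm
section .data
msg:    db "Hello World!", 0x0a, 0x00
len:    equ $-msg

section .text
global start
start:
        mov     eax, 0x2000004  ; write; the 32-bit move zeroes the upper half of rax
        mov     edi, 1          ; stdout
        mov     rsi, msg        ; an address, so it stays 64-bit as in the original
        mov     edx, len        ; byte count
        syscall
        mov     eax, 0x2000001  ; exit
        xor     edi, edi        ; exit status 0; also zeroes all of rdi
        syscall
```

Every value here fits in 32 bits, so the zero-extension leaves the full 64-bit registers holding exactly what the kernel expects.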
For details, see Section 3.4.1.1 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1.
Breaking dependency chains allows instructions to execute in parallel and out of program order, using the out-of-order execution that CPUs have implemented internally since the Pentium Pro in 1995.
See also Section 3.5.1.8 of the Intel® 64 and IA-32 Architectures Optimization Reference Manual.
On x86-64, MOVZX and MOV with 32-bit operands are equivalent in this respect: both break dependency chains.
That's why your code will execute faster if you always clear the upper bits of the larger register when using a smaller one. When those bits are always cleared, there is no dependency on the previous value of the register, so the CPU can rename the register internally.
Register renaming is a technique used internally by a CPU that eliminates the false data dependencies arising from the reuse of registers by successive instructions that do not have any real data dependencies between them.
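As an illustration, here is a short byte-summing routine that loads each byte with movzx. The name `sum_bytes` and the argument registers (System V style: pointer in rdi, count in rsi) are assumptions made for this sketch.

```nasm
section .text
global sum_bytes                ; hypothetical name; rdi = pointer, rsi = byte count
sum_bytes:
        xor     eax, eax        ; total = 0 (the 32-bit xor clears all of rax)
        test    rsi, rsi
        jz      .done           ; nothing to add for an empty buffer
.next:
        movzx   ecx, byte [rdi] ; writes all of rcx, so no dependency on its old value
        add     rax, rcx        ; accumulate in 64 bits
        inc     rdi
        dec     rsi
        jnz     .next
.done:
        ret                     ; result in rax
```

Had the loop used `mov cl, [rdi]` instead, each load would have to merge with the old contents of rcx, and the stale upper bits would also make the 64-bit add produce a wrong total.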
If you want to work with only an 8-bit quantity, then you'd work with the AL register. Same for AX and EAX.
For example, you could have a 64-bit value that contains two 32-bit values. You can work on the low 32 bits by accessing the EAX register. When you want to work on the high 32 bits, you can swap the two 32-bit quantities (reverse the DWORDs in the register) so that the high bits are now in EAX.
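One way to do that swap is a 32-bit rotate, as in this sketch. The label `split_halves` and the choice of rdi as the input register are hypothetical. Note that the sketch only reads EAX: on x86-64, writing to EAX would zero the upper half of RAX and destroy the other value.

```nasm
section .text
global split_halves             ; hypothetical name; rdi = packed 64-bit value
split_halves:
        mov     rax, rdi
        mov     ecx, eax        ; ecx = low 32 bits of the packed value
        rol     rax, 32         ; rotate by 32: the two dwords trade places
        mov     edx, eax        ; edx = what used to be the high 32 bits
        ret
```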
64 bits is the largest piece of memory you can work with as a single unit in a general-purpose register. That doesn't mean that's how much you need to use. If you need 8 bits, use 8. If you need 16, use 16. If it doesn't matter how many bits, then it doesn't matter how many you use.
Admittedly, when on a 64-bit processor, there's very little overhead to use the full 64 bits. But if, for example, you are calculating a byte value, working with a byte will mean the result will already be the correct size.
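A simple 8-bit checksum is one case where that holds. In this sketch (the name `checksum8` and the argument registers are assumptions), adding into AL wraps modulo 256 on its own, so no masking step is needed at the end.

```nasm
section .text
global checksum8                ; hypothetical name; rdi = pointer, rsi = byte count
checksum8:
        xor     eax, eax        ; sum = 0, upper bits of rax cleared once up front
        test    rsi, rsi
        jz      .done
.next:
        add     al, [rdi]       ; 8-bit add: wraps modulo 256, upper bits untouched
        inc     rdi
        dec     rsi
        jnz     .next
.done:
        ret                     ; al holds the checksum; the rest of rax is still zero
```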