Do I need to reserve stack space for functions with fewer than four arguments?

Posted 2024-12-03 23:37:57

I just started learning x64 assembly and I have a question about functions, arguments, and the stack. As far as I understand it, the first four arguments to a function are passed in the rcx, rdx, r8, and r9 registers (and xmm0-xmm3 for floats) on Windows. So a trivial addition function with four parameters would look like this:

add:
   mov r10, rcx
   add r10, rdx
   add r10, r8
   add r10, r9
   mov rax, r10
   ret

However, I've come across documentation that mentions this:

At a minimum, each function must reserve 32 bytes (four 64-bit values) on the stack. This space allows registers passed into the function to be easily copied to a well-known stack location. The callee function isn't required to spill the input register params to the stack, but the stack space reservation ensures that it can if needed.

So, do I have to reserve stack space even if the functions I'm writing take four or fewer parameters, or is it just a recommendation?

Comments (2)

梦里南柯 2024-12-10 23:37:57

Your quote is from the "calling convention" part of the documentation. At the very least, you do not have to worry about this if you do not call other functions from your assembly code. If you do, then you must respect, among other things, the "red zone" and stack alignment considerations that the rule you quote is intended to ensure.

EDIT: this post clarifies the difference between "red zone" and "shadow space". The red zone belongs to the System V AMD64 ABI (Linux, macOS); the 32 bytes discussed here are the shadow space of the Windows x64 calling convention.
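
As a rough sketch of what this means once your code does call another function: the caller reserves the 32 bytes of shadow space and keeps rsp 16-byte aligned at the point of the call. The snippet below assumes NASM syntax and the Windows x64 calling convention; `call_add4` and `add4` are hypothetical names used only for illustration.

```nasm
; Minimal sketch, NASM syntax, Windows x64 calling convention assumed.
; add4 and call_add4 are hypothetical names, not taken from the documentation.
extern add4                 ; assumed: some function taking four integer arguments

section .text
global call_add4
call_add4:
    sub  rsp, 0x28          ; 0x20 bytes of shadow space for add4, plus 8 so that
                            ; rsp is 16-byte aligned at the call (on entry it is
                            ; off by 8 because of our own return address)
    mov  ecx, 1             ; argument 1
    mov  edx, 2             ; argument 2
    mov  r8d, 3             ; argument 3
    mov  r9d, 4             ; argument 4
    call add4               ; add4 may freely overwrite [rsp] .. [rsp+0x1f]
    add  rsp, 0x28          ; release the reserved space
    ret                     ; add4's result is still in rax
```

A function like the four-argument adder in the question, which calls nothing, needs none of this and can return without touching rsp.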

蝶…霜飞 2024-12-10 23:37:57

After playing with this more and reading the docs: the 32 bytes need to be reserved for any function that you call. If your function is as simple as the example and you don't call other functions, you don't have to reserve this space. Any function you call, however, may use these 32 bytes, so if you do not reserve them the function may overwrite whatever lies just above its return address, that is, part of your own stack frame.

Your function may likewise rely on 32 bytes being available on the stack, provided by the function that called yours, if that caller follows the ABI. Commonly this 32-byte area is used to save registers that will be changed in your function so you can restore their values before returning. I think this is for performance purposes: 32 bytes was chosen as enough that most leaf functions (functions that don't call others) don't need to reserve any stack space, yet still have temporary room on the stack to save registers and restore them before returning. Take this example:

Calling Function:

CallingFunction:
  push rbp
  mov rbp, rsp
  sub rsp, 0x40            ; 0x20 bytes of locals at [rbp-0x20],
                           ; plus 0x20 bytes of shadow space at [rsp]
                           ; for the functions we call, per the Windows ABI
  mov rcx, [rsi+0x10]      ; parameter 1 (xmm0 if non-int)
  mov rdx, 0x10            ; parameter 2 (xmm1 if non-int)
  movss xmm2, [rsi+0x28]   ; parameter 3 (r8 if int)
  mov r9, [rsi+0x64]       ; parameter 4 (xmm3 if non-int)
  call CalledFunction
  ; ... do other stuff
  add rsp, 0x40            ; free the space we reserved
  pop rbp
  xor rax, rax             ; return 0
  ret

Called Function:

CalledFunction:
  push rbp                 ; standard prologue
  mov rbp, rsp

  ; should do 'sub rsp, 0x20' here if calling any functions,
  ; to give them their own shadow space

  ; [rbp]   is the saved rbp
  ; [rbp+8] is the return address
  ; [rbp+0x10] to [rbp+0x2f] are the 0x20 bytes we can
  ;     safely modify in this function. If the function had
  ;     more than 4 parameters, the ones passed on the stack
  ;     would sit above this area, starting at [rbp+0x30].
  ;     In that case the CALLER would have allocated that
  ;     space for us as well.

  ; one common use of the 0x20 bytes is to save registers
  ; we want to modify without having to allocate
  ; stack space ourselves
  mov [rbp+0x10], rsi      ; save rsi in space allocated by caller
  mov [rbp+0x18], rdi      ; save rdi in space allocated by caller
  mov rsi, [rcx+0x20]
  mov rdi, [rsi+0x48]
  add rdi, [rsi+0x28]
  mov rax, rdi             ; return value
  mov rdi, [rbp+0x18]      ; restore changed register
  mov rsi, [rbp+0x10]      ; restore changed register
  pop rbp
  ret

Original answer

I ran into this without knowing about it, and it does seem to be the case. For instance, the first two instructions in GetAsyncKeyState overwrite the stack just above the return address, in the 0x20-byte area you're supposed to reserve for the callee to use for its parameters:

user32.GetAsyncKeyState  - mov [rsp+08],rbx
user32.GetAsyncKeyState+5- mov [rsp+10],rsi
user32.GetAsyncKeyState+A- push rdi
user32.GetAsyncKeyState+B- sub rsp,20
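
The two stores in that prologue go to [rsp+08] and [rsp+10], which at function entry are the first two slots of the caller's shadow space. A hand-written function could use the same pattern, sketched below under the assumption of NASM syntax and the Windows x64 calling convention; `sum_fields` and the structure offsets are hypothetical. A function that changes nonvolatile registers such as rsi and rdi would normally also need unwind information, which this sketch omits.

```nasm
; Sketch only, NASM syntax, Windows x64 calling convention assumed.
; sum_fields and the offsets 0x28 / 0x48 are hypothetical.
section .text
global sum_fields
sum_fields:                     ; rcx = pointer to some structure
    mov  [rsp+0x08], rsi        ; shadow slot 1, directly above the return address
    mov  [rsp+0x10], rdi        ; shadow slot 2
    mov  rsi, rcx               ; rsi = structure pointer
    mov  rdi, [rsi+0x48]        ; load one field
    add  rdi, [rsi+0x28]        ; add another field
    mov  rax, rdi               ; return the sum in rax
    mov  rdi, [rsp+0x10]        ; restore nonvolatile registers
    mov  rsi, [rsp+0x08]
    ret
```

If the caller had not reserved the shadow space, those first two stores would land in the caller's own locals or saved registers.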