Help understanding very basic main() disassembly in GDB

Posted on 2024-10-13 05:40:48

Heyo,

I have written this very basic main function to experiment with disassembly and also to see and hopefully understand what is going on at the lower level:

int main() {
  return 6;
}

Using gdb to disas main produces this:

0x08048374 <main+0>:    lea    0x4(%esp),%ecx
0x08048378 <main+4>:    and    $0xfffffff0,%esp
0x0804837b <main+7>:    pushl  -0x4(%ecx)
0x0804837e <main+10>:   push   %ebp
0x0804837f <main+11>:   mov    %esp,%ebp
0x08048381 <main+13>:   push   %ecx
0x08048382 <main+14>:   mov    $0x6,%eax
0x08048387 <main+19>:   pop    %ecx
0x08048388 <main+20>:   pop    %ebp
0x08048389 <main+21>:   lea    -0x4(%ecx),%esp
0x0804838c <main+24>:   ret  

Here is my best guess as to what I think is going on and what I need help with line-by-line:

lea 0x4(%esp),%ecx

Load the address of esp + 4 into ecx. Why do we add 4 to esp?

I read somewhere that this is the address of the command line arguments. But when I do x/d $ecx I get the value of argc. Where are the actual command line argument values stored?

and $0xfffffff0,%esp

Align the stack to a 16-byte boundary
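The `and` works because clearing the low four bits of an address rounds it down to a multiple of 16. Since the stack grows downward, rounding down only adds unused space and never overlaps data already on the stack. A small C sketch of the same arithmetic (the function name and the sample addresses are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Same effect as `and $0xfffffff0,%esp`: clear the low 4 bits,
   i.e. round the address DOWN to the nearest multiple of 16. */
uint32_t align_down16(uint32_t esp)
{
    uint32_t aligned = esp & 0xfffffff0u;
    assert(aligned % 16 == 0);    /* result is 16-byte aligned        */
    assert(esp - aligned < 16);   /* moved down by at most 15 bytes   */
    return aligned;
}
```

For example, an esp of 0xbffff3cc becomes 0xbffff3c0, while an address that is already aligned is left unchanged.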

pushl -0x4(%ecx)

Push the address of where esp was originally onto the stack. What is the purpose of this?

push %ebp

Push the base pointer onto the stack

mov %esp,%ebp

Move the current stack pointer into the base pointer

push %ecx

Push the address of the original esp + 4 onto the stack. Why?

mov $0x6,%eax

I wanted to return 6 here, so I'm guessing the return value is stored in eax?

pop %ecx

Restore ecx to value that is on the stack. Why would we want ecx to be esp + 4 when we return?

pop %ebp

Restore ebp to value that is on the stack

lea -0x4(%ecx),%esp

Restore esp to its original value

ret
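The walkthrough above can be checked against a toy model. This C sketch replays main's eleven instructions on a simulated 32-bit stack; the array, the addresses and the helper names are invented for illustration and are not how a real CPU or GDB works internally. It shows that `pushl -0x4(%ecx)` copies the return address, that ecx carries the pre-alignment pointer (&argc) through the function, and that `lea -0x4(%ecx),%esp` undoes the `and` so `ret` pops the original return address.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of a downward-growing 32-bit stack (addresses are made up). */
#define STACK_BASE  0x1000u            /* lowest modelled address */
#define STACK_WORDS 64u
static uint32_t mem[STACK_WORDS];

static uint32_t *word(uint32_t addr)
{
    assert(addr >= STACK_BASE && (addr - STACK_BASE) / 4 < STACK_WORDS);
    return &mem[(addr - STACK_BASE) / 4];
}
static void push(uint32_t *esp, uint32_t v) { *esp -= 4; *word(*esp) = v; }
static uint32_t pop(uint32_t *esp) { uint32_t v = *word(*esp); *esp += 4; return v; }

struct result { uint32_t eax, ebp, esp, eip; };

/* On entry esp points at the return address pushed by `call main`,
   and 4(%esp) holds argc. */
struct result run_main(uint32_t esp, uint32_t ebp)
{
    uint32_t ecx, eax, eip;
    ecx = esp + 4;                /* lea   0x4(%esp),%ecx  : ecx = &argc      */
    esp &= 0xfffffff0u;           /* and   $0xfffffff0,%esp                  */
    push(&esp, *word(ecx - 4));   /* pushl -0x4(%ecx)      : copy return addr */
    push(&esp, ebp);              /* push  %ebp                              */
    ebp = esp;                    /* mov   %esp,%ebp                         */
    push(&esp, ecx);              /* push  %ecx            : save &argc       */
    eax = 6;                      /* mov   $0x6,%eax       : return value     */
    ecx = pop(&esp);              /* pop   %ecx                              */
    ebp = pop(&esp);              /* pop   %ebp            : caller's ebp     */
    esp = ecx - 4;                /* lea   -0x4(%ecx),%esp : undo alignment   */
    eip = pop(&esp);              /* ret                                     */
    return (struct result){ .eax = eax, .ebp = ebp, .esp = esp, .eip = eip };
}
```

Starting from an unaligned esp such as 0x10a4 (with a fake return address stored there), the model ends with eax = 6, the caller's ebp restored, eip equal to the original return address, and esp 4 bytes above where it started, exactly as after a real `ret`. Presumably the copied return address exists so that, after `mov %esp,%ebp`, 4(%ebp) looks like an ordinary frame's return slot even though the stack was realigned.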

I am a n00b when it comes to assembly so any help would be great! Also if you see any false statements about what I think is going on please correct me.

Thanks a bunch! :]

橘和柠 2024-10-20 05:40:48

Stack frames

The code at the beginning of the function body:

push  %ebp
mov   %esp, %ebp

is to create the so-called stack frame, which is a "solid ground" for referencing parameters and objects local to the procedure. The %ebp register is used (as its name indicates) as a base pointer, which points to the base (or bottom) of the local stack inside the procedure.

After entering the procedure, the stack pointer register (%esp) points to the return address stored on the stack by the call instruction (it is the address of the instruction just after the call). If you invoked ret right now, this address would be popped from the stack into %eip (the instruction pointer) and execution would continue from that address (the next instruction after the call). But we don't return yet, do we? ;-)
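To make the call/ret mechanics concrete, here is a tiny C sketch. It is not machine code: the array-backed stack and the addresses are invented for illustration. It models what `call` pushes and what `ret` pops.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stack: byte offsets into mem[] stand in for addresses. */
static uint32_t mem[32];
static uint32_t esp = sizeof mem;          /* empty stack; grows downward */

static void push(uint32_t v) { esp -= 4; mem[esp / 4] = v; }
static uint32_t pop(void) { uint32_t v = mem[esp / 4]; esp += 4; return v; }

/* `call target`: push the address of the next instruction, jump to target. */
uint32_t do_call(uint32_t next_insn, uint32_t target)
{
    push(next_insn);
    return target;                         /* new value of %eip */
}

/* `ret`: pop the word %esp points at into %eip. */
uint32_t do_ret(void)
{
    assert(esp < sizeof mem);              /* something must be on the stack */
    return pop();
}
```

Right after `do_call`, the word at the top of the stack is the address of the instruction following the call, and `do_ret` hands that same value back as the new %eip.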

You then push the %ebp register to save its previous value so it isn't lost, because you're about to use the register for something else. (BTW, it usually contains the base pointer of the caller function, and when you peek at that value, you'll find a previously stored %ebp, which is in turn the base pointer of the function one level higher, so you can trace the call stack that way.) Once %ebp is saved, you can store the current %esp (stack pointer) in it, so that %ebp points to the same address: the base of the current local stack. %esp will move back and forth inside the procedure as you push and pop values on the stack or reserve and free local variables, but %ebp stays fixed, still pointing to the base of the local stack frame.
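The call-stack tracing mentioned in the parenthesis can be sketched in C. This is a toy model under stated assumptions: the stack is an array, "addresses" are byte offsets into it, and a saved %ebp of 0 marks the outermost frame (a convention assumed here; real startup code may differ). `enter_frame` and `walk_frames` are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stack: byte offsets into mem[] stand in for addresses. */
static uint32_t mem[64];
static uint32_t esp = sizeof mem;
static uint32_t ebp = 0;                 /* 0 = "no caller frame" (assumption) */

static void push(uint32_t v) { esp -= 4; mem[esp / 4] = v; }

/* `call` followed by the standard prologue. */
void enter_frame(uint32_t return_address)
{
    push(return_address);                /* done by `call` */
    push(ebp);                           /* push %ebp      */
    ebp = esp;                           /* mov  %esp,%ebp */
}

/* Follow the saved-%ebp chain: (%ebp) holds the caller's %ebp,
   4(%ebp) holds the return address into the caller. */
size_t walk_frames(uint32_t *out, size_t max)
{
    size_t n = 0;
    for (uint32_t fp = ebp; fp != 0 && n < max; fp = mem[fp / 4])
        out[n++] = mem[fp / 4 + 1];
    return n;
}
```

After three nested `enter_frame` calls, `walk_frames` reports the three return addresses innermost first, which is essentially what a debugger's backtrace does when frame pointers are present.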

Accessing parameters

Parameters passed to the procedure by the caller are "buried just under the ground" (that is, they have positive offsets relative to the base, because the stack grows down). %ebp holds the address of the base of the local stack, where the previous value of %ebp is stored. Just below it (that is, at 4(%ebp)) lies the return address. So the first parameter will be at 8(%ebp), the second at 12(%ebp), and so on.
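A minimal C sketch of these offsets, assuming a cdecl-style call where arguments are pushed right to left. The array-backed stack, the function names and the values are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stack: byte offsets into mem[] stand in for addresses. */
static uint32_t mem[32];
static uint32_t esp = sizeof mem, ebp = 0;

static void push(uint32_t v) { esp -= 4; mem[esp / 4] = v; }

/* Reads the 32-bit word at offset(%ebp), like `mov offset(%ebp),%eax`. */
uint32_t at_ebp(int32_t offset)
{
    return mem[(ebp + (uint32_t)offset) / 4];
}

/* Caller side of f(arg1, arg2) plus the callee's prologue. */
void call_two_args(uint32_t arg1, uint32_t arg2, uint32_t return_address)
{
    push(arg2);              /* pushed first, so it ends up higher */
    push(arg1);
    push(return_address);    /* `call`                             */
    push(ebp);               /* push %ebp                          */
    ebp = esp;               /* mov  %esp,%ebp                     */
}
```

After `call_two_args(5, 7, ...)`, 0(%ebp) is the saved %ebp, 4(%ebp) the return address, 8(%ebp) the first argument (5) and 12(%ebp) the second (7).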

Local variables

And local variables can be allocated on the stack above the base (that is, they have negative offsets relative to the base). Just subtract N from %esp and you've allocated N bytes on the stack for local variables, by moving the top of the stack above (or, precisely, below) this region :-) You can refer to this area by negative offsets relative to %ebp, i.e. -4(%ebp) is the first word, -8(%ebp) is the second, etc. Remember that (%ebp) points to the base of the local stack, where the previous %ebp value has been saved. So remember to restore the stack to its previous position before you try to restore %ebp with pop %ebp at the end of the procedure. You can do it in two ways:
1. You can free only the local variables by adding N back to %esp (the stack pointer), that is, moving the top of the stack as if these local variables had never been there. (Well, their values will stay on the stack, but they'll be considered "freed" and can be overwritten by subsequent pushes, so it's no longer safe to refer to them. They're dead bodies ;-J )
2. You can flush the stack down to the ground and free all local space by simply restoring %esp from %ebp, which was fixed earlier to the base of the stack. This restores the stack pointer to the state it had just after entering the procedure and saving %esp into %ebp. It's like loading a previously saved game when you've messed something up ;-)
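Both clean-up options can be compared on a toy stack. This sketch (array-backed stack, made-up values, hypothetical function names) shows that `add $N,%esp; pop %ebp` and `mov %ebp,%esp; pop %ebp` leave %esp and %ebp in the same state; the second pair is exactly what the x86 `leave` instruction does.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stack: byte offsets into mem[] stand in for addresses. */
static uint32_t mem[32];
static uint32_t esp = sizeof mem, ebp = 0;

static void push(uint32_t v) { esp -= 4; mem[esp / 4] = v; }
static uint32_t pop(void) { uint32_t v = mem[esp / 4]; esp += 4; return v; }

/* push %ebp ; mov %esp,%ebp ; sub $N,%esp ; then write a local at -4(%ebp) */
void prologue(uint32_t n_local_bytes)
{
    assert(n_local_bytes >= 4);
    push(ebp);
    ebp = esp;
    esp -= n_local_bytes;
    mem[(ebp - 4) / 4] = 42;   /* first local word, -4(%ebp) */
}

/* Way 1: add $N,%esp ; pop %ebp */
void epilogue_add(uint32_t n_local_bytes)
{
    esp += n_local_bytes;
    ebp = pop();
}

/* Way 2: mov %ebp,%esp ; pop %ebp  (the pair `leave` performs) */
void epilogue_leave(void)
{
    esp = ebp;
    ebp = pop();
}
```

Way 2 has the advantage that the epilogue doesn't need to know N, which is why compilers commonly emit `leave` before `ret`.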

Turning off frame pointers

It's possible to get less messy assembly from gcc -S by adding the switch -fomit-frame-pointer. It tells GCC not to emit any code for setting up and tearing down the stack frame unless it's really needed for something. Just remember that it can confuse debuggers, because they usually depend on the stack frame being there to walk up the call stack. But it won't break anything if you don't need to debug this binary. It's perfectly fine for release targets, and it saves some spacetime.

Call Frame Information

Sometimes you'll come across some strange assembler directives starting with .cfi interleaved with the function header. This is the so-called Call Frame Information. It's used by debuggers to track function calls, but it's also used for exception handling in high-level languages, which needs stack unwinding and other call-stack-based manipulations. You can turn it off in your assembly too, by adding the switch -fno-dwarf2-cfi-asm. This tells GCC to use plain old labels instead of those strange .cfi directives, and it adds special data structures at the end of your assembly, referring to those labels. This doesn't turn off the CFI, it just changes the format to a more "transparent" one: the CFI tables are then visible to the programmer.

笑饮青盏花 2024-10-20 05:40:48

You did pretty well with your interpretation. When a function is called, the return address is automatically pushed onto the stack, which is why argc, the first argument, has been pushed back to 4(%esp). argv, the next argument, is at 8(%esp): it is a pointer to an array with one pointer for each argument, followed by a null pointer. This function pushes the old value of %esp to the stack so that it can get back the original, unaligned value on return. The value of %ecx at return doesn't matter, which is why it is used as temporary storage for the %esp reference. Other than that, you are correct with everything.
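The null-pointer terminator mentioned above is guaranteed by the C standard (argv[argc] is a null pointer), so a program can walk its arguments without consulting argc. A small sketch; the helper name `count_args` is hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Counts entries in an argv-style array by stopping at the
   terminating null pointer. */
int count_args(char *const *argv)
{
    int n = 0;
    while (argv[n] != NULL)
        n++;
    return n;
}
```

Called from `int main(int argc, char **argv)`, `count_args(argv)` should equal `argc`.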

情愿 2024-10-20 05:40:48

Regarding your first question (where the command line arguments are stored): arguments to a function are pushed right before the call, so they sit just above the return address and the saved ebp (at positive offsets from ebp). I must say, your "real" main begins at <main+10>, where it pushes ebp and moves esp to ebp. I think gcc complicates everything with all those lea instructions, which just replace the usual operations (additions and subtractions) on esp before and after function calls. Usually a routine looks like this (a simple function I wrote as an example):

   0x080483b4 <+0>:     push   %ebp     
   0x080483b5 <+1>:     mov    %esp,%ebp
   0x080483b7 <+3>:     sub    $0x10,%esp            # room for local variables
   0x080483ba <+6>:     mov    0xc(%ebp),%eax        # get arg2
   0x080483bd <+9>:     mov    0x8(%ebp),%edx        # and arg1
   0x080483c0 <+12>:    lea    (%edx,%eax,1),%eax    # just add them
   0x080483c3 <+15>:    mov    %eax,-0x4(%ebp)       # store in local var
   0x080483c6 <+18>:    mov    -0x4(%ebp),%eax       # and return the sum
   0x080483c9 <+21>:    leave
   0x080483ca <+22>:    ret 

Perhaps you've enabled some optimizations, which could make the code trickier.
Finally yes, the return value is stored in eax. Your interpretation is quite correct anyway.
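For comparison, here is a guess at C source that would produce roughly the listing above when compiled with gcc -m32 -O0. The answer doesn't show its source, so the function and parameter names are hypothetical.

```c
#include <assert.h>

/* Hypothetical source for the listing: two int arguments, one local. */
int add(int arg1, int arg2)
{
    int sum = arg1 + arg2;   /* mov 0xc/0x8(%ebp) ; lea (%edx,%eax,1),%eax ; mov %eax,-0x4(%ebp) */
    return sum;              /* mov -0x4(%ebp),%eax ; leave ; ret                                 */
}
```

At -O0 GCC typically stores `sum` to its stack slot and immediately reloads it, which explains the seemingly redundant pair of `mov` instructions at <+15> and <+18>.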

围归者 2024-10-20 05:40:48

The only thing I think that's outstanding from your original questions is why the following statements exist in your code:

0x08048381 <main+13>:   push   %ecx
0x08048382 <main+14>:   mov    $0x6,%eax
0x08048387 <main+19>:   pop    %ecx

The push and pop of %ecx at <main+13> and <main+19> don't seem to make much sense - and they don't really do anything in this example, but consider the case where your code invokes function calls.

There's no way for the system to guarantee that the calls to other functions - which will set up their own stack activation frames - won't overwrite register values. In fact they probably will. The code therefore sets up a saved register section on the stack where any registers used by the code (other than %esp and %ebp, which are already saved through the regular stack setup) are stored before possibly handing control over to function calls in the "meat" of the current code block.

When these potential calls return, the system then pops the values off the stack to restore the pre-call register values. If you were writing assembler directly rather than compiling, you'd be responsible for storing and retrieving these register values, yourself.

In the case of your example code, however, there are no function calls - only a single instruction at <main+14> where you're setting the return value, but the compiler can't know that, and preserves its registers as usual.


It would be interesting to see what would happen here if you added C statements which pushed other values onto the stack after <main+14>. If I'm right about this being a saved register section of the stack, you'd expect the compiler to insert automatic pop statements prior to <main+19> in order to clear these values.
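The general save/restore pattern this answer describes can be sketched in C on a toy register file. Under the usual i386 calling convention a callee may overwrite %eax, %ecx and %edx, so a caller that still needs %ecx after a call has to save it itself. Whether that is actually why this particular main pushes %ecx is the answer's hypothesis; the sketch only illustrates the pattern, and `callee` and `caller_preserving_ecx` are made-up functions.

```c
#include <assert.h>
#include <stdint.h>

/* Toy register file and stack; all values are made up. */
static uint32_t ecx, eax;
static uint32_t stack[16], sp = 16;     /* sp is a word index, grows down */

static void push(uint32_t v) { stack[--sp] = v; }
static uint32_t pop(void) { return stack[sp++]; }

/* Hypothetical callee that uses %ecx as scratch and does not restore it. */
static void callee(void) { ecx = 0xbad; eax = 1; }

/* A caller that needs %ecx to survive the call: push before, pop after. */
uint32_t caller_preserving_ecx(uint32_t value)
{
    ecx = value;
    push(ecx);      /* push %ecx                      */
    callee();       /* call callee (clobbers %ecx)    */
    ecx = pop();    /* pop  %ecx                      */
    return ecx;
}
```

Without the push/pop pair the caller would see 0xbad in %ecx after the call; with it, the original value comes back and the stack pointer ends where it started.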
