了解简单 C 程序生成的汇编代码

发布于 2024-09-18 03:56:34 字数 1847 浏览 9 评论 0原文

我试图通过使用 gdb 的反汇编程序检查简单 C 程序的汇编级代码。

以下是 C 代码：

#include <stdio.h>

void function(int a, int b, int c) {
   char buffer1[5];
   char buffer2[10];
}

void main() {
  function(1,2,3);
}

以下是 main 和 function 的反汇编代码

gdb) disass main
Dump of assembler code for function main:
0x08048428 <main+0>:    push   %ebp
0x08048429 <main+1>:    mov    %esp,%ebp
0x0804842b <main+3>:    and    $0xfffffff0,%esp
0x0804842e <main+6>:    sub    $0x10,%esp
0x08048431 <main+9>:    movl   $0x3,0x8(%esp)
0x08048439 <main+17>:   movl   $0x2,0x4(%esp)
0x08048441 <main+25>:   movl   $0x1,(%esp)
0x08048448 <main+32>:   call   0x8048404 <function>
0x0804844d <main+37>:   leave  
0x0804844e <main+38>:   ret
End of assembler dump.

(gdb) disass function
Dump of assembler code for function function:
0x08048404 <function+0>:    push   %ebp
0x08048405 <function+1>:    mov    %esp,%ebp
0x08048407 <function+3>:    sub    $0x28,%esp
0x0804840a <function+6>:    mov    %gs:0x14,%eax
0x08048410 <function+12>:   mov    %eax,-0xc(%ebp)
0x08048413 <function+15>:   xor    %eax,%eax
0x08048415 <function+17>:   mov    -0xc(%ebp),%eax
0x08048418 <function+20>:   xor    %gs:0x14,%eax
0x0804841f <function+27>:   je     0x8048426 <function+34>
0x08048421 <function+29>:   call   0x8048340 <__stack_chk_fail@plt>
0x08048426 <function+34>:   leave  
0x08048427 <function+35>:   ret    
End of assembler dump.

我正在寻找以下问题的答案：

寻址是如何工作的，我的意思是（main+0 ) , (main+1), (main+3)
在 main 中，为什么使用 $0xfffffff0,%esp
在函数中，为什么使用 %gs:0x14,%eax , %eax,-0xc(%ebp)用过的。
如果有人可以解释一步一步发生的情况，那将不胜感激。

原文

I am trying to understand the assembly level code for a simple C program by inspecting it with gdb's disassembler.

Following is the C code:

#include <stdio.h>

void function(int a, int b, int c) {
   char buffer1[5];
   char buffer2[10];
}

void main() {
  function(1,2,3);
}

Following is the disassembly code for both main and function

gdb) disass main
Dump of assembler code for function main:
0x08048428 <main+0>:    push   %ebp
0x08048429 <main+1>:    mov    %esp,%ebp
0x0804842b <main+3>:    and    $0xfffffff0,%esp
0x0804842e <main+6>:    sub    $0x10,%esp
0x08048431 <main+9>:    movl   $0x3,0x8(%esp)
0x08048439 <main+17>:   movl   $0x2,0x4(%esp)
0x08048441 <main+25>:   movl   $0x1,(%esp)
0x08048448 <main+32>:   call   0x8048404 <function>
0x0804844d <main+37>:   leave  
0x0804844e <main+38>:   ret
End of assembler dump.

(gdb) disass function
Dump of assembler code for function function:
0x08048404 <function+0>:    push   %ebp
0x08048405 <function+1>:    mov    %esp,%ebp
0x08048407 <function+3>:    sub    $0x28,%esp
0x0804840a <function+6>:    mov    %gs:0x14,%eax
0x08048410 <function+12>:   mov    %eax,-0xc(%ebp)
0x08048413 <function+15>:   xor    %eax,%eax
0x08048415 <function+17>:   mov    -0xc(%ebp),%eax
0x08048418 <function+20>:   xor    %gs:0x14,%eax
0x0804841f <function+27>:   je     0x8048426 <function+34>
0x08048421 <function+29>:   call   0x8048340 <__stack_chk_fail@plt>
0x08048426 <function+34>:   leave  
0x08048427 <function+35>:   ret    
End of assembler dump.

I am seeking answers for following things :

how the addressing is working , I mean (main+0) , (main+1), (main+3)
In the main, why is $0xfffffff0,%esp being used
In the function, why is %gs:0x14,%eax , %eax,-0xc(%ebp) being used.
If someone can explain , step by step happening, that will be greatly appreciated.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

瀞厅☆埖开 2024-09-25 03:56:34

出现“奇怪”地址的原因，例如 main+0、main+1、main+3、main+6< /code> 等等，是因为每条指令占用的字节数是可变的。例如：

main+0: push %ebp

是一条单字节指令，因此下一条指令位于 main+1。另一方面，

main+3: and $0xfffffff0,%esp

它是一个三字节指令，因此之后的下一条指令位于 main+6。

而且，由于您在评论中询问为什么 movl 似乎采用可变数量的字节，因此解释如下。

指令长度不仅取决于操作码（例如movl），还取决于操作数的寻址模式（操作码是运行）。我没有专门检查您的代码，但我怀疑

movl $0x1,(%esp)

指令可能更短，因为不涉及偏移量 - 它只是使用 esp 作为地址。而像这样的东西：

movl $0x2,0x4(%esp)

需要 movl $0x1,(%esp) 所做的一切，加上偏移 0x4 的额外字节。

事实上，这里的调试会话显示了我的意思：

Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.

c:\pax> debug
-a
0B52:0100 mov word ptr [di],7
0B52:0104 mov word ptr [di+2],8
0B52:0109 mov word ptr [di+0],7
0B52:010E
-u100,10d
0B52:0100 C7050700      MOV     WORD PTR [DI],0007
0B52:0104 C745020800    MOV     WORD PTR [DI+02],0008
0B52:0109 C745000700    MOV     WORD PTR [DI+00],0007
-q
c:\pax> _

您可以看到带有偏移量的第二条指令实际上与没有偏移量的第一条指令不同。它长了一个字节（5 个字节而不是 4 个字节，用于保存偏移量），并且实际上具有不同的编码 c745 而不是 c705。

您还可以看到，可以用两种不同的方式对第一条和第三条指令进行编码，但它们基本上执行相同的操作。

and $0xfffffff0,%esp 指令是一种强制 esp 位于特定边界上的方法。这用于确保变量的正确对齐。现代处理器上的许多内存访问如果遵循对齐规则（例如 4 字节值必须与 4 字节边界对齐），将会更加高效。如果您不遵守这些规则，一些现代处理器甚至会引发故障。

执行此指令后，您可以保证 esp 小于或等于其先前的值并且与 16 字节边界对齐。

gs: 前缀仅意味着使用 gs 段寄存器来访问内存，而不是默认的。

指令mov %eax,-0xc(%ebp)表示将ebp寄存器的内容减去12(0xc)，然后将eax的值放入该内存位置。

重新解释一下代码。你的 function 函数基本上是一个大的无操作。生成的程序集仅限于堆栈帧设置和拆卸，以及使用上述 %gs:14 内存位置的一些堆栈帧损坏检查。

它将该位置的值（可能类似于 0xdeadbeef）加载到堆栈帧中，完成其工作，然后检查堆栈以确保它没有被损坏。

在这种情况下，它的工作就没什么了。所以你看到的只是功能管理的东西。

堆栈建立发生在 function+0 和 function+12 之间。之后的一切都是在 eax 中设置返回代码并拆除堆栈帧，包括损坏检查。

同样，main由栈帧设置、推送function参数、调用function、拆除栈帧并退出组成。

注释已插入到下面的代码中：

0x08048428 <main+0>:    push   %ebp                 ; save previous value.
0x08048429 <main+1>:    mov    %esp,%ebp            ; create new stack frame.
0x0804842b <main+3>:    and    $0xfffffff0,%esp     ; align to boundary.
0x0804842e <main+6>:    sub    $0x10,%esp           ; make space on stack.

0x08048431 <main+9>:    movl   $0x3,0x8(%esp)       ; push values for function.
0x08048439 <main+17>:   movl   $0x2,0x4(%esp)
0x08048441 <main+25>:   movl   $0x1,(%esp)
0x08048448 <main+32>:   call   0x8048404 <function> ; and call it.

0x0804844d <main+37>:   leave                       ; tear down frame.
0x0804844e <main+38>:   ret                         ; and exit.

0x08048404 <func+0>:    push   %ebp                 ; save previous value.
0x08048405 <func+1>:    mov    %esp,%ebp            ; create new stack frame.
0x08048407 <func+3>:    sub    $0x28,%esp           ; make space on stack.
0x0804840a <func+6>:    mov    %gs:0x14,%eax        ; get sentinel value.
0x08048410 <func+12>:   mov    %eax,-0xc(%ebp)      ; put on stack.

0x08048413 <func+15>:   xor    %eax,%eax            ; set return code 0.

0x08048415 <func+17>:   mov    -0xc(%ebp),%eax      ; get sentinel from stack.
0x08048418 <func+20>:   xor    %gs:0x14,%eax        ; compare with actual.
0x0804841f <func+27>:   je     <func+34>            ; jump if okay.
0x08048421 <func+29>:   call   <_stk_chk_fl>        ; otherwise corrupted stack.
0x08048426 <func+34>:   leave                       ; tear down frame.
0x08048427 <func+35>:   ret                         ; and exit.

我认为 %gs:0x14 的原因可能从上面显而易见，但为了以防万一，我将在这里详细说明。

它使用这个值（哨兵）来放入当前堆栈帧，这样，如果函数中的某些内容做了一些愚蠢的事情，例如将 1024 字节写入堆栈上创建的 20 字节数组，或者在您的情况下：

char buffer1[5];
strcpy (buffer1, "Hello there, my name is Pax.");

那么哨兵将被覆盖，函数末尾的检查将检测到这一点，调用失败函数让您知道，然后可能中止以避免任何其他问题。

如果它将 0xdeadbeef 放入堆栈中并将其更改为其他内容，则与 0xdeadbeef 的 xor 将产生一个非零值，即使用 je 指令在代码中检测到。

相关部分解释如下：

          mov    %gs:0x14,%eax     ; get sentinel value.
          mov    %eax,-0xc(%ebp)   ; put on stack.

          ;; Weave your function
          ;;   magic here.

          mov    -0xc(%ebp),%eax   ; get sentinel back from stack.
          xor    %gs:0x14,%eax     ; compare with original value.
          je     stack_ok          ; zero/equal means no corruption.
          call   stack_bad         ; otherwise corrupted stack.
stack_ok: leave                    ; tear down frame.

The reason for the "strange" addresses such as main+0, main+1, main+3, main+6 and so on, is because each instruction takes up a variable number of bytes. For example:

main+0: push %ebp

is a one-byte instruction so the next instruction is at main+1. On the other hand,

main+3: and $0xfffffff0,%esp

is a three-byte instruction so the next instruction after that is at main+6.

And, since you ask in the comments why movl seems to take a variable number of bytes, the explanation for that is as follows.

Instruction length depends not only on the opcode (such as movl) but also the addressing modes for the operands as well (the things the opcode are operating on). I haven't checked specifically for your code but I suspect the

movl $0x1,(%esp)

instruction is probably shorter because there's no offset involved - it just uses esp as the address. Whereas something like:

movl $0x2,0x4(%esp)

requires everything that movl $0x1,(%esp) does, plus an extra byte for the offset 0x4.

In fact, here's a debug session showing what I mean:

Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.

c:\pax> debug
-a
0B52:0100 mov word ptr [di],7
0B52:0104 mov word ptr [di+2],8
0B52:0109 mov word ptr [di+0],7
0B52:010E
-u100,10d
0B52:0100 C7050700      MOV     WORD PTR [DI],0007
0B52:0104 C745020800    MOV     WORD PTR [DI+02],0008
0B52:0109 C745000700    MOV     WORD PTR [DI+00],0007
-q
c:\pax> _

You can see that the second instruction with an offset is actually different to the first one without it. It's one byte longer (5 bytes instead of 4, to hold the offset) and actually has a different encoding c745 instead of c705.

You can also see that you can encode the first and third instruction in two different ways but they basically do the same thing.

The and $0xfffffff0,%esp instruction is a way to force esp to be on a specific boundary. This is used to ensure proper alignment of variables. Many memory accesses on modern processors will be more efficient if they follow the alignment rules (such as a 4-byte value having to be aligned to a 4-byte boundary). Some modern processors will even raise a fault if you don't follow these rules.

After this instruction, you're guaranteed that esp is both less than or equal to its previous value and aligned to a 16 byte boundary.

The gs: prefix simply means to use the gs segment register to access memory rather than the default.

The instruction mov %eax,-0xc(%ebp) means to take the contents of the ebp register, subtract 12 (0xc) and then put the value of eax into that memory location.

Re the explanation of the code. Your function function is basically one big no-op. The assembly generated is limited to stack frame setup and teardown, along with some stack frame corruption checking which uses the afore-mentioned %gs:14 memory location.

It loads the value from that location (probably something like 0xdeadbeef) into the stack frame, does its job, then checks the stack to ensure it hasn't been corrupted.

Its job, in this case, is nothing. So all you see is the function administration stuff.

Stack set-up occurs between function+0 and function+12. Everything after that is setting up the return code in eax and tearing down the stack frame, including the corruption check.

Similarly, main consist of stack frame set-up, pushing the parameters for function, calling function, tearing down the stack frame and exiting.

Comments have been inserted into the code below:

0x08048428 <main+0>:    push   %ebp                 ; save previous value.
0x08048429 <main+1>:    mov    %esp,%ebp            ; create new stack frame.
0x0804842b <main+3>:    and    $0xfffffff0,%esp     ; align to boundary.
0x0804842e <main+6>:    sub    $0x10,%esp           ; make space on stack.

0x08048431 <main+9>:    movl   $0x3,0x8(%esp)       ; push values for function.
0x08048439 <main+17>:   movl   $0x2,0x4(%esp)
0x08048441 <main+25>:   movl   $0x1,(%esp)
0x08048448 <main+32>:   call   0x8048404 <function> ; and call it.

0x0804844d <main+37>:   leave                       ; tear down frame.
0x0804844e <main+38>:   ret                         ; and exit.

0x08048404 <func+0>:    push   %ebp                 ; save previous value.
0x08048405 <func+1>:    mov    %esp,%ebp            ; create new stack frame.
0x08048407 <func+3>:    sub    $0x28,%esp           ; make space on stack.
0x0804840a <func+6>:    mov    %gs:0x14,%eax        ; get sentinel value.
0x08048410 <func+12>:   mov    %eax,-0xc(%ebp)      ; put on stack.

0x08048413 <func+15>:   xor    %eax,%eax            ; set return code 0.

0x08048415 <func+17>:   mov    -0xc(%ebp),%eax      ; get sentinel from stack.
0x08048418 <func+20>:   xor    %gs:0x14,%eax        ; compare with actual.
0x0804841f <func+27>:   je     <func+34>            ; jump if okay.
0x08048421 <func+29>:   call   <_stk_chk_fl>        ; otherwise corrupted stack.
0x08048426 <func+34>:   leave                       ; tear down frame.
0x08048427 <func+35>:   ret                         ; and exit.

I think the reason for the %gs:0x14 may be evident from above but, just in case, I'll elaborate here.

It uses this value (a sentinel) to put in the current stack frame so that, should something in the function do something silly like write 1024 bytes to a 20-byte array created on the stack or, in your case:

char buffer1[5];
strcpy (buffer1, "Hello there, my name is Pax.");

then the sentinel will be overwritten and the check at the end of the function will detect that, calling the failure function to let you know, and then probably aborting so as to avoid any other problems.

If it placed 0xdeadbeef onto the stack and this was changed to something else, then an xor with 0xdeadbeef would produce a non-zero value which is detected in the code with the je instruction.

The relevant bit is paraphrased here:

          mov    %gs:0x14,%eax     ; get sentinel value.
          mov    %eax,-0xc(%ebp)   ; put on stack.

          ;; Weave your function
          ;;   magic here.

          mov    -0xc(%ebp),%eax   ; get sentinel back from stack.
          xor    %gs:0x14,%eax     ; compare with original value.
          je     stack_ok          ; zero/equal means no corruption.
          call   stack_bad         ; otherwise corrupted stack.
stack_ok: leave                    ; tear down frame.

回复收藏 0 原文

じ违心 2024-09-25 03:56:34

帕克斯给出了明确的答案。然而，为了完整起见，我想添加一个关于让 GCC 本身向您展示它生成的程序集的注释。

GCC 的 -S 选项告诉它停止编译并将程序集写入文件。通常，它要么将该文件传递给汇编器，要么对于某些目标本身直接写入目标文件。

对于问题中的示例代码：

#include <stdio.h>

void function(int a, int b, int c) {
   char buffer1[5];
   char buffer2[10];
}

void main() {
  function(1,2,3);
}

命令 gcc -S q3654898.c 创建一个名为 q3654898.s 的文件：

        .file   "q3654898.c"
        .text
.globl _function
        .def    _function;      .scl    2;      .type   32;     .endef
_function:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $40, %esp
        leave
        ret
        .def    ___main;        .scl    2;      .type   32;     .endef
.globl _main
        .def    _main;  .scl    2;      .type   32;     .endef
_main:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $24, %esp
        andl    $-16, %esp
        movl    $0, %eax
        addl    $15, %eax
        addl    $15, %eax
        shrl    $4, %eax
        sall    $4, %eax
        movl    %eax, -4(%ebp)
        movl    -4(%ebp), %eax
        call    __alloca
        call    ___main
        movl    $3, 8(%esp)
        movl    $2, 4(%esp)
        movl    $1, (%esp)
        call    _function
        leave
        ret

显而易见的一件事是我的 GCC (gcc (GCC) 3.4.5 (mingw -vista Special r3)) 默认不包含堆栈检查代码。我想象有一个命令行选项，或者如果我抽出时间将我的 MinGW 安装升级到更新的 GCC，它就可以了。

编辑：在 Pax 的推动下，这是让 GCC 完成更多工作的另一种方法。

C:\Documents and Settings\Ross\My Documents\testing>gcc -Wa,-al q3654898.c
q3654898.c: In function `main':
q3654898.c:8: warning: return type of 'main' is not `int'
GAS LISTING C:\DOCUME~1\Ross\LOCALS~1\Temp/ccLg8pWC.s                   page 1


   1                            .file   "q3654898.c"
   2                            .text
   3                    .globl _function
   4                            .def    _function;      .scl    2;      .type
32;     .endef
   5                    _function:
   6 0000 55                    pushl   %ebp
   7 0001 89E5                  movl    %esp, %ebp
   8 0003 83EC28                subl    $40, %esp
   9 0006 C9                    leave
  10 0007 C3                    ret
  11                            .def    ___main;        .scl    2;      .type
32;     .endef
  12                    .globl _main
  13                            .def    _main;  .scl    2;      .type   32;
.endef
  14                    _main:
  15 0008 55                    pushl   %ebp
  16 0009 89E5                  movl    %esp, %ebp
  17 000b 83EC18                subl    $24, %esp
  18 000e 83E4F0                andl    $-16, %esp
  19 0011 B8000000              movl    $0, %eax
  19      00
  20 0016 83C00F                addl    $15, %eax
  21 0019 83C00F                addl    $15, %eax
  22 001c C1E804                shrl    $4, %eax
  23 001f C1E004                sall    $4, %eax
  24 0022 8945FC                movl    %eax, -4(%ebp)
  25 0025 8B45FC                movl    -4(%ebp), %eax
  26 0028 E8000000              call    __alloca
  26      00
  27 002d E8000000              call    ___main
  27      00
  28 0032 C7442408              movl    $3, 8(%esp)
  28      03000000
  29 003a C7442404              movl    $2, 4(%esp)
  29      02000000
  30 0042 C7042401              movl    $1, (%esp)
  30      000000
  31 0049 E8B2FFFF              call    _function
  31      FF
  32 004e C9                    leave
  33 004f C3                    ret

C:\Documents and Settings\Ross\My Documents\testing>

在这里我们看到汇编器生成的输出列表。（它的名字是 GAS，因为它是经典 *nix 汇编器 as 的 Gnu 版本。这其中有一些幽默之处。）

每一行都包含以下大部分字段：编号、当前节中的地址、该地址存储的字节以及汇编源文件中的源文本。
地址是该模块提供的每个部分的该部分的偏移量。该特定模块仅在存储可执行代码的 .text 部分中包含内容。您通常还会发现名为 .data 和 .bss 的部分。使用了许多其他名称，其中一些有特殊用途。如果您确实想了解，请阅读链接器的手册。

Pax has produced a definitive answer. However, for completeness, I thought I'd add a note on getting GCC itself to show you the assembly it generates.

The -S option to GCC tells it to stop compilation and write the assembly to a file. Normally, it either passes that file to the assembler or for some targets writes the object file directly itself.

For the sample code in the question:

#include <stdio.h>

void function(int a, int b, int c) {
   char buffer1[5];
   char buffer2[10];
}

void main() {
  function(1,2,3);
}

the command gcc -S q3654898.c creates a file named q3654898.s:

        .file   "q3654898.c"
        .text
.globl _function
        .def    _function;      .scl    2;      .type   32;     .endef
_function:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $40, %esp
        leave
        ret
        .def    ___main;        .scl    2;      .type   32;     .endef
.globl _main
        .def    _main;  .scl    2;      .type   32;     .endef
_main:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $24, %esp
        andl    $-16, %esp
        movl    $0, %eax
        addl    $15, %eax
        addl    $15, %eax
        shrl    $4, %eax
        sall    $4, %eax
        movl    %eax, -4(%ebp)
        movl    -4(%ebp), %eax
        call    __alloca
        call    ___main
        movl    $3, 8(%esp)
        movl    $2, 4(%esp)
        movl    $1, (%esp)
        call    _function
        leave
        ret

One thing that is evident is that my GCC (gcc (GCC) 3.4.5 (mingw-vista special r3)) doesn't include the stack check code by default. I imagine that there is a command line option, or that if I ever got around to nudging my MinGW install up to a more current GCC that it could.

Edit: Nudged to do so by Pax, here's another way to get GCC to do more of the work.

C:\Documents and Settings\Ross\My Documents\testing>gcc -Wa,-al q3654898.c
q3654898.c: In function `main':
q3654898.c:8: warning: return type of 'main' is not `int'
GAS LISTING C:\DOCUME~1\Ross\LOCALS~1\Temp/ccLg8pWC.s                   page 1


   1                            .file   "q3654898.c"
   2                            .text
   3                    .globl _function
   4                            .def    _function;      .scl    2;      .type
32;     .endef
   5                    _function:
   6 0000 55                    pushl   %ebp
   7 0001 89E5                  movl    %esp, %ebp
   8 0003 83EC28                subl    $40, %esp
   9 0006 C9                    leave
  10 0007 C3                    ret
  11                            .def    ___main;        .scl    2;      .type
32;     .endef
  12                    .globl _main
  13                            .def    _main;  .scl    2;      .type   32;
.endef
  14                    _main:
  15 0008 55                    pushl   %ebp
  16 0009 89E5                  movl    %esp, %ebp
  17 000b 83EC18                subl    $24, %esp
  18 000e 83E4F0                andl    $-16, %esp
  19 0011 B8000000              movl    $0, %eax
  19      00
  20 0016 83C00F                addl    $15, %eax
  21 0019 83C00F                addl    $15, %eax
  22 001c C1E804                shrl    $4, %eax
  23 001f C1E004                sall    $4, %eax
  24 0022 8945FC                movl    %eax, -4(%ebp)
  25 0025 8B45FC                movl    -4(%ebp), %eax
  26 0028 E8000000              call    __alloca
  26      00
  27 002d E8000000              call    ___main
  27      00
  28 0032 C7442408              movl    $3, 8(%esp)
  28      03000000
  29 003a C7442404              movl    $2, 4(%esp)
  29      02000000
  30 0042 C7042401              movl    $1, (%esp)
  30      000000
  31 0049 E8B2FFFF              call    _function
  31      FF
  32 004e C9                    leave
  33 004f C3                    ret

C:\Documents and Settings\Ross\My Documents\testing>

Here we see an output listing produced by the assembler. (Its name is GAS, because it is Gnu's version of the classic *nix assembler as. There's humor there somewhere.)

Each line has most of the following fields: a line number, an address in the current section, bytes stored at that address, and the source text from the assembly source file.
The addresses are offsets into that portion of each section provided by this module. This particular module only has content in the .text section which stores executable code. You will typically find mention of sections named .data and .bss as well. Lots of other names are used and some have special purposes. Read the manual for the linker if you really want to know.

回复收藏 0 原文

泡沫很甜 2024-09-25 03:56:34

最好尝试使用 gcc 的 -fno-stack-protector 标志来禁用金丝雀并查看结果。

回复收藏 0 原文

飘过的浮云 2024-09-25 03:56:34

我想补充一点，对于简单的东西，如果您打开一点优化，GCC 的汇编输出通常会更容易阅读。又是示例代码...

void function(int a, int b, int c) {
   char buffer1[5];
   char buffer2[10];
}

/* corrected calling convention of main() */
int main() {
   function(1,2,3);
   return 0;
}

这是我在没有优化的情况下得到的（OSX 10.6，gcc 4.2.1+Apple 补丁）

.globl _function
_function:
    pushl   %ebp
    movl    %esp, %ebp
    pushl   %ebx
    subl    $36, %esp
    call    L4
"L00000000001$pb":
L4:
    popl    %ebx
    leal    L___stack_chk_guard$non_lazy_ptr-"L00000000001$pb"(%ebx), %eax
    movl    (%eax), %eax
    movl    (%eax), %edx
    movl    %edx, -12(%ebp)
    xorl    %edx, %edx
    leal    L___stack_chk_guard$non_lazy_ptr-"L00000000001$pb"(%ebx), %eax
    movl    (%eax), %eax
    movl    -12(%ebp), %edx
    xorl    (%eax), %edx
    je      L3
    call    ___stack_chk_fail
L3:
    addl    $36, %esp
    popl    %ebx
    leave
    ret
.globl _main
_main:
    pushl   %ebp
    movl    %esp, %ebp
    subl    $24, %esp
    movl    $3, 8(%esp)
    movl    $2, 4(%esp)
    movl    $1, (%esp)
    call    _function
    movl    $0, %eax
    leave
    ret

唷，一口！但是看看在命令行上使用 -O 会发生什么...

    .text
.globl _function
_function:
    pushl   %ebp
    movl    %esp, %ebp
    leave
    ret
.globl _main
_main:
    pushl   %ebp
    movl    %esp, %ebp
    movl    $0, %eax
    leave
    ret

当然，您确实面临代码完全无法识别的风险，尤其是在更高的优化级别和更复杂的情况下。即使在这里，我们也看到对 function 的调用已被视为毫无意义而被丢弃。但我发现，不必阅读数十个不必要的堆栈溢出通常值得在控制流上多费点心思。

I'd like to add that for simple stuff, GCC's assembly output is often easier to read if you turn on a little optimization. Here's the sample code again...

void function(int a, int b, int c) {
   char buffer1[5];
   char buffer2[10];
}

/* corrected calling convention of main() */
int main() {
   function(1,2,3);
   return 0;
}

this is what I get without optimization (OSX 10.6, gcc 4.2.1+Apple patches)

.globl _function
_function:
    pushl   %ebp
    movl    %esp, %ebp
    pushl   %ebx
    subl    $36, %esp
    call    L4
"L00000000001$pb":
L4:
    popl    %ebx
    leal    L___stack_chk_guard$non_lazy_ptr-"L00000000001$pb"(%ebx), %eax
    movl    (%eax), %eax
    movl    (%eax), %edx
    movl    %edx, -12(%ebp)
    xorl    %edx, %edx
    leal    L___stack_chk_guard$non_lazy_ptr-"L00000000001$pb"(%ebx), %eax
    movl    (%eax), %eax
    movl    -12(%ebp), %edx
    xorl    (%eax), %edx
    je      L3
    call    ___stack_chk_fail
L3:
    addl    $36, %esp
    popl    %ebx
    leave
    ret
.globl _main
_main:
    pushl   %ebp
    movl    %esp, %ebp
    subl    $24, %esp
    movl    $3, 8(%esp)
    movl    $2, 4(%esp)
    movl    $1, (%esp)
    call    _function
    movl    $0, %eax
    leave
    ret

Whew, one heck of a mouthful! But look what happens with -O on the command line...

    .text
.globl _function
_function:
    pushl   %ebp
    movl    %esp, %ebp
    leave
    ret
.globl _main
_main:
    pushl   %ebp
    movl    %esp, %ebp
    movl    $0, %eax
    leave
    ret

Of course, you do run the risk of your code being rendered completely unrecognizable, especially at higher optimization levels and with more complicated stuff. Even here, we see that the call to function has been discarded as pointless. But I find that not having to read through dozens of unnecessary stack spills is generally more than worth a little extra scratching my head over the control flow.

回复收藏 0 原文

~没有更多了~