GCC assembly output of an empty program on x86, win32

Posted on 2024-08-02 04:34:19

I write empty programs to annoy the hell out of stackoverflow coders, NOT. I am just exploring the gnu toolchain.

Now the following might be too deep for me, but to continue the empty program saga I have started to examine the output of the C compiler, the stuff GNU as consumes.

gcc version 4.4.0 (TDM-1 mingw32)

test.c:

int main()
{
    return 0;
}

gcc -S test.c

    .file   "test.c"
    .def    ___main;    .scl    2;  .type   32; .endef
    .text
.globl _main
    .def    _main;  .scl    2;  .type   32; .endef
_main:
    pushl   %ebp
    movl    %esp, %ebp
    andl    $-16, %esp
    call    ___main
    movl    $0, %eax
    leave
    ret 

Can you explain what happens here? Here is my effort to understand it. I have used the as manual and my minimal x86 ASM knowledge:

  • .file "test.c" is the directive for the logical filename.
  • .def: according to the docs "Begin defining debugging information for a symbol name". What is a symbol (a function name/variable?) and what kind of debugging information?
  • .scl: docs say "Storage class may flag whether a symbol is static or external". Is this the same static and external I know from C? And what is that '2'?
  • .type: stores the parameter "as the type attribute of a symbol table entry", I have no clue.
  • .endef: no problem.
  • .text: Now this is problematic, it seems to be something called a section and I have read that it's the place for code, but the docs didn't tell me too much.
  • .globl "makes the symbol visible to ld.", the manual is quite clear on this.
  • _main: This might be the starting address (?) for my main function
  • pushl: A long (32-bit) push, which places EBP on the stack
  • movl: 32-bit move. Pseudo-C: EBP = ESP;
  • andl: Logical AND. Pseudo-C: ESP = -16 & ESP, I don't really see what the point of this is.
  • call: Pushes the IP to the stack (so the called procedure can find its way back) and continues where __main is. (what is __main?)
  • movl: this zero must be the constant I return at the end of my code. The MOV places this zero into EAX.
  • leave: restores stack after an ENTER instruction (?). Why?
  • ret: goes back to the instruction address that is saved on the stack

Thank you for your help!

Comments (5)

白云不回头 2024-08-09 04:34:19

.file "test.c"

Commands starting with . are directives to the assembler. This one just says that the source file is "test.c"; that information can be exported to the debugging information of the exe.

.def ___main; .scl 2; .type 32; .endef

The .def directive defines a debugging symbol. .scl 2 means storage class 2 (the external storage class), and .type 32 says this symbol is a function. These numbers are defined by the PE/COFF executable format.

___main is a function that takes care of the bootstrapping gcc needs (it does things like running C++ static initializers and other required housekeeping).

.text

Begins a text section - code lives here.

.globl _main

Defines the _main symbol as global, which makes it visible to the linker and to the other modules that are linked in.

.def        _main;  .scl    2;      .type   32;     .endef

Same as for ___main above: this creates debugging symbols stating that _main is a function. Debuggers can use this.

_main:

Starts a new label (it will end up as an address). The .globl directive above makes this address visible to other entities.

pushl       %ebp

Saves the old frame pointer (the ebp register) on the stack, so it can be put back in place when this function ends.

movl        %esp, %ebp

Copies the stack pointer into the ebp register. ebp is often called the frame pointer: it marks the base of the current "frame" (usually one function call), with the saved ebp stored at that address and the function's own stack values below it. Referring to variables on the stack via ebp can also help debuggers.

andl $-16, %esp

ANDs the stack pointer with 0xfffffff0, which effectively aligns it on a 16-byte boundary. Access to aligned values on the stack can be faster than access to unaligned ones. Together with the preceding instructions, this is pretty much a standard function prologue.

call        ___main

Calls the ___main function, which does the initialization that gcc needs. call pushes the return address (the address of the instruction after the call) on the stack and jumps to the address of ___main.

movl        $0, %eax

Moves 0 to the eax register (the 0 in return 0;). The eax register holds integer function return values in the x86 calling conventions involved here (cdecl, and stdcall as well).

leave

The leave instruction is pretty much shorthand for

movl     %ebp, %esp
popl     %ebp

i.e. it "undoes" what was done at the start of the function, restoring the frame pointer and the stack pointer to their former state.

ret

Returns to whoever called this function. It pops the return address from the stack (which the corresponding call instruction placed there) and jumps there.

不疑不惑不回忆 2024-08-09 04:34:19

There's a very similar exercise outlined here: http://en.wikibooks.org/wiki/X86_Assembly/GAS_Syntax

You've figured out most of it -- I'll just make additional notes for emphasis and additions.

__main is a subroutine in the GNU standard library that takes care of various start-up initialization. It is not strictly necessary for C programs, but it is called just in case the C code is linked with C++.

_main is your main subroutine. As both _main and __main are code locations, they have the same storage class and type. I've not dug up the definitions for .scl and .type yet. You may get some illumination by defining a few global variables.

The first three instructions set up a stack frame, which is the technical term for the working storage of a subroutine -- local and temporary variables for the most part. Pushing ebp saves the base of the caller's stack frame. Putting esp into ebp sets the base of our stack frame. The andl aligns the stack frame to a 16-byte boundary in case any local variables on the stack require 16-byte alignment (the x86 SIMD instructions require that alignment, and alignment also speeds up ordinary types such as ints and floats).

At this point you'd normally expect esp to get moved down in memory to allocate stack space for local variables. Your main has none so gcc doesn't bother.

The call to __main is special to the main entry point and won't typically appear in subroutines.

The rest goes as you surmised. Register eax is the place to put integer return codes in the binary spec. leave undoes the stack frame and ret goes back to the caller. In this case, the caller is the low-level C runtime, which will do additional magic (like calling atexit() functions, setting the exit code for the process and asking the operating system to terminate the process).

守望孤独 2024-08-09 04:34:19

Regarding that andl $-16,%esp

  • 32 bits: -16 in decimal equals 0xfffffff0 in hexadecimal representation
  • 64 bits: -16 in decimal equals 0xfffffffffffffff0 in hexadecimal representation

So it will mask off the last 4 bits of ESP (btw: 2**4 equals 16) and will retain all other bits (no matter whether the target system is 32 or 64 bits).

铃予 2024-08-09 04:34:19

Further to the andl $-16,%esp, this works because setting the low bits to zero will always adjust %esp down in value, and the stack grows downward on x86.

¢好甜 2024-08-09 04:34:19

I don't have all answers but I can explain what I know.

ebp is used by the function to hold the initial value of esp for the duration of its execution, as a reference point for where the arguments passed to the function are and where its own local variables are. The first thing a function does is save the caller's ebp with pushl %ebp (this is vital to the function that made the call), and then replace it with its own current stack position, esp, with movl %esp, %ebp. Zeroing the last 4 bits of esp at this point is GCC specific; I don't know why this compiler does that. It would work without doing it. Now finally we get down to business: call ___main. Who is __main? I don't know either... maybe more GCC-specific procedures. Finally, the only thing your main() does: it sets the return value to 0 with movl $0, %eax, then leave, which is the same as doing movl %ebp, %esp; popl %ebp, restores the ebp state, and ret finishes. ret pops eip and continues the thread flow from that point, wherever it is (as this is main(), this ret probably leads to some kernel procedure which handles the end of the program).

Most of it is all about managing the stack. I wrote a detailed tutorial about how the stack is used some time ago; it would be useful for explaining why all those things are done. But it's in Portuguese...
