GCC assembly output of an empty program on x86, win32
I write empty programs to annoy the hell out of stackoverflow coders, NOT. I am just exploring the GNU toolchain.

Now the following might be too deep for me, but to continue the empty program saga I have started to examine the output of the C compiler, the stuff GNU as consumes.

gcc version 4.4.0 (TDM-1 mingw32)

test.c:

    int main()
    {
        return 0;
    }

gcc -S test.c

        .file   "test.c"
        .def    ___main;    .scl    2;  .type   32; .endef
        .text
    .globl _main
        .def    _main;  .scl    2;  .type   32; .endef
    _main:
        pushl   %ebp
        movl    %esp, %ebp
        andl    $-16, %esp
        call    ___main
        movl    $0, %eax
        leave
        ret

Can you explain what happens here? Here is my effort to understand it. I have used the as manual and my minimal x86 ASM knowledge:

.file "test.c" is the directive for the logical filename.
.def: according to the docs, "Begin defining debugging information for a symbol name". What is a symbol (a function name/variable?) and what kind of debugging information?
.scl: the docs say "Storage class may flag whether a symbol is static or external". Is this the same static and external I know from C? And what is that '2'?
.type: stores the parameter "as the type attribute of a symbol table entry". I have no clue.
.endef: no problem.
.text: now this is problematic. It seems to be something called a section, and I have read that it's the place for code, but the docs didn't tell me too much.
.globl: "makes the symbol visible to ld." The manual is quite clear on this.
_main: this might be the starting address (?) of my main function.
pushl: a long (32-bit) push, which places EBP on the stack.
movl: 32-bit move. Pseudo-C: EBP = ESP;
andl: logical AND. Pseudo-C: ESP = -16 & ESP; I don't really see what the point of this is.
call: pushes the IP onto the stack (so the called procedure can find its way back) and continues where ___main is. (What is ___main?)
movl: this zero must be the constant I return at the end of my code. The MOV places this zero into EAX.
leave: restores the stack after an ENTER instruction (?). Why?
ret: goes back to the instruction address that is saved on the stack.

Thank you for your help!
.file "test.c"
Commands starting with . are directives to the assembler. This one just says the source file is "test.c"; that information can be exported to the debugging information of the exe.

.def ___main; .scl 2; .type 32; .endef
The .def directive defines a debugging symbol. .scl 2 means storage class 2 (the external storage class), and .type 32 says this symbol is a function. These numbers are defined by the PE/COFF executable format.
___main is a function that takes care of the bootstrapping gcc needs (it does things like run C++ static initializers and other housekeeping).

.text
Begins a text section; code lives here.

.globl _main
Defines the _main symbol as global, which makes it visible to the linker and to other modules that are linked in.

.def _main; .scl 2; .type 32; .endef
Same thing as for ___main: it creates a debugging symbol stating that _main is a function. This can be used by debuggers.

_main:
Starts a new label (it will end up as an address). The .globl directive above makes this address visible to other entities.

pushl %ebp
Saves the old frame pointer (the ebp register) on the stack, so it can be put back in place when this function ends.

movl %esp, %ebp
Copies the stack pointer into the ebp register. ebp is often called the frame pointer: it marks the base of the current "frame" (usually a function), so arguments and locals sit at fixed offsets from it. Referring to variables on the stack via ebp can also help debuggers.

andl $-16, %esp
ANDs the stack pointer with 0xfffffff0, which effectively aligns it on a 16-byte boundary. Aligned accesses can be faster than unaligned ones. All the preceding instructions are pretty much a standard function prologue.

call ___main
Calls the ___main function, which does the initialization gcc needs. call pushes the address of the next instruction onto the stack and jumps to the address of ___main.

movl $0, %eax
Moves 0 into the eax register (the 0 in return 0;). The eax register holds integer function return values in the cdecl calling convention that main uses (stdcall does the same).

leave
The leave instruction is pretty much shorthand for movl %ebp, %esp followed by popl %ebp, i.e. it "undoes" the work done at the start of the function, restoring the frame pointer and stack pointer to their former state.

ret
Returns to whoever called this function. It pops the return address from the stack (which the corresponding call instruction placed there) and jumps to it.
There's a very similar exercise outlined here: http://en.wikibooks.org/wiki/X86_Assembly/GAS_Syntax
You've figured out most of it. I'll just make additional notes for emphasis and additions.

__main is a subroutine in GCC's runtime support library (libgcc) that takes care of various start-up initialization. It is not strictly necessary for C programs but is required just in case the C code is linking with C++.

_main is your main subroutine. As both _main and __main are code locations, they have the same storage class and type. I've not dug up the definitions for .scl and .type yet. You may get some illumination by defining a few global variables.

The first three instructions set up a stack frame, which is a technical term for the working storage of a subroutine: local and temporary variables for the most part. Pushing ebp saves the base of the caller's stack frame. Putting esp into ebp sets the base of our stack frame. The andl aligns the stack frame to a 16-byte boundary just in case any local variables on the stack require 16-byte alignment (the x86 SIMD instructions require that alignment, and alignment also speeds up ordinary types such as ints and floats).

At this point you'd normally expect esp to get moved down in memory to allocate stack space for local variables. Your main has none, so gcc doesn't bother.

The call to __main is special to the main entry point and won't typically appear in subroutines.

The rest goes as you surmised. Register eax is the place to put integer return codes in the binary spec. leave undoes the stack frame and ret goes back to the caller. In this case, the caller is the low-level C runtime, which will do additional magic (like calling atexit() functions, setting the exit code for the process and asking the operating system to terminate the process).
Regarding that andl $-16,%esp: in two's complement, -16 is 0xfffffff0 as a 32-bit value (and 0xfffffffffffffff0 as a 64-bit value).
So it will mask off the last 4 bits of ESP (btw: 2**4 equals 16) and will retain all other bits (no matter whether the target system is 32 or 64 bits).
Further to the andl $-16,%esp: this works because setting the low bits to zero will always adjust %esp down in value, and the stack grows downward on x86. The adjustment therefore only skips over a few unused bytes and never moves %esp back over data that is already on the stack.
I don't have all the answers, but I can explain what I know.

ebp is used by the function to store the initial state of esp during its flow, as a reference to where the arguments passed to the function are and where its own local variables are. The first thing a function does is save the ebp it was given with pushl %ebp (this is vital to the function that made the call), and then replace it with its own current stack position, esp, with movl %esp, %ebp. Zeroing the last 4 bits of esp at this point is GCC specific; I don't know why this compiler does that. It would work without doing it. Now finally we get down to business: call ___main. Who is __main? I don't know either... maybe more GCC-specific procedures. Finally, the only thing your main() does is set the return value to 0 with movl $0, %eax, then leave, which is the same as doing movl %ebp, %esp; popl %ebp to restore the ebp state, then ret to finish. ret pops eip and continues the thread flow from that point, wherever it is (as this is main(), this ret probably leads to some kernel procedure which handles the end of the program).

Most of it is all about managing the stack. I wrote a detailed tutorial about how the stack is used some time ago; it would be useful for explaining why all those things are done. But it's in Portuguese...