Learning disassembly
To understand what happens under the hood, I have been writing small C programs, disassembling them, and studying the objdump output.
The C program is:
#include <stdio.h>

int function(int a, int b, int c) {
    printf("%d, %d, %d\n", a, b, c);
}

int main() {
    int a;
    int *ptr;
    asm("nop");
    function(1, 2, 3);
}
The objdump output for function gives me the following.
080483a4 <function>:
 80483a4:  55              push   ebp
 80483a5:  89 e5           mov    ebp,esp
 80483a7:  83 ec 08        sub    esp,0x8
 80483aa:  ff 75 10        push   DWORD PTR [ebp+16]
 80483ad:  ff 75 0c        push   DWORD PTR [ebp+12]
 80483b0:  ff 75 08        push   DWORD PTR [ebp+8]
 80483b3:  68 04 85 04 08  push   0x8048504
 80483b8:  e8 fb fe ff ff  call   80482b8 <printf@plt>
 80483bd:  83 c4 10        add    esp,0x10
 80483c0:  c9              leave
Notice that before the call to printf, three DWORDs at offsets 16, 12 and 8 from ebp (these must be the arguments to function, in reverse order) are pushed onto the stack. After that, a hex address, which must be the address of the format string, is pushed.
My question is:
- Rather than pushing the three DWORDs and the format string address onto the stack directly, I expected to see esp decremented manually and the values written to the stack after that. How can this behaviour be explained?
Comments (6)
Well, on some machines the stack pointer is much like any other register, so the way you push something is, yes, a decrement followed by a store.
But some machines, like x86 (32-bit and 64-bit), have a push instruction that does both as one macro-op: it decrements the stack pointer and performs the store.
Macro-ops, by the way, have a funny history. At times, on certain machines, some of them have been slower than performing the same elementary operations with simple instructions.
I doubt that is frequently the case today. Modern x86 is amazingly sophisticated. The CPU decodes your instructions into micro-ops, which it then stores in a cache. The micro-ops have specific pipeline and time-slot requirements, and the end result is that there is effectively a RISC CPU inside the x86 these days; the whole thing runs really fast and keeps good code density at the architectural level.
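To make the "decrement followed by a store" point concrete, here is a minimal sketch in C of a toy downward-growing stack. Everything in it (the ToyStack type and the toy_* functions) is hypothetical and only models the idea; it is not how any real CPU or compiler is implemented. It compares three push-style operations against reserving space once and then storing into it.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 16

/* Hypothetical toy machine: a small stack of 32-bit words that grows
   toward lower indices. `sp` stands in for esp. */
typedef struct {
    uint32_t mem[STACK_WORDS];
    int sp; /* index of the current top of stack */
} ToyStack;

void toy_init(ToyStack *s) {
    s->sp = STACK_WORDS; /* empty: sp is one past the highest slot */
}

/* Models a single `push value`: decrement, then store. */
void toy_push(ToyStack *s, uint32_t value) {
    assert(s->sp > 0);
    s->sp -= 1;
    s->mem[s->sp] = value;
}

/* Models `sub esp, 4*words`: reserve space without storing anything. */
void toy_reserve(ToyStack *s, int words) {
    assert(words >= 0 && s->sp >= words);
    s->sp -= words;
}

/* Models `mov [esp + 4*offset_words], value`: store into reserved space. */
void toy_store(ToyStack *s, int offset_words, uint32_t value) {
    assert(offset_words >= 0 && s->sp + offset_words < STACK_WORDS);
    s->mem[s->sp + offset_words] = value;
}

/* Returns 1 if both instruction styles leave identical stack contents. */
int toy_same_result(void) {
    ToyStack a, b;
    toy_init(&a);
    toy_init(&b);

    /* Style 1: three pushes, last argument first. */
    toy_push(&a, 3);
    toy_push(&a, 2);
    toy_push(&a, 1);

    /* Style 2: reserve three slots, then store into them. */
    toy_reserve(&b, 3);
    toy_store(&b, 0, 1);
    toy_store(&b, 1, 2);
    toy_store(&b, 2, 3);

    if (a.sp != b.sp) return 0;
    for (int i = a.sp; i < STACK_WORDS; i++) {
        if (a.mem[i] != b.mem[i]) return 0;
    }
    return 1;
}
```

Calling toy_same_result() from a main function should report that both styles end in the same state, which is the sense in which push is just a packaged decrement and store.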
The stack pointer is adjusted by the push instruction. That is why esp is first copied to ebp: the parameters are then pushed onto the stack, so each one exists in two places, function's stack frame and printf's stack frame. The pushes change esp, so the copy in ebp is what stays fixed for addressing function's own arguments.
There is no mov [esp+x], [ebp+y] instruction, because an x86 mov cannot take two memory operands. It would take two instructions and use a register. push does it in one instruction.
This is the standard cdecl calling convention for x86 machines. There are several different calling conventions. You can read about them in the following Wikipedia article:
http://en.wikipedia.org/wiki/X86_calling_conventions
It explains the basic principles.
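As a rough illustration of why the arguments in the question show up at [ebp+8], [ebp+12] and [ebp+16], here is a small C sketch that simulates a cdecl-style call on a toy byte-addressed stack. The ToyCpu type, the cpu_* and cdecl_* functions, and the 0xDEADBEEF placeholder for the return address are all made up for this example; the only assumption taken from the convention itself is that the caller pushes arguments right to left, the call pushes a return address, and the callee's prologue saves ebp and then copies esp into it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FRAME_BYTES 64

/* Hypothetical toy CPU state: a tiny stack plus esp/ebp as byte offsets. */
typedef struct {
    uint8_t  mem[FRAME_BYTES];
    uint32_t esp;
    uint32_t ebp;
} ToyCpu;

/* Models `push v`: move esp down by 4 bytes, then store the 32-bit value. */
void cpu_push(ToyCpu *c, uint32_t v) {
    assert(c->esp >= 4);
    c->esp -= 4;
    memcpy(&c->mem[c->esp], &v, sizeof v);
}

/* Reads the 32-bit value stored at a byte offset in the toy stack. */
uint32_t cpu_load(const ToyCpu *c, uint32_t addr) {
    uint32_t v;
    assert(addr + sizeof v <= FRAME_BYTES);
    memcpy(&v, &c->mem[addr], sizeof v);
    return v;
}

/* Simulates calling function(a, b, c) and running its prologue. */
void cdecl_call(ToyCpu *c, uint32_t a, uint32_t b, uint32_t arg_c) {
    c->esp = FRAME_BYTES;
    c->ebp = 0;
    cpu_push(c, arg_c);       /* caller: last argument first */
    cpu_push(c, b);
    cpu_push(c, a);
    cpu_push(c, 0xDEADBEEF);  /* placeholder for the return address */
    cpu_push(c, c->ebp);      /* callee: push ebp */
    c->ebp = c->esp;          /* callee: mov ebp, esp */
}

/* Returns the value at [ebp + offset] after a simulated function(1, 2, 3). */
uint32_t cdecl_arg_demo(uint32_t offset) {
    ToyCpu c;
    cdecl_call(&c, 1, 2, 3);
    return cpu_load(&c, c.ebp + offset);
}
```

In this model, [ebp+0] holds the saved ebp and [ebp+4] the return address, so the first argument lands at offset 8 and the others follow at 12 and 16, matching the offsets in the disassembly from the question.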
You raise an interesting point which I think has not been directly addressed so far. I suppose you have seen assembly code which looks something like this:
This sort of disassembly is generated by certain compilers. It extends the stack, then stores the value of eax (which has hopefully been populated with something meaningful by that point) into the new space. This is equivalent to what the push mnemonic does. I can't say why certain compilers generate this code instead, but my guess is that at some point doing it this way was judged to be more efficient.
In your effort to learn assembly language and disassemble binaries, you might find ODA useful. It is a web-based disassembler, which is handy for disassembling code for lots of different architectures without having to build binutils' objdump for each one of them.
http://onlinedisassembler.com/