Why does gcc use movl instead of push to pass function arguments?
Consider this code:
#include <stdio.h>

void a(int a, int b, int c)
{
    char buffer1[5];
    char buffer2[10];
}

int main()
{
    a(1,2,3);
}
After that, run:
gcc -S a.c
This command emits the assembly for our source code.
Now we can see that main never uses a "push" instruction to put the arguments of function a on the stack. It uses "movl" instead:
main:
    pushl %ebp
    movl %esp, %ebp
    andl $-16, %esp
    subl $16, %esp
    movl $3, 8(%esp)
    movl $2, 4(%esp)
    movl $1, (%esp)
    call a
    leave
Why does this happen?
What is the difference between them?
Comments (4)
The gcc manual covers this under the -mpush-args and -maccumulate-outgoing-args options.
Apparently -maccumulate-outgoing-args is enabled by default, overriding -mpush-args. Explicitly compiling with -mno-accumulate-outgoing-args does revert to the PUSH method here.
2019 update: modern CPUs have had efficient push/pop since about Pentium M. -mno-accumulate-outgoing-args (and using push) eventually became the default for -mtune=generic in Jan 2014.
That code is just directly putting the constants (1, 2, 3) at offset positions from the (updated) stack pointer (esp). The compiler is choosing to do the "push" manually with the same result.
"push" both sets the data and updates the stack pointer. In this case, the compiler is reducing that to only one update of the stack pointer (vs. three). An interesting experiment would be to try changing function "a" to take only one argument, and see if the instruction pattern changes.
gcc does all sorts of optimizations, including selecting instructions based on their execution speed on the particular CPU being optimized for. You will notice that something like
x *= n
is often replaced by a mix of SHL, ADD and/or SUB, especially when n is a constant; MUL is only used when the average runtime (and cache/etc. footprint) of the SHL-ADD-SUB combination would exceed that of MUL, or when n is not a constant (so that looping over shl-add-sub would be costlier).
In the case of function arguments: MOV can be parallelized by the hardware, while PUSH cannot. (The second PUSH has to wait for the first PUSH to finish because of the update of the esp register.) The MOVs that store the arguments can therefore run in parallel.
Is this on OS X by any chance? I read somewhere that it requires the stack pointer to be aligned at 16-byte boundaries. That could possibly explain this kind of code generation.
I found the article: http://blogs.embarcadero.com/eboling/2009/05/20/5607