Why does gcc use movl instead of push to pass function arguments?

Posted on 2024-10-08 21:44:10

Consider this code:

#include <stdio.h>
void a(int a, int b, int c)
{
    char buffer1[5];
    char buffer2[10];
}

int main()
{
    a(1,2,3); 
}

Then compile it to assembly:

gcc -S a.c

This command emits the assembly for our source file.

In main, the compiler never uses a "push" instruction to put the arguments of
function a on the stack. It uses "movl" instead:

main:
 pushl %ebp
 movl %esp, %ebp
 andl $-16, %esp
 subl $16, %esp
 movl $3, 8(%esp)
 movl $2, 4(%esp)
 movl $1, (%esp)
 call a
 leave

Why does this happen?
What is the difference between the two approaches?


Comments (4)

对风讲故事 2024-10-15 21:44:10

Here is what the gcc manual has to say about it:

-mpush-args
-mno-push-args
    Use PUSH operations to store outgoing parameters. This method is shorter and usually
    equally fast as method using SUB/MOV operations and is enabled by default. 
    In some cases disabling it may improve performance because of improved scheduling
    and reduced dependencies.

-maccumulate-outgoing-args
    If enabled, the maximum amount of space required for outgoing arguments will be
    computed in the function prologue. This is faster on most modern CPUs because of
    reduced dependencies, improved scheduling and reduced stack usage when preferred
    stack boundary is not equal to 2. The drawback is a notable increase in code size.
    This switch implies -mno-push-args. 

Apparently -maccumulate-outgoing-args is enabled by default, overriding -mpush-args. Explicitly compiling with -mno-accumulate-outgoing-args does revert to the PUSH method.


2019 update: modern CPUs have had efficient push/pop since about Pentium M.
-mno-accumulate-outgoing-args (and using push) eventually became the default for -mtune=generic in Jan 2014.

断肠人 2024-10-15 21:44:10

That code is just directly putting the constants (1, 2, 3) at offset positions from the (updated) stack pointer (esp). The compiler is choosing to do the "push" manually with the same result.

"push" both sets the data and updates the stack pointer. In this case, the compiler is reducing that to only one update of the stack pointer (vs. three). An interesting experiment would be to try changing function "a" to take only one argument, and see if the instruction pattern changes.
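
To make the "one update vs. three" point concrete, here is a minimal sketch in C of a toy stack model. Everything in it (ToyStack, toy_push, toy_sub, toy_mov, and the two pass_args_* functions) is hypothetical and invented for illustration; it models only the bookkeeping, not real CPU behavior. For simplicity it reserves exactly three slots, whereas the gcc output in the question reserves 16 bytes to keep the stack aligned.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model of a downward-growing stack of 4-byte slots (illustration only). */
typedef struct {
    uint32_t mem[16];  /* stack memory, one element per 4-byte slot */
    int sp;            /* index of the current top-of-stack slot */
    int sp_updates;    /* number of times the stack pointer was modified */
} ToyStack;

static void toy_init(ToyStack *s) {
    memset(s->mem, 0, sizeof s->mem);
    s->sp = 16;        /* empty stack: one past the last slot */
    s->sp_updates = 0;
}

/* Models "pushl $v": move the stack pointer down one slot, then store. */
static void toy_push(ToyStack *s, uint32_t v) {
    assert(s->sp > 0);
    s->sp -= 1;
    s->sp_updates += 1;
    s->mem[s->sp] = v;
}

/* Models "subl $n, %esp": reserve several slots with a single update. */
static void toy_sub(ToyStack *s, int slots) {
    assert(slots >= 0 && s->sp >= slots);
    s->sp -= slots;
    s->sp_updates += 1;
}

/* Models "movl $v, offset(%esp)": store without touching the stack pointer. */
static void toy_mov(ToyStack *s, int slot_offset, uint32_t v) {
    assert(slot_offset >= 0 && s->sp + slot_offset < 16);
    s->mem[s->sp + slot_offset] = v;
}

/* a(1,2,3) with three pushes: arguments are pushed right to left. */
void pass_args_with_push(ToyStack *s) {
    toy_push(s, 3);
    toy_push(s, 2);
    toy_push(s, 1);
}

/* a(1,2,3) with one reservation followed by three independent stores. */
void pass_args_with_mov(ToyStack *s) {
    toy_sub(s, 3);
    toy_mov(s, 2, 3);  /* like movl $3, 8(%esp) */
    toy_mov(s, 1, 2);  /* like movl $2, 4(%esp) */
    toy_mov(s, 0, 1);  /* like movl $1, (%esp)  */
}
```

Both functions leave the same values in the same slots with the same final stack pointer; the only difference in this model is that the push version modifies the stack pointer three times and the sub/mov version modifies it once.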

清风疏影 2024-10-15 21:44:10

gcc performs many optimizations, including selecting instructions based on their execution speed on the particular CPU being optimized for. For example, x *= n is often replaced by a mix of SHL, ADD, and/or SUB, especially when n is a constant. MUL is used only when the SHL/ADD/SUB sequence would cost more than MUL in average runtime (and cache footprint, etc.), or when n is not a constant (so a shift/add/sub version would need a loop and be costlier).
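
As a small illustration of the multiply-by-constant rewrite described above, the following C sketch spells out two such decompositions by hand. The function names are hypothetical, and this is not a claim about what gcc emits for any particular target; it only shows that the shift/add/sub forms compute the same result as the multiplication (unsigned arithmetic is used so that wraparound is well defined).

```c
#include <assert.h>
#include <stdint.h>

/* x * 10 written as x*8 + x*2, i.e. two shifts and an add. */
uint32_t times10_shift_add(uint32_t x) {
    return (x << 3) + (x << 1);
}

/* x * 7 written as x*8 - x, i.e. one shift and a subtract. */
uint32_t times7_shift_sub(uint32_t x) {
    return (x << 3) - x;
}

/* Checks both decompositions against plain multiplication, including
   values large enough to wrap around modulo 2^32. */
void check_strength_reduction(void) {
    uint32_t samples[] = { 0u, 1u, 5u, 12345u, 0x7FFFFFFFu, 0xFFFFFFFFu };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        uint32_t x = samples[i];
        assert(times10_shift_add(x) == x * 10u);
        assert(times7_shift_sub(x) == x * 7u);
    }
}
```

In practice you would simply write x * 10 and let the compiler choose; the point is only that the two forms are interchangeable, so the choice can be made on cost.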

For function arguments: the MOVs are independent of one another, so the hardware can execute them in parallel, whereas each PUSH also updates the esp register, so the second PUSH depends on the result of the first. (On CPUs that handle the stack-pointer update cheaply, roughly Pentium M and later, this dependency matters much less.)

月下凄凉 2024-10-15 21:44:10

Is this on OS X by any chance? I read somewhere that it requires the stack pointer to be aligned at 16-byte boundaries. That could possibly explain this kind of code generation.

I found the article: http://blogs.embarcadero.com/eboling/2009/05/20/5607
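
The alignment mentioned here corresponds to the "andl $-16, %esp" instruction in the question's assembly. As a minimal sketch in C (the function name align_down_16 is hypothetical), ANDing with -16 clears the low four bits of an address, which rounds it down to the nearest multiple of 16:

```c
#include <assert.h>
#include <stdint.h>

/* Rounds a 32-bit stack pointer value down to a 16-byte boundary,
   mirroring "andl $-16, %esp". In 32-bit two's complement, -16 is
   0xFFFFFFF0, so the AND clears the low four bits. */
uint32_t align_down_16(uint32_t sp) {
    uint32_t aligned = sp & (uint32_t)-16;
    assert((aligned & 15u) == 0);  /* result is a multiple of 16 */
    assert(aligned <= sp);         /* the stack grows down, so we only round down */
    return aligned;
}
```

For example, align_down_16(0xbffff7ac) yields 0xbffff7a0, while an already aligned value such as 0x1000 is returned unchanged.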
