Is gcc's insane optimisation level (-O3) not insane enough?
As part of answering another question, I wanted to show that the insane level of optimisation of gcc (-O3) would basically strip out any variables that weren't used in main. The code was:
#include <stdio.h>
int main (void) {
    char bing[71];
    int x = 7;
    bing[0] = 11;
    return 0;
}
and the gcc -O3 output was:
    .file "qq.c"
    .text
    .p2align 4,,15
    .globl main
    .type main, @function
main:
    pushl %ebp
    xorl %eax, %eax
    movl %esp, %ebp
    popl %ebp
    ret
    .size main, .-main
    .ident "GCC: (Ubuntu 4.4.3-4ubuntu5) 4.4.3"
    .section .note.GNU-stack,"",@progbits
Now I can see it's removed the local variables but there's still quite a bit of wastage in there. It seems to me that the entire:
    pushl %ebp
    xorl %eax, %eax
    movl %esp, %ebp
    popl %ebp
    ret
section could be replaced with the simpler:
    xorl %eax, %eax
    ret
Does anyone have any idea why gcc does not perform this optimisation? I know that would save very little for main itself but, if this were done with normal functions as well, the effect of unnecessarily adjusting the stack pointer in a massive loop would be considerable.
The command used to generate the assembly was:
gcc -O3 -std=c99 -S qq.c
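To see whether the same prologue and epilogue show up in an ordinary function rather than in main, a small file along these lines could be compiled with the same command and the assembly for add_one inspected. This is only a sketch: the names add_one and sum_calls are made up for illustration, and noinline is a GCC attribute used here on the assumption that, without it, -O3 would simply inline the call away. What the output looks like will depend on the GCC version and target.

```c
/* Hypothetical test file (not from the original post).
 * Compile with the same command as above and look at add_one:
 *     gcc -O3 -std=c99 -S <file>.c
 */

/* noinline is a GCC extension; it keeps add_one as a real call at -O3. */
__attribute__((noinline)) int add_one(int v)
{
    return v + 1;
}

/* Calls add_one n times starting from zero, so it returns n for n >= 0. */
int sum_calls(int n)
{
    int total = 0;
    for (int i = 0; i < n; i++) {
        total = add_one(total);
    }
    return total;
}
```

No main is needed because -S stops after generating assembly.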
Comments (2)

You can enable that particular optimization with the -fomit-frame-pointer compiler flag. Doing so makes debugging impossible on some machines and substantially more difficult on everything else, which is why it's usually disabled.

Although your GCC documentation may say that -fomit-frame-pointer is enabled at various optimization levels, you'll likely find that that's not the case; you'll almost certainly have to explicitly enable it yourself.

Turning on -fomit-frame-pointer (source) should get rid of the extra stack manipulations.

GCC apparently left those in because they facilitate debugging (getting a stack trace when needed), although the docs note that -fomit-frame-pointer is the default starting with GCC 4.6.
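If passing -fomit-frame-pointer for the whole build is not wanted, GCC also has a function-level optimize attribute that takes option names without the -f prefix. Neither answer above mentions it, so treat the following as a hedged sketch rather than a recommendation; as far as I recall, the GCC manual itself describes this attribute as meant for debugging rather than production code. The macro NO_FRAME_POINTER and the function always_zero are hypothetical names, and the guard assumes that compilers other than GCC should just see an empty macro.

```c
/* Assumption: GCC's per-function optimize attribute is available.
 * Other compilers (including clang) get an empty macro so the file still builds. */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_FRAME_POINTER __attribute__((optimize("omit-frame-pointer")))
#else
#define NO_FRAME_POINTER
#endif

/* Hypothetical leaf function that has no need for a stack frame of its own. */
NO_FRAME_POINTER int always_zero(void)
{
    return 0;
}
```

Compiling this with gcc -O3 -S and comparing the output with and without the attribute should show whether the pushl/movl/popl sequence is affected on a given GCC version.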