GCC 如何优化循环内递增的未使用变量?
我写了这个简单的C程序:
int main() {
int i;
int count = 0;
for(i = 0; i < 2000000000; i++){
count = count + 1;
}
}
我想看看gcc编译器如何优化这个循环(显然add 1 2000000000次应该是“add 2000000000一次”)。因此:
gcc test.c 然后 a.out
上的 time
给出:
real 0m7.717s
user 0m7.710s
sys 0m0.000s
$ gcc -O2 test.c然后 time on
a.out` 给出:
real 0m0.003s
user 0m0.000s
sys 0m0.000s
然后我用 gcc -S
反汇编了两者。第一个看起来很清楚:
.file "test.c"
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
movq %rsp, %rbp
.cfi_offset 6, -16
.cfi_def_cfa_register 6
movl $0, -8(%rbp)
movl $0, -4(%rbp)
jmp .L2
.L3:
addl $1, -8(%rbp)
addl $1, -4(%rbp)
.L2:
cmpl $1999999999, -4(%rbp)
jle .L3
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2"
.section .note.GNU-stack,"",@progbits
L3 添加,L2 将 -4(%rbp)
与 1999999999
进行比较,如果 i
i
则循环到 L3。 2000000000。
现在是优化的:
.file "test.c"
.text
.p2align 4,,15
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
rep
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2"
.section .note.GNU-stack,"",@progbits
我完全不明白那里发生了什么!我对汇编知之甚少,但我期望像
addl $2000000000, -8(%rbp)
我尝试过的那样 gcc -c -g -Wa,-a,-ad -O2 test.c 来查看 C 代码它转换为的程序集,但结果并不比前一个更清晰。
有人可以简要解释一下:
- gcc -S -O2 输出。
- 如果循环按照我的预期进行优化(一个总和而不是多个总和)?
I wrote this simple C program:
int main() {
int i;
int count = 0;
for(i = 0; i < 2000000000; i++){
count = count + 1;
}
}
I wanted to see how the gcc compiler optimizes this loop (clearly add 1 2000000000 times should be "add 2000000000 one time"). So:
gcc test.c and then time
on a.out
gives:
real 0m7.717s
user 0m7.710s
sys 0m0.000s
$ gcc -O2 test.c and then time on
a.out` gives:
real 0m0.003s
user 0m0.000s
sys 0m0.000s
Then I disassembled both with gcc -S
. First one seems quite clear:
.file "test.c"
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
movq %rsp, %rbp
.cfi_offset 6, -16
.cfi_def_cfa_register 6
movl $0, -8(%rbp)
movl $0, -4(%rbp)
jmp .L2
.L3:
addl $1, -8(%rbp)
addl $1, -4(%rbp)
.L2:
cmpl $1999999999, -4(%rbp)
jle .L3
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2"
.section .note.GNU-stack,"",@progbits
L3 adds, L2 compare -4(%rbp)
with 1999999999
and loops to L3 if i < 2000000000
.
Now the optimized one:
.file "test.c"
.text
.p2align 4,,15
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
rep
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2"
.section .note.GNU-stack,"",@progbits
I can't understand at all what's going on there! I've got little knowledge of assembly, but I expected something like
addl $2000000000, -8(%rbp)
I even tried with gcc -c -g -Wa,-a,-ad -O2 test.c to see the C code together with the assembly it was converted to, but the result was no more clear that the previous one.
Can someone briefly explain:
- The gcc -S -O2 output.
- If the loop is optimized as I expected (one sum instead of many sums)?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
编译器比这更聪明。 :)
事实上,它意识到您没有使用循环的结果。所以它完全去掉了整个循环!
这称为死代码消除。
更好的测试是打印结果:
编辑:我已经添加了所需的
#include
; MSVC 程序集列表对应于没有#include
的版本,但它应该是相同的。目前我面前没有 GCC,因为我已启动到 Windows。但这里是 MSVC 上带有
printf()
的版本的反汇编:编辑:我有错误的汇编输出。这是正确的。
所以,是的,Visual Studio 会进行此优化。我想海湾合作委员会可能也是如此。
是的,GCC 执行了类似的优化。以下是使用
gcc -S -O2 test.c
的同一程序的汇编列表(gcc 4.5.2、Ubuntu 11.10、x86):The compiler is even smarter than that. :)
In fact, it realizes that you aren't using the result of the loop. So it took out the entire loop completely!
This is called Dead Code Elimination.
A better test is to print the result:
EDIT : I've added the required
#include <stdio.h>
; the MSVC assembly listing corresponds to a version without the#include
, but it should be the same.I don't have GCC in front of me at the moment, since I'm booted into Windows. But here's the disassembly of the version with the
printf()
on MSVC:EDIT : I had the wrong assembly output. Here's the correct one.
So yes, Visual Studio does this optimization. I'd assume GCC probably does too.
And yes, GCC performs a similar optimization. Here's an assembly listing for the same program with
gcc -S -O2 test.c
(gcc 4.5.2, Ubuntu 11.10, x86):编译器可以使用一些工具来使代码更高效或更“高效”:
如果从未使用计算结果,则可以省略执行计算的代码(如果计算作用于
易失性
值,这些值仍然必须被读取,但读取的结果可能会被忽略)。如果未使用提供给它的计算结果,则执行这些操作的代码也可以省略。如果这样的省略使得条件分支上的两个路径的代码相同,则该条件可以被视为未使用并被省略。这不会对任何不进行越界内存访问或调用附件 L 所称的“关键未定义行为”的程序的行为(执行时间除外)产生任何影响。如果编译器确定计算值的机器代码只能产生特定范围内的结果,则它可能会忽略任何可以在此基础上预测其结果的条件测试。如上所述,这不会影响执行时间以外的行为,除非代码调用“关键未定义行为”。
如果编译器确定某些输入将在编写的代码中调用任何形式的未定义行为,则标准将允许编译器省略仅在收到此类输入时才相关的任何代码,即使给定此类输入的执行平台本来是良性的,而编译器的重写将使其变得危险。
好的编译器会执行#1 和#2。然而,出于某种原因,#3 变得流行。
Compilers have a few tools at their disposal to make code more efficient or more "efficient":
If the result of a computation is never used, the code that performs the computation can be omitted (if the computation acted upon
volatile
values, those values must still be read but the results of the read may be ignored). If the results of the computations that fed it weren't used, the code that performs those can be omitted as well. If such omission makes the code for both paths on a conditional branch identical, the condition may be regarded as unused and omitted. This will have no effect on the behaviors (other than execution time) of any program that doesn't make out-of-bounds memory accesses or invoke what Annex L would call "Critical Undefined Behaviors".If the compiler determines that the machine code that computes a value can only produce results in a certain range, it may omit any conditional tests whose outcome could be predicted on that basis. As above, this will not affect behaviors other than execution time unless code invokes "Critical Undefined Behaviors".
If the compiler determines that certain inputs would invoke any form of Undefined Behavior with the code as written, the Standard would allow the compiler to omit any code which would only be relevant when such inputs are received, even if the natural behavior of the execution platform given such inputs would have been benign and the compiler's rewrite would make it dangerous.
Good compilers do #1 and #2. For some reason, however, #3 has become fashionable.