GCC 如何优化循环内递增的未使用变量?

发布于 2024-12-26 17:19:04 字数 2053 浏览 1 评论 0原文

我写了这个简单的C程序:

int main() {
    int i;
    int count = 0;
    for(i = 0; i < 2000000000; i++){
        count = count + 1;
    }
}

我想看看gcc编译器如何优化这个循环(显然add 1 2000000000次应该是“add 2000000000一次”)。因此:

gcc test.c 然后 a.out 上的 time 给出:

real 0m7.717s  
user 0m7.710s  
sys 0m0.000s  

$ gcc -O2 test.c然后 time ona.out` 给出:

real 0m0.003s  
user 0m0.000s  
sys 0m0.000s  

然后我用 gcc -S 反汇编了两者。第一个看起来很清楚:

    .file "test.c"  
    .text  
.globl main
    .type   main, @function  
main:
.LFB0:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    movq    %rsp, %rbp
    .cfi_offset 6, -16
    .cfi_def_cfa_register 6
    movl    $0, -8(%rbp)
    movl    $0, -4(%rbp)
    jmp .L2
.L3:
    addl    $1, -8(%rbp)
    addl    $1, -4(%rbp)
.L2:
    cmpl    $1999999999, -4(%rbp)
    jle .L3
    leave
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE0:
    .size   main, .-main
    .ident  "GCC: (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2"
    .section    .note.GNU-stack,"",@progbits

L3 添加,L2 将 -4(%rbp)1999999999 进行比较,如果 i i 则循环到 L3。 2000000000。

现在是优化的:

    .file "test.c"  
    .text
    .p2align 4,,15
.globl main
    .type main, @function
main:
.LFB0:
    .cfi_startproc
    rep
    ret
    .cfi_endproc
.LFE0:
    .size main, .-main
    .ident "GCC: (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2"
    .section .note.GNU-stack,"",@progbits

我完全不明白那里发生了什么!我对汇编知之甚少,但我期望像

addl $2000000000, -8(%rbp)

我尝试过的那样 gcc -c -g -Wa,-a,-ad -O2 test.c 来查看 C 代码它转换为的程序集,但结果并不比前一个更清晰。

有人可以简要解释一下:

  1. gcc -S -O2 输出。
  2. 如果循环按照我的预期进行优化(一个总和而不是多个总和)?

I wrote this simple C program:

int main() {
    int i;
    int count = 0;
    for(i = 0; i < 2000000000; i++){
        count = count + 1;
    }
}

I wanted to see how the gcc compiler optimizes this loop (clearly add 1 2000000000 times should be "add 2000000000 one time"). So:

gcc test.c and then time on a.out gives:

real 0m7.717s  
user 0m7.710s  
sys 0m0.000s  

$ gcc -O2 test.c and then time ona.out` gives:

real 0m0.003s  
user 0m0.000s  
sys 0m0.000s  

Then I disassembled both with gcc -S. First one seems quite clear:

    .file "test.c"  
    .text  
.globl main
    .type   main, @function  
main:
.LFB0:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    movq    %rsp, %rbp
    .cfi_offset 6, -16
    .cfi_def_cfa_register 6
    movl    $0, -8(%rbp)
    movl    $0, -4(%rbp)
    jmp .L2
.L3:
    addl    $1, -8(%rbp)
    addl    $1, -4(%rbp)
.L2:
    cmpl    $1999999999, -4(%rbp)
    jle .L3
    leave
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE0:
    .size   main, .-main
    .ident  "GCC: (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2"
    .section    .note.GNU-stack,"",@progbits

L3 adds, L2 compare -4(%rbp) with 1999999999 and loops to L3 if i < 2000000000.

Now the optimized one:

    .file "test.c"  
    .text
    .p2align 4,,15
.globl main
    .type main, @function
main:
.LFB0:
    .cfi_startproc
    rep
    ret
    .cfi_endproc
.LFE0:
    .size main, .-main
    .ident "GCC: (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2"
    .section .note.GNU-stack,"",@progbits

I can't understand at all what's going on there! I've got little knowledge of assembly, but I expected something like

addl $2000000000, -8(%rbp)

I even tried with gcc -c -g -Wa,-a,-ad -O2 test.c to see the C code together with the assembly it was converted to, but the result was no more clear that the previous one.

Can someone briefly explain:

  1. The gcc -S -O2 output.
  2. If the loop is optimized as I expected (one sum instead of many sums)?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

静待花开 2025-01-02 17:19:04

编译器比这更聪明。 :)

事实上,它意识到您没有使用循环的结果。所以它完全去掉了整个循环!

这称为死代码消除

更好的测试是打印结果:

#include <stdio.h>
int main(void) {
    int i; int count = 0;
    for(i = 0; i < 2000000000; i++){
        count = count + 1;
    }

    //  Print result to prevent Dead Code Elimination
    printf("%d\n", count);
}

编辑:我已经添加了所需的#include; MSVC 程序集列表对应于没有 #include 的版本,但它应该是相同的。


目前我面前没有 GCC,因为我已启动到 Windows。但这里是 MSVC 上带有 printf() 的版本的反汇编:

编辑:我有错误的汇编输出。这是正确的。

; 57   : int main(){

$LN8:
    sub rsp, 40                 ; 00000028H

; 58   : 
; 59   : 
; 60   :     int i; int count = 0;
; 61   :     for(i = 0; i < 2000000000; i++){
; 62   :         count = count + 1;
; 63   :     }
; 64   : 
; 65   :     //  Print result to prevent Dead Code Elimination
; 66   :     printf("%d\n",count);

    lea rcx, OFFSET FLAT:??_C@_03PMGGPEJJ@?$CFd?6?$AA@
    mov edx, 2000000000             ; 77359400H
    call    QWORD PTR __imp_printf

; 67   : 
; 68   : 
; 69   : 
; 70   :
; 71   :     return 0;

    xor eax, eax

; 72   : }

    add rsp, 40                 ; 00000028H
    ret 0

所以,是的,Visual Studio 会进行此优化。我想海湾合作委员会可能也是如此。

是的,GCC 执行了类似的优化。以下是使用 gcc -S -O2 test.c 的同一程序的汇编列表(gcc 4.5.2、Ubuntu 11.10、x86):

        .file   "test.c"
        .section        .rodata.str1.1,"aMS",@progbits,1
.LC0:
        .string "%d\n"
        .text
        .p2align 4,,15
.globl main
        .type   main, @function
main:
        pushl   %ebp
        movl    %esp, %ebp
        andl    $-16, %esp
        subl    $16, %esp
        movl    $2000000000, 8(%esp)
        movl    $.LC0, 4(%esp)
        movl    $1, (%esp)
        call    __printf_chk
        leave
        ret
        .size   main, .-main
        .ident  "GCC: (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2"
        .section        .note.GNU-stack,"",@progbits

The compiler is even smarter than that. :)

In fact, it realizes that you aren't using the result of the loop. So it took out the entire loop completely!

This is called Dead Code Elimination.

A better test is to print the result:

#include <stdio.h>
int main(void) {
    int i; int count = 0;
    for(i = 0; i < 2000000000; i++){
        count = count + 1;
    }

    //  Print result to prevent Dead Code Elimination
    printf("%d\n", count);
}

EDIT : I've added the required #include <stdio.h>; the MSVC assembly listing corresponds to a version without the #include, but it should be the same.


I don't have GCC in front of me at the moment, since I'm booted into Windows. But here's the disassembly of the version with the printf() on MSVC:

EDIT : I had the wrong assembly output. Here's the correct one.

; 57   : int main(){

$LN8:
    sub rsp, 40                 ; 00000028H

; 58   : 
; 59   : 
; 60   :     int i; int count = 0;
; 61   :     for(i = 0; i < 2000000000; i++){
; 62   :         count = count + 1;
; 63   :     }
; 64   : 
; 65   :     //  Print result to prevent Dead Code Elimination
; 66   :     printf("%d\n",count);

    lea rcx, OFFSET FLAT:??_C@_03PMGGPEJJ@?$CFd?6?$AA@
    mov edx, 2000000000             ; 77359400H
    call    QWORD PTR __imp_printf

; 67   : 
; 68   : 
; 69   : 
; 70   :
; 71   :     return 0;

    xor eax, eax

; 72   : }

    add rsp, 40                 ; 00000028H
    ret 0

So yes, Visual Studio does this optimization. I'd assume GCC probably does too.

And yes, GCC performs a similar optimization. Here's an assembly listing for the same program with gcc -S -O2 test.c (gcc 4.5.2, Ubuntu 11.10, x86):

        .file   "test.c"
        .section        .rodata.str1.1,"aMS",@progbits,1
.LC0:
        .string "%d\n"
        .text
        .p2align 4,,15
.globl main
        .type   main, @function
main:
        pushl   %ebp
        movl    %esp, %ebp
        andl    $-16, %esp
        subl    $16, %esp
        movl    $2000000000, 8(%esp)
        movl    $.LC0, 4(%esp)
        movl    $1, (%esp)
        call    __printf_chk
        leave
        ret
        .size   main, .-main
        .ident  "GCC: (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2"
        .section        .note.GNU-stack,"",@progbits
守不住的情 2025-01-02 17:19:04

编译器可以使用一些工具来使代码更高效或更“高效”:

  1. 如果从未使用计算结果,则可以省略执行计算的代码(如果计算作用于 易失性值,这些值仍然必须被读取,但读取的结果可能会被忽略)。如果未使用提供给它的计算结果,则执行这些操作的代码也可以省略。如果这样的省略使得条件分支上的两个路径的代码相同,则该条件可以被视为未使用并被省略。这不会对任何不进行越界内存访问或调用附件 L 所称的“关键未定义行为”的程序的行为(执行时间除外)产生任何影响。

  2. 如果编译器确定计算值的机器代码只能产生特定范围内的结果,则它可能会忽略任何可以在此基础上预测其结果的条件测试。如上所述,这不会影响执行时间以外的行为,除非代码调用“关键未定义行为”。

  3. 如果编译器确定某些输入将在编写的代码中调用任何形式的未定义行为,则标准将允许编译器省略仅在收到此类输入时才相关的任何代码,即使给定此类输入的执行平台本来是良性的,而编译器的重写将使其变得危险。

好的编译器会执行#1 和#2。然而,出于某种原因,#3 变得流行。

Compilers have a few tools at their disposal to make code more efficient or more "efficient":

  1. If the result of a computation is never used, the code that performs the computation can be omitted (if the computation acted upon volatile values, those values must still be read but the results of the read may be ignored). If the results of the computations that fed it weren't used, the code that performs those can be omitted as well. If such omission makes the code for both paths on a conditional branch identical, the condition may be regarded as unused and omitted. This will have no effect on the behaviors (other than execution time) of any program that doesn't make out-of-bounds memory accesses or invoke what Annex L would call "Critical Undefined Behaviors".

  2. If the compiler determines that the machine code that computes a value can only produce results in a certain range, it may omit any conditional tests whose outcome could be predicted on that basis. As above, this will not affect behaviors other than execution time unless code invokes "Critical Undefined Behaviors".

  3. If the compiler determines that certain inputs would invoke any form of Undefined Behavior with the code as written, the Standard would allow the compiler to omit any code which would only be relevant when such inputs are received, even if the natural behavior of the execution platform given such inputs would have been benign and the compiler's rewrite would make it dangerous.

Good compilers do #1 and #2. For some reason, however, #3 has become fashionable.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文