Why does integer overflow on x86 with GCC cause an infinite loop?

Posted 2024-12-08 17:51:46

The following code goes into an infinite loop on GCC:

#include <iostream>
using namespace std;

int main(){
    int i = 0x10000000;

    int c = 0;
    do{
        c++;
        i += i;   // the third doubling overflows a 32-bit int (undefined behavior)
        cout << i << endl;
    }while (i > 0);

    cout << c << endl;
    return 0;
}

So here's the deal: Signed integer overflow is technically undefined behavior. But GCC on x86 implements integer arithmetic using x86 integer instructions - which wrap on overflow.

Therefore, I would have expected it to wrap on overflow - despite the fact that it is undefined behavior. But that's clearly not the case. So what did I miss?

I compiled this using:

~/Desktop$ g++ main.cpp -O2

GCC Output:

~/Desktop$ ./a.out
536870912
1073741824
-2147483648
0
0
0

... (infinite loop)

With optimizations disabled, there is no infinite loop and the output is correct. Visual Studio also correctly compiles this and gives the following result:

Correct Output:

~/Desktop$ g++ main.cpp
~/Desktop$ ./a.out
536870912
1073741824
-2147483648
3

Here are some other variations:

i *= 2;   //  Also fails and goes into infinite loop.
i <<= 1;  //  This seems okay. It does not enter infinite loop.

Here's all the relevant version information:

~/Desktop$ g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/x86_64-linux-gnu/gcc/x86_64-linux-gnu/4.5.2/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ..

...

Thread model: posix
gcc version 4.5.2 (Ubuntu/Linaro 4.5.2-8ubuntu4) 
~/Desktop$

So the question is: Is this a bug in GCC? Or did I misunderstand something about how GCC handles integer arithmetic?

*I'm tagging this C as well, because I assume this bug will reproduce in C. (I haven't verified it yet.)

EDIT:

Here's the assembly of the loop: (if I recognized it properly)

.L5:
addl    %ebp, %ebp
movl    $_ZSt4cout, %edi
movl    %ebp, %esi
.cfi_offset 3, -40
call    _ZNSolsEi
movq    %rax, %rbx
movq    (%rax), %rax
movq    -24(%rax), %rax
movq    240(%rbx,%rax), %r13
testq   %r13, %r13
je  .L10
cmpb    $0, 56(%r13)
je  .L3
movzbl  67(%r13), %eax
.L4:
movsbl  %al, %esi
movq    %rbx, %rdi
addl    $1, %r12d
call    _ZNSo3putEc
movq    %rax, %rdi
call    _ZNSo5flushEv
cmpl    $3, %r12d
jne .L5

她说她爱他 2024-12-15 17:51:46

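When the standard says it's undefined behavior, it means it. Anything can happen. "Anything" includes "usually integers wrap around, but on occasion weird stuff happens".

Yes, on x86 CPUs, integers usually wrap the way you expect. This is one of those exceptions. The compiler assumes you won't cause undefined behavior, and optimizes away the loop test. If you really want wraparound, pass -fwrapv to g++ or gcc when compiling; this gives you well-defined (two's-complement) overflow semantics, but can hurt performance.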
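An alternative to the flag is to make sure the signed addition never overflows in the first place. The following is a minimal sketch, assuming a 32-bit int; the helper name count_safe_doublings is hypothetical and does not appear in the question.

```cpp
#include <climits>

// Hypothetical helper: counts how many times `start` can be doubled
// while the result still fits in an int. The range check runs before
// the addition, so `i += i` can never overflow.
int count_safe_doublings(int start) {
    int i = start;
    int c = 0;
    while (i > 0 && i <= INT_MAX / 2) {
        i += i;  // safe: i <= INT_MAX / 2 guarantees 2 * i <= INT_MAX
        ++c;
    }
    return c;
}
```

With the question's starting value 0x10000000 this returns 2 on a 32-bit int: the third doubling is the one that would overflow, so it is never performed.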

独自←快乐 2024-12-15 17:51:46

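Simple: undefined behaviour - especially with optimization (-O2) turned on - means anything can happen.

Your code behaves as you expected without the -O2 switch.

It works quite fine with icl and tcc, by the way, but you can't rely on stuff like that...

According to the source linked in the original answer, GCC's optimizer actually exploits signed integer overflow. This would mean that the "bug" is by design.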

醉生梦死 2024-12-15 17:51:46

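The important thing to note here is that C++ programs are written for the C++ abstract machine (which is usually emulated through hardware instructions). The fact that you are compiling for x86 is totally irrelevant to the fact that this has undefined behaviour.

The compiler is free to use the existence of undefined behaviour to improve its optimisations (by removing a conditional from a loop, as in this example). There is no guaranteed, or even useful, mapping between C++ level constructs and x86 level machine code constructs, apart from the requirement that the machine code will, when executed, produce the result demanded by the C++ abstract machine.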
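The abstract machine does define wraparound for unsigned types, so the question's loop can be expressed without undefined behaviour. Here is a sketch, assuming a 32-bit unsigned int; count_doublings_wrapping is a hypothetical name chosen for illustration.

```cpp
#include <climits>

// Hypothetical helper mirroring the question's do/while loop, but with
// unsigned arithmetic, which wraps modulo 2^N by definition.
int count_doublings_wrapping(unsigned int start) {
    unsigned int u = start;
    int c = 0;
    do {
        ++c;
        u += u;  // well-defined even when the sum exceeds UINT_MAX
        // "u != 0 && u <= INT_MAX" plays the role of "i > 0" for a
        // two's-complement int with the same bit pattern.
    } while (u != 0 && u <= static_cast<unsigned int>(INT_MAX));
    return c;
}
```

Called with 0x10000000u, this returns 3, matching the count printed by the unoptimized build in the question, and the optimizer has no undefined behaviour to exploit.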

才能让你更想念 2024-12-15 17:51:46

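Please, people, undefined behaviour is exactly that: undefined. It means that anything could happen. In practice (as in this case), the compiler is free to assume it won't be invoked, and to do whatever it pleases if that could make the code faster or smaller. What happens with code that shouldn't run is anybody's guess. It will depend on the surrounding code (depending on that, the compiler could well generate different code), the variables and constants used, compiler flags, ... Oh, and the compiler could get updated and compile the same code differently, or you could get another compiler with a different view on code generation. Or just get a different machine: even another model in the same architecture line could very well have its own undefined behaviour (look up undefined opcodes; some enterprising programmers found out that on some of those early machines they sometimes did do useful stuff...). There is no "the compiler gives a definite behaviour on undefined behaviour". There are areas that are implementation-defined, and there you should be able to count on the compiler behaving consistently.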

挽梦忆笙歌 2024-12-15 17:51:46

i += i;   // the overflow is undefined

With -fwrapv the result is correct.

无敌元气妹 2024-12-15 17:51:46

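Even if a compiler were to specify that integer overflow must be considered a "non-critical" form of Undefined Behavior (as defined in Annex L), the result of an integer overflow should, absent a specific platform promise of more specific behavior, be at minimum regarded as a "partially-indeterminate value". Under such rules, adding 1073741824+1073741824 could arbitrarily be regarded as yielding 2147483648 or -2147483648 or any other value congruent to 2147483648 mod 4294967296, and values obtained by further additions could arbitrarily be regarded as any value congruent to 0 mod 4294967296.

Rules allowing overflow to yield "partially-indeterminate values" would be sufficiently well-defined to abide by the letter and spirit of Annex L, but would not prevent a compiler from making the same generally useful inferences as would be justified if overflows were unconstrained Undefined Behavior. They would prevent a compiler from making some phony "optimizations" whose primary effect in many cases is to require that programmers add extra clutter to the code whose sole purpose is to prevent such "optimizations"; whether that would be a good thing or not depends on one's point of view.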
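The congruences mentioned in this answer can be checked with wider arithmetic. Below is a small sketch that uses 64-bit integers so that nothing in the check itself overflows; congruent_mod_2_32 and kModulus are hypothetical names introduced only for this illustration.

```cpp
#include <cstdint>

// 2^32, the modulus used in the answer.
constexpr std::int64_t kModulus = 4294967296LL;

// Hypothetical helper: true when a and b differ by a multiple of 2^32.
constexpr bool congruent_mod_2_32(std::int64_t a, std::int64_t b) {
    return (a - b) % kModulus == 0;
}

// 1073741824 + 1073741824 is 2147483648, which is congruent to -2147483648.
static_assert(congruent_mod_2_32(1073741824LL + 1073741824LL, -2147483648LL),
              "2^31 and -2^31 differ by 2^32");
// Doubling once more gives 4294967296, which is congruent to 0.
static_assert(congruent_mod_2_32(2147483648LL + 2147483648LL, 0),
              "2^32 is congruent to 0");
```

The two static_assert lines correspond to the two claims in the answer: the first overflowing sum is congruent to ±2147483648, and every later sum is congruent to 0.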
