No useful and reliable way to detect integer overflow in C/C++?

Posted 2024-11-26 12:36:00


No, this is not a duplicate of "How to detect integer overflow?". The issue is the same but the question is different.


The gcc compiler can optimize away an overflow check (with -O2), for example:

int a, b;
b = abs(a);                     // will overflow if a = 0x80000000
if (b < 0) printf("overflow");  // optimized away

The gcc people argue that this is not a bug. Overflow is undefined behavior, according to the C standard, which allows the compiler to do anything. Apparently, anything includes assuming that overflow never happens. Unfortunately, this allows the compiler to optimize away the overflow check.

The safe way to check for overflow is described in a recent CERT paper. This paper recommends doing something like this before adding two integers:

if ( ((si1^si2) | (((si1^(~(si1^si2) & INT_MIN)) + si2)^si2)) >= 0) { 
  /* handle error condition */
} else {
  sum = si1 + si2;
}

Apparently, you have to do something like this before every +, -, *, / and other operations in a series of calculations when you want to be sure that the result is valid. For example if you want to make sure an array index is not out of bounds. This is so cumbersome that practically nobody is doing it. At least I have never seen a C/C++ program that does this systematically.
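As a point of comparison, gcc and clang provide checked-arithmetic builtins (and C23 standardizes the same idea as ckd_add in <stdckdint.h>), which make the precondition test considerably less cumbersome than the CERT expression. A minimal sketch; the checked_add wrapper name is my own and the builtin is a compiler extension, not part of the standard:

#include <climits>
#include <cstdio>

// Sketch only: __builtin_add_overflow is a gcc/clang extension.
// Returns true and writes the sum if si1 + si2 is representable, false otherwise.
static bool checked_add(int si1, int si2, int *sum) {
    return !__builtin_add_overflow(si1, si2, sum);
}

int main() {
    int sum;
    if (checked_add(INT_MAX, 1, &sum))
        std::printf("sum = %d\n", sum);
    else
        std::printf("overflow\n");   // this branch is reliably taken, not optimized away
}

Because the builtin reports the overflow instead of performing it, there is no signed-overflow undefined behavior left for the optimizer to reason away.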

Now, this is a fundamental problem:

  • Checking an array index before accessing the array is useful, but not reliable.

  • Checking every operation in the series of calculations with the CERT method is reliable but not useful.

  • Conclusion: There is no useful and reliable way of checking for overflow in C/C++!

I refuse to believe that this was intended when the standard was written.

I know that there are certain command line options that can fix the problem, but this doesn't alter the fact that we have a fundamental problem with the standard or the current interpretation of it.
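For concreteness, the options alluded to here are flags such as gcc's -fwrapv (define signed overflow as wrapping), -ftrapv (trap on overflow) and -fsanitize=signed-integer-overflow (report it at run time). How they interact with the original example, as I understand it, is sketched below; the file name is illustrative:

// overflow_demo.cpp -- behaviour depends on the flags (my understanding):
//   g++ -O2 overflow_demo.cpp                                       the check may be optimized away
//   g++ -O2 -fwrapv overflow_demo.cpp                               signed overflow wraps, the check survives
//   g++ -O2 -fsanitize=signed-integer-overflow overflow_demo.cpp    UBSan reports the overflow at run time
#include <climits>
#include <cstdio>

int main() {
    volatile int input = INT_MAX;   // volatile so the value is not known at compile time
    int a = input;
    int b = a + 1;                  // undefined behavior according to the standard
    if (b < a) std::printf("overflow\n");
}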

Now my question is:
Are the gcc people taking the interpretation of "undefined behavior" too far when it allows them to optimize away an overflow check, or is the C/C++ standard broken?

Added note:
Sorry, you may have misunderstood my question. I am not asking how to work around the problem - that has already been answered elsewhere. I am asking a more fundamental question about the C standard. If there is no useful and reliable way of checking for overflow then the language itself is dubious. For example, if I make a safe array class with bounds checking then I should be safe, but I'm not if the bounds checking can be optimized away.
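To make the safe-array concern concrete, here is a minimal sketch of such a class (the name SafeArray is mine, and this is illustrative only, not a production container):

#include <cstddef>
#include <stdexcept>

// Illustrative sketch of a bounds-checked array wrapper.
template <typename T, std::size_t N>
class SafeArray {
    T data_[N] = {};
public:
    T& at(std::size_t i) {
        // The comparison uses only unsigned arithmetic, so the check itself
        // involves no undefined behavior that the optimizer could exploit.
        if (i >= N) throw std::out_of_range("index out of bounds");
        return data_[i];
    }
};

The check itself cannot legally be removed; the worry raised above is rather that an index computed with signed arithmetic may already have overflowed before at() is ever called.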

If the standard allows this to happen then either the standard needs revision or the interpretation of the standard needs revision.

Added note 2:
People here seem unwilling to discuss the dubious concept of "undefined behavior". The fact that the C99 standard lists 191 different kinds of undefined behavior (link) is an indication of a sloppy standard.

Many programmers readily accept the statement that "undefined behavior" gives the license to do anything, including formatting your hard disk. I think it is a problem that the standard puts integer overflow into the same dangerous category as writing outside array bounds.

Why are these two kinds of "undefined behavior" different? Because:

  • Many programs rely on integer overflow being benign, but few programs rely on writing outside array bounds when you don't know what is there.

  • Writing outside array bounds actually can do something as bad as formatting your hard disk (at least in an unprotected OS like DOS), and most programmers know that this is dangerous.

  • When you put integer overflow into the dangerous "anything goes" category, it allows the compiler to do anything, including lying about what it is doing (in the case where an overflow check is optimized away)

  • An error such as writing outside array bounds can be found with a debugger, but the error of optimizing away an overflow check cannot, because optimization is usually off when debugging.

  • The gcc compiler evidently refrains from the "anything goes" policy in case of integer overflow. There are many cases where it refrains from optimizing e.g. a loop unless it can verify that overflow is impossible. For some reason, the gcc people have recognized that we would have too many errors if they followed the "anything goes" policy here, but they have a different attitude to the problem of optimizing away an overflow check.

Maybe this is not the right place to discuss such philosophical questions. At least, most answers here are off the point. Is there a better place to discuss this?


Comments (4)

枯叶蝶 2024-12-03 12:36:00


The gcc developers are entirely correct here. When the standard says that the behavior is undefined, it means exactly that: there are no requirements on the compiler.

As a valid program cannot do anything that causes UB (as then it would not be valid anymore), the compiler can very well assume that UB doesn't happen. And if it still happens, anything the compiler does would be OK.

For your problem with overflow, one solution is to consider what ranges the calculations are supposed to handle. For example, when balancing my bank account I can assume that the amounts will be well below 1 billion, so a 32-bit int will work.

For your application domain you can probably do similar estimates about exactly where an overflow could be possible. Then you can add checks at those points or choose another data type, if available.
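A sketch of what that range-based reasoning might look like in practice: if the documented input range guarantees the operands fit in 32 bits, the one multiplication that could overflow can be done in 64 bits and the result checked once against the supported range (the names and limits below are illustrative):

#include <cstdint>
#include <cstdio>

int main() {
    std::int32_t cents_per_unit = 12345;   // known from the problem domain to fit in 32 bits
    std::int32_t units = 90000;
    // Do the only operation that could overflow in a wider type, then range-check once.
    std::int64_t total = static_cast<std::int64_t>(cents_per_unit) * units;
    if (total > INT32_MAX || total < INT32_MIN)
        std::printf("amount outside the supported range\n");
    else
        std::printf("total = %d cents\n", static_cast<int>(total));
}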

木有鱼丸 2024-12-03 12:36:00

int a, b;
b = abs(a); // will overflow if a = 0x80000000
if (b < 0) printf("overflow");  // optimized away 

(You seem to be assuming 2s complement... let's run with that)

Who says abs(a) "overflows" if a has that binary pattern (more accurately, if a is INT_MIN)? The Linux man page for abs(int) says:

Trying to take the absolute value of the most negative integer is not defined.

Not defined doesn't necessarily mean overflow.

So, your premise that b could ever be less than 0, and that this is somehow a test for "overflow", is fundamentally flawed from the start. If you want to test, you cannot do it on a result that may have undefined behaviour - do it before the operation instead!
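Applied to the original abs example, "test before the operation" might look like the following sketch (the function name safe_abs is mine, and the fallback value is purely illustrative):

#include <climits>
#include <cstdio>
#include <cstdlib>

// Check the precondition, not the result of an operation that may be undefined.
int safe_abs(int a) {
    if (a == INT_MIN) {              // the one input whose absolute value is not representable
        std::printf("overflow\n");   // handle however is appropriate for the application
        return INT_MAX;              // illustrative fallback, not a recommendation
    }
    return std::abs(a);
}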

If you care about this, you can use C++'s user-defined types (i.e. classes) to implement your own set of tests around the operations you need (or find a library that already does that). The language does not need built-in support for this, as it can be implemented equally efficiently in such a library, with the resulting semantics of use unchanged. That fundamental power is one of the great things about C++.

青丝拂面 2024-12-03 12:36:00


Ask yourself: how often do you actually need checked arithmetic? If you need it often you should write a checked_int class that overloads the common operators and encapsulate the checks into this class. Props for sharing the implementation on an Open Source website.

Better yet (arguably), use a big_integer class so that overflows can’t happen in the first place.
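A minimal sketch of what such a checked_int could look like, assuming the gcc/clang __builtin_add_overflow extension is available (only addition is shown; a real class would cover the remaining operators, comparisons and conversions):

#include <stdexcept>

class checked_int {
    int value_;
public:
    explicit checked_int(int v) : value_(v) {}
    int value() const { return value_; }

    checked_int operator+(checked_int rhs) const {
        int result;
        // __builtin_add_overflow reports the overflow instead of performing it,
        // so there is no undefined behavior for the optimizer to assume away.
        if (__builtin_add_overflow(value_, rhs.value_, &result))
            throw std::overflow_error("checked_int: addition overflowed");
        return checked_int(result);
    }
};

With this in place, checked_int(INT_MAX) + checked_int(1) throws instead of silently invoking undefined behavior.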

平安喜乐 2024-12-03 12:36:00


Just use the correct type for b:

int a;
unsigned b = a;
if (b == (unsigned)INT_MIN) printf("overflow");  // never optimized away
else b = abs(a);

Edit: Testing for overflow in C can be done safely with unsigned types. Unsigned types simply wrap around on arithmetic, and signed values convert to them safely. So you can do any test on them that you like. On modern processors this conversion is usually just a reinterpretation of a register, so it comes at no runtime cost.
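For the addition case, the same unsigned-wraparound idea can be packaged as a standalone test. A sketch, with a function name of my own choosing, assuming the usual two's-complement representation of int (which C++20 mandates and which every mainstream platform uses):

#include <climits>
#include <cstdio>

// Overflow test carried out entirely in unsigned arithmetic, which wraps by definition.
bool add_would_overflow(int a, int b) {
    unsigned ua = static_cast<unsigned>(a);
    unsigned ub = static_cast<unsigned>(b);
    unsigned us = ua + ub;                        // well-defined wraparound
    // Signed overflow occurred iff a and b have the same sign and the sum's sign differs.
    return (~(ua ^ ub) & (ua ^ us)) >> (sizeof(int) * CHAR_BIT - 1);
}

int main() {
    std::printf("%d\n", add_would_overflow(INT_MAX, 1));   // 1: would overflow
    std::printf("%d\n", add_would_overflow(-5, 3));        // 0: fine
}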
