Why don't languages raise errors on integer overflow by default?
In several modern programming languages (including C++, Java, and C#), the language allows integer overflow to occur at runtime without raising any kind of error condition.
For example, consider this (contrived) C# method, which does not account for the possibility of overflow/underflow. (For brevity, the method also doesn't handle the case where the specified list is a null reference.)
//Returns the sum of the values in the specified list.
private static int sumList(List<int> list)
{
    int sum = 0;
    foreach (int listItem in list)
    {
        sum += listItem;
    }
    return sum;
}
If this method is called as follows:
List<int> list = new List<int>();
list.Add(2000000000);
list.Add(2000000000);
int sum = sumList(list);
An overflow will occur in the sumList() method (because the int type in C# is a 32-bit signed integer, and the sum of the values in the list exceeds the maximum value of a 32-bit signed integer). The sum variable will have a value of -294967296 (not a value of 4000000000); this most likely is not what the (hypothetical) developer of the sumList method intended.
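The wrapped value can be reproduced outside C# as well. A small demonstration in Java, whose int is likewise a 32-bit signed two's-complement integer and wraps exactly as unchecked C# arithmetic does:

```java
public class OverflowDemo {
    public static void main(String[] args) {
        // 2000000000 + 2000000000 exceeds Integer.MAX_VALUE (2147483647),
        // so the result wraps around modulo 2^32.
        int sum = 2000000000 + 2000000000;
        System.out.println(sum); // prints -294967296

        // The wrapped result is simply 4000000000 - 2^32:
        System.out.println(4000000000L - (1L << 32)); // prints -294967296
    }
}
```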
Obviously, there are various techniques that can be used by developers to avoid the possibility of integer overflow, such as using a type like Java's BigInteger, or the checked keyword and /checked compiler switch in C#.
However, the question that I'm interested in is why these languages were designed to allow integer overflows to happen by default in the first place, instead of, for example, raising an exception when an operation is performed at runtime that would result in an overflow. It seems like such behavior would help avoid bugs in cases where a developer neglects to account for the possibility of overflow when writing code that performs an arithmetic operation that could result in overflow. (These languages could have included something like an "unchecked" keyword designating a block where integer overflow is permitted to occur without an exception being raised, for those cases where that behavior is explicitly intended by the developer; C# actually does have this.)
Does the answer simply boil down to performance -- the language designers didn't want their respective languages to default to having "slow" arithmetic integer operations where the runtime would need to do extra work to check whether an overflow occurred, on every applicable arithmetic operation -- and this performance consideration outweighed the value of avoiding "silent" failures in the case that an inadvertent overflow occurs?
Are there other reasons for this language design decision as well, other than performance considerations?
I think performance is a pretty good reason. If you consider every instruction in a typical program that increments an integer, and if instead of the simple op to add 1, it had to check every time if adding 1 would overflow the type, then the cost in extra cycles would be pretty severe.
In C#, it was a question of performance. Specifically, out-of-box benchmarking.
When C# was new, Microsoft was hoping a lot of C++ developers would switch to it. They knew that many C++ folks thought of C++ as being fast, especially faster than languages that "wasted" time on automatic memory management and the like.
Both potential adopters and magazine reviewers are likely to get a copy of the new C#, install it, build a trivial app that no one would ever write in the real world, run it in a tight loop, and measure how long it took. Then they'd make a decision for their company or publish an article based on that result.
The fact that their test showed C# to be slower than natively compiled C++ is the kind of thing that would turn people off C# quickly. The fact that your C# app is going to catch overflow/underflow automatically is the kind of thing that they might miss. So, it's off by default.
I think it's obvious that 99% of the time we want /checked to be on. It's an unfortunate compromise.
Job security for K&R, Stroustrup, authors of compilers and libraries and all users of C, C++, Java, JS. Industry of static analysis, books, very long courses, consulting. I can easily miss many others.
Maclisp had arbitrary-precision arithmetic and rational numbers around 1970-1971... so why, in Lisp, is there no limitation on numbers?
My understanding of why errors would not be raised by default at runtime boils down to the legacy of desiring to create programming languages with ACID-like behavior. Specifically, the tenet that anything that you code it to do (or don't code), it will do (or not do). If you didn't code some error handler, then the machine will "assume" by virtue of no error handler, that you really want to do the ridiculous, crash-prone thing you're telling it to do.
(ACID reference: http://en.wikipedia.org/wiki/ACID)
If integer overflow is defined as immediately raising a signal, throwing an exception, or otherwise deflecting program execution, then any computations which might overflow will need to be performed in the specified sequence. Even on platforms where integer overflow checking wouldn't cost anything directly, the requirement that integer overflow be trapped at exactly the right point in a program's execution sequence would severely impede many useful optimizations.
If a language were to specify that integer overflows would instead set a latching error flag, were to limit how actions on that flag within a function could affect its value within calling code, and were to provide that the flag need not be set in circumstances where an overflow could not result in erroneous output or behavior, then compilers could generate more efficient code than any kind of manual overflow-checking programmers could use. As a simple example, if one had a function in C that would multiply two numbers and return a result, setting an error flag in case of overflow, a compiler would be required to perform the multiplication whether or not the caller would ever use the result. In a language with looser rules like I described, however, a compiler that determined that nothing ever uses the result of the multiply could infer that overflow could not affect a program's output, and skip the multiply altogether.
From a practical standpoint, most programs don't care about precisely when overflows occur, so much as they need to guarantee that they don't produce erroneous results as a consequence of overflow. Unfortunately, programming languages' integer-overflow-detection semantics have not caught up with what would be necessary to let compilers produce efficient code.
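The "latching error flag" idea above can be sketched in a few lines. Everything here (the LatchingInt name, its methods) is invented purely for illustration; no language or library defines this API. The point is that operations record that an overflow happened somewhere, without interrupting execution, and the caller inspects the flag when it actually matters:

```java
// Hypothetical sketch of a latching (sticky) overflow flag.
public class LatchingInt {
    private int value;
    private boolean overflowed; // sticky: once set, it stays set

    public LatchingInt(int value) {
        this.value = value;
    }

    public LatchingInt add(int operand) {
        long wide = (long) value + operand; // compute in 64 bits
        if (wide != (int) wide) {
            overflowed = true;              // latch, don't trap
        }
        value = (int) wide;                 // wrapped 32-bit result
        return this;
    }

    public int value()          { return value; }
    public boolean overflowed() { return overflowed; }

    public static void main(String[] args) {
        LatchingInt acc = new LatchingInt(0);
        acc.add(2000000000).add(2000000000);
        System.out.println(acc.value());      // prints -294967296
        System.out.println(acc.overflowed()); // prints true
    }
}
```

Because the flag need not be observed at any particular instruction, a compiler is free to reorder or even eliminate the underlying arithmetic, which is exactly the latitude the answer argues a trapping semantics would forbid.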
Backwards compatibility is a big one. With C, it was assumed that you were paying enough attention to the size of your datatypes that if an over/underflow occurred, that that was what you wanted. Then with C++, C# and Java, very little changed with how the "built-in" data types worked.
It is likely 99% performance. On x86, the runtime would have to check the overflow flag after every operation, which would be a huge performance hit.
The other 1% would cover those cases where people are doing fancy bit manipulations or being 'imprecise' in mixing signed and unsigned operations and want the overflow semantics.
Because checking for overflow takes time. Each primitive mathematical operation, which normally translates into a single assembly instruction, would have to include a check for overflow, resulting in multiple assembly instructions and potentially a program that is several times slower.
C/C++ never mandate trap behaviour. Even the obvious division by 0 is undefined behaviour in C++, not a specified kind of trap.
The C language doesn't have any concept of trapping, unless you count signals.
C++ has a design principle that it doesn't introduce overhead not present in C unless you ask for it. So Stroustrup would not have wanted to mandate that integers behave in a way which requires any explicit checking.
Some early compilers, and lightweight implementations for restricted hardware, don't support exceptions at all, and exceptions can often be disabled with compiler options. Mandating exceptions for language built-ins would be problematic.
Even if C++ had made integers checked, 99% of programmers in the early days would have turned it off for the performance boost...
You work under the assumption that integer overflow is always undesired behavior.
Sometimes integer overflow is the desired behavior. One example I've seen is the representation of an absolute heading value as a fixed-point number. Given an unsigned int, 0 represents 0 (equivalently 360) degrees, and the maximum 32-bit unsigned integer (0xffffffff) is the largest value just below 360 degrees.
There are probably other situations where overflow is acceptable, similar to this example.
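The heading representation described above (sometimes called binary angular measurement) can be sketched as follows. This is an illustrative reconstruction, not the original poster's code; Java lacks unsigned int, but its int wraps identically, so interpreting the bits as unsigned gives the same modulo-360 behavior:

```java
public class Heading {
    // 2^32 units span the full circle, so wraparound on addition
    // is exactly the desired "wrap past 360 degrees" behavior.
    static final double UNITS_PER_DEGREE = 4294967296.0 / 360.0;

    static int fromDegrees(double degrees) {
        // Cast through long so large values wrap instead of clamping.
        return (int) (long) (degrees * UNITS_PER_DEGREE);
    }

    static double toDegrees(int heading) {
        // Interpret the 32 bits as an unsigned value before scaling.
        return Integer.toUnsignedLong(heading) / UNITS_PER_DEGREE;
    }

    public static void main(String[] args) {
        int heading = fromDegrees(350); // current heading: 350 degrees
        int turn    = fromDegrees(20);  // turn right by 20 degrees
        int result  = heading + turn;   // integer overflow wraps past 360
        System.out.println(toDegrees(result)); // approximately 10.0
    }
}
```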