Why is bounds checking not implemented in some languages?
According to Wikipedia (http://en.wikipedia.org/wiki/Buffer_overflow):
Programming languages commonly associated with buffer overflows include C and C++, which provide no built-in protection against accessing or overwriting data in any part of memory and do not automatically check that data written to an array (the built-in buffer type) is within the boundaries of that array. Bounds checking can prevent buffer overflows.
So, why is bounds checking not implemented in languages like C and C++?
Comments (5)
Basically, it's because it means every time you change an index, you have to do an if statement.
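Let's consider a simple C for loop that steps an index ix through an array ary.
If we have bounds checking, the generated code for ary[ix] has to compare ix against the size of the array and jump to an error handler if it is out of range, and only then load the value.
If we don't have that bounds check, the generated code is just the load.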
This saves 3-4 instructions in the loop, which (especially in the old days) meant a lot.
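As a rough C-level sketch of the difference (the answer is really about the generated machine code; the helper names checked_get and unchecked_get and the -1 error value are assumptions made up for illustration), the checked access puts a compare and a branch in front of every load:

```c
#include <stddef.h>

/* Hypothetical helper: roughly what a compiler-inserted bounds check amounts to.
   Returns 0 and writes the element to *out, or -1 if ix is out of range. */
int checked_get(const int *ary, size_t len, size_t ix, int *out)
{
    if (ix >= len) {      /* the extra compare and branch on every access */
        return -1;        /* assumed error convention for this sketch */
    }
    *out = ary[ix];
    return 0;
}

/* What C actually does: just the load, no questions asked. */
int unchecked_get(const int *ary, size_t ix)
{
    return ary[ix];
}
```

Calling checked_get inside a loop pays for the comparison on every iteration, while unchecked_get leaves it to the caller to keep ix in range.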
In fact, on the PDP-11 machines it was even better, because there was something called "auto-increment addressing". On a PDP, all of the register stuff etc. turned into something like a single operand that fetched through a register and bumped the register in the same instruction.
(And anyone who happens to remember the PDP better than I do, don't give me trouble about the precise syntax etc; you're an old fart like me, you know how these things slip away.)
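In C terms, auto-increment addressing is what makes the pointer-walking idiom cheap: the expression *p++ reads through the pointer and then advances it, which lines up with a single auto-increment operand on that kind of machine. A minimal sketch, where sum_ints is a hypothetical function and nothing is checked beyond the caller-supplied count:

```c
#include <stddef.h>

/* Hypothetical example: sum n ints by walking a pointer instead of indexing. */
long sum_ints(const int *p, size_t n)
{
    long total = 0;
    while (n-- > 0) {
        total += *p++;   /* load the element, then step to the next one */
    }
    return total;
}
```

A bounds check on every element would break up exactly this tight load-and-advance pattern.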
It is easier to implement and faster both to compile and at run-time. It also simplifies the language definition (as quite a few things can be left out if this is skipped).
Currently, when you write to a location through a pointer or an array index, C (and C++) just says, "Okey dokey! I'll put something in that spot in memory."
If bounds checking were required, C would have to say, "Ok, first let's see if I can put something there? Has it been allocated? Yes? Good. I'll insert now." By skipping the test to see whether there is something which can be written there, you are saving a very costly step. On the other hand, (she wore a glove), we now live in an era where "optimization is for those who cannot afford RAM," so the arguments about the speed are getting much weaker.
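To answer "has it been allocated?" at run time, the language would have to carry a length alongside every pointer. A minimal sketch of that idea, assuming a hypothetical bounded_buf struct and bounded_store function (nothing like this is built into C):

```c
#include <stddef.h>

/* Hypothetical "fat pointer": the base address plus how many ints it covers. */
struct bounded_buf {
    int   *base;
    size_t len;
};

/* Store value at index ix only if it lies inside the buffer.
   Returns 0 on success, -1 if the store was refused. */
int bounded_store(struct bounded_buf buf, size_t ix, int value)
{
    if (buf.base == NULL || ix >= buf.len) {
        return -1;            /* the test that plain C skips */
    }
    buf.base[ix] = value;
    return 0;
}
```

The extra length field and the comparison are the cost being discussed: every pointer gets bigger and every store gets a branch.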
It's all about the performance. However, the assertion that C and C++ have no bounds checking is not entirely correct. It is quite common to have "debug" and "optimized" versions of each library, and it is not uncommon to find bounds-checking enabled in the debugging versions of various libraries.
This has the advantage of quickly and painlessly finding out-of-bounds errors when developing the application, while at the same time eliminating the performance hit when running the program for realz.
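One common way to get this debug/optimized split in C is the standard assert macro, which is compiled out when NDEBUG is defined. A minimal sketch, assuming a hypothetical int_vec container and int_vec_at accessor:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container that remembers its own length. */
struct int_vec {
    int   *data;
    size_t len;
};

/* Debug builds abort on an out-of-range index; defining NDEBUG
   removes the assert, leaving only the raw load. */
int int_vec_at(const struct int_vec *v, size_t ix)
{
    assert(ix < v->len);
    return v->data[ix];
}
```

Built normally, an out-of-range index stops the program at the faulty call; built with NDEBUG defined, the same source carries no check at all.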
I should also add that the performance hit is non-negligible, and many languages other than C++ will provide various high-level functions operating on buffers that are implemented directly in C and C++ specifically to avoid the bounds checking. For example, in Java, if you compare the speed of copying one array into another using pure Java vs. using System.arraycopy (which does bounds checking once, but then straight-up copies the array without bounds-checking each individual element), you will see a decently large difference in the performance of those two operations.
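The same check-once-then-copy shape can be sketched in C. Here copy_ints is a hypothetical helper, not a library function; it validates both ranges a single time and then hands the work to memcpy:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy n ints from src[src_pos..] to dst[dst_pos..].
   Bounds are checked once up front, not per element.
   src and dst must not overlap (a requirement of memcpy).
   Returns 0 on success, -1 if either range would run past its array. */
int copy_ints(const int *src, size_t src_len, size_t src_pos,
              int *dst, size_t dst_len, size_t dst_pos, size_t n)
{
    if (src_pos > src_len || n > src_len - src_pos) return -1;
    if (dst_pos > dst_len || n > dst_len - dst_pos) return -1;
    memcpy(dst + dst_pos, src + src_pos, n * sizeof *src);
    return 0;
}
```

The two range tests cost the same whether n is 3 or 3 million, which is why a bulk copy can be both checked and fast.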
The primary reason is the performance overhead of adding bounds checking to C or C++. While this overhead can be reduced substantially with state-of-the-art techniques (to 20-100% overhead, depending upon the application), it is still large enough to make many folks hesitate. I'm not sure whether that reaction is rational -- I sometimes suspect that people focus too much on performance, simply because performance is quantifiable and measurable -- but regardless, it is a fact of life. This fact reduces the incentive for major compilers to put effort into integrating the latest work on bounds checking into their compilers.
A secondary reason involves concerns that bounds checking might break your app. Particularly if you do funky stuff with pointer arithmetic and casting that violates the standard, bounds checking might block something your application is currently doing. Large applications sometimes do amazingly crufty and ugly things. If the compiler breaks the application, then there's no point in blaming the crufty code for the problem; people aren't going to keep using a compiler that breaks their application.
Another major reason is that bounds checking competes with ASLR + DEP. ASLR + DEP are perceived as solving, oh, 80% of the problem or so. That reduces the perceived need for full-fledged bounds checking.
Because it would cripple those general purpose languages for HPC requirements. There are plenty of applications where buffer overflows really do not matter one iota, simply because they do not happen. Such features are much better off in a library (where in fact you can already find examples for C/C++).
For domain-specific languages it may make sense to bake such features into the language definition and trade the resulting performance hit for increased security.