Is undefined behavior worth it?
Many bad things have happened, and continue to happen (or not, who knows, anything can happen), because of undefined behavior. I understand that it was introduced to leave compilers some wiggle room to optimize, and perhaps also to make C++ easier to port to different platforms and architectures. However, the problems caused by undefined behavior seem too large to be justified by these arguments. What other arguments are there for undefined behavior? If there are none, why does undefined behavior still exist?
Edit: To add some motivation for my question: after several bad experiences with co-workers less versed in C++, I have gotten used to making my code as safe as possible: asserting every argument, rigorous const-correctness, and so on. I try to leave as little room as possible for using my code the wrong way, because experience shows that if there are loopholes, people will use them, and then they will call me to complain that my code is bad. I consider making my code as safe as possible good practice. This is why I do not understand why undefined behavior exists. Can someone please give me an example of undefined behavior that cannot be detected at run time or compile time without considerable overhead?
I think the heart of the concern comes from the C/C++ philosophy of speed above all.
These languages were created at a time when raw computing power was scarce and you needed every optimization you could get just to have something usable.
Specifying how to deal with UB would mean detecting it in the first place and then, of course, specifying the proper handling. However, detecting it goes against the speed-first philosophy of these languages!
Today, do we still need fast programs? Yes: those of us working with very limited resources (embedded systems) or under very harsh constraints (on response time or transactions per second) do need to squeeze out as much as we can.
I know the motto "throw more hardware at the problem". We have an application where I work:
It runs on about 40 monsters: 8 dual-core Opterons (2800MHz) with 32GB of RAM. It gets difficult to be "faster" with more hardware at this point, so we need optimized code, and a language that allows it (we did refrain from throwing assembly code in there).
I must say that I don't care much about UB anyway. If your program gets to the point of invoking UB, then it needs fixing, whatever behavior actually occurred. Of course it would be easier to fix if it were reported straight away: that's what debug builds are for.
So perhaps instead of focusing on UB we should learn to use the language:
And everything is suddenly better :)
My take on undefined behavior is this:
The standard defines how the language is to be used, and how the implementation is supposed to react when used in the correct manner. However, it would be a lot of work to cover every possible use of every feature, so the standard just leaves it at that.
However, in a compiler implementation you can't just "leave it at that": the code has to be turned into machine instructions, and you can't leave blank spots. In many cases the compiler can report an error, but that's not always feasible. There are some instances where it would take extra work to check whether the programmer is doing the wrong thing. For instance, to detect a destructor being called twice, the compiler would have to count how many times certain functions have been called, or add extra state, or something similar. So if the standard doesn't define it, and the compiler just lets it happen, strange things can sometimes happen if you're unlucky.
The problems are not caused by undefined behaviour, they are caused by writing the code that leads to it. The answer is simple - don't write that kind of code - not doing so is not exactly rocket science.
As for:
A real world issue:
Detecting this at compile time is impossible. At run time it is merely extremely difficult, and would require the memory allocation system to do far more book-keeping (i.e. be slower and take up more memory) than is the case if we simply say the second delete is undefined. If you don't like this, perhaps C++ is not the language for you. Why not switch to Java?
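To give a rough idea of the book-keeping mentioned above, here is a minimal sketch of an allocator wrapper that can report a second release of the same address. `CheckedHeap` and its member names are hypothetical, invented for this illustration; no real allocator is claimed to work this way. Note the limitation: once an address is reused by a later allocation, a stale alias would look live again, so even this extra cost does not give full detection.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <unordered_set>

// Hypothetical checked heap: remembers every live allocation so that a
// second release of the same address is reported instead of being UB.
class CheckedHeap {
public:
    void* allocate(std::size_t bytes) {
        void* p = ::operator new(bytes);
        live_.insert(p);          // extra memory and time on every allocation
        return p;
    }

    // Returns true if p was live and has now been freed, false if p is
    // not a live allocation (for example, a second release via an alias).
    // Caveat: merely inspecting an already-freed pointer value is
    // implementation-defined; mainstream platforms treat it as an address.
    bool release(void* p) {
        if (live_.erase(p) == 0) {
            return false;         // detected: nothing is freed twice
        }
        ::operator delete(p);
        return true;
    }

    std::size_t live_count() const { return live_.size(); }

private:
    std::unordered_set<void*> live_;   // the additional book-keeping
};
```

Every allocation and every release now pays for a hash-set operation, which is the kind of overhead the answer argues the language should not impose on correct programs.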
The main source of undefined behaviour is pointers, and that's why C and C++ have a lot of undefined behaviour.
Consider this code:
This code looks very bad, but should it issue an error? What if that address is indeed readable i.e. it's a value I obtained somehow (maybe a device address, etc.)?
In cases like this, there's no way to know whether the operation is legal or not, and if it isn't, its behaviour is indeed unpredictable.
Apart from this: in general C++ was designed with "The zero overhead rule" in mind (see The Design and Evolution of C++), so it couldn't possibly impose any burden on implementations to check for corner cases etc. You should always keep in mind that this language was designed and is indeed used not only on the desktop but also in embedded systems with limited resources.
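As a small illustration of why the compiler cannot decide this on its own, the sketch below reads an `int` from a raw numeric address. The function names `read_int_at` and `read_back` are hypothetical. The same dereference is valid when the number came from a real object and undefined when it did not, and nothing in the function itself distinguishes the two cases.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: reads an int from a raw numeric address.
// The compiler sees only a number; whether an int really lives there
// is known only to the caller.
int read_int_at(std::uintptr_t address) {
    const int* p = reinterpret_cast<const int*>(address);
    return *p;   // fine if address came from a real int, UB otherwise
}

// Valid use: the address was obtained from an existing object, and a
// pointer converted to std::uintptr_t and back keeps its original value.
int read_back(const int& object) {
    return read_int_at(reinterpret_cast<std::uintptr_t>(&object));
}
```

Calling `read_int_at` with an arbitrary constant would compile just as happily, which is the point the answer makes about device addresses.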
Many things that are defined as undefined behavior would be hard if not impossible to diagnose by the compiler or runtime environment.
The ones that are easy have already turned into defined-undefined behavior. Consider calling a pure virtual method: it is undefined behavior, but most compilers and runtime environments will report an error in the same terms: pure virtual method called. The de facto standard is that calling a pure virtual method is a runtime error in all environments I know of.
The standard leaves "certain" behaviour undefined in order to allow a variety of implementations, without burdening those implementations with the overhead of detecting "certain" situations, or burdening the programmer with constraints required to prevent those situations arising in the first place.
There was a time when avoiding this overhead was a major advantage of C and C++ for a huge range of projects.
Computers are now several thousand times faster than they were when C was invented, and the overheads of things like checking array bounds all the time, or having a few megabytes of code to implement a sandboxed runtime, don't seem like a big deal for most projects. Furthermore, the cost of (e.g.) overrunning a buffer has increased by several factors, now that our programs handle many megabytes of potentially-malicious data per second.
It is therefore somewhat frustrating that there is no language which has all of C++'s useful features, and which in addition has the property that the behaviour of every program which compiles is defined (subject to implementation-specific behaviour). But only somewhat - it's not actually all that difficult in Java to write code whose behaviour is so confusing that from the POV of debugging, if not security, it might as well be undefined. It's also not at all difficult to write insecure Java code - it's just that the insecurity usually is limited to leaking sensitive information or granting incorrect privileges over the app, rather than giving up complete control of the OS process the JVM is running in.
So the way I see it, good software engineering requires discipline in all languages; the difference is what happens when our discipline fails, and how much we're charged by other languages (in performance, footprint, and C++ features you like) for insurance against that. If the insurance provided by some other language is worth it for your project, take it. If the features provided by C++ are worth paying for with the risk of undefined behaviour, take C++. I don't think there's much mileage in trying to argue, as if it were a global property that's the same for everyone, whether the benefits of C++ "justify" the costs. They're justified within the terms of reference for the design of the C++ language, which are that you don't pay for what you don't use. Hence, correct programs should not be made slower in order that incorrect programs get a useful error message instead of UB, and most of the time behaviour in unusual cases (e.g. `<< 32` of a 32-bit value) should not be defined (e.g. to result in 0) if that would require the unusual case to be checked for explicitly on hardware which the committee wants to support C++ "efficiently".
Look at another example: I don't think the performance benefits of Intel's professional C and C++ compiler justify the cost of buying it. Hence, I haven't bought it. That doesn't mean others will make the same calculation I made, or that I will always make the same calculation in future.
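To make the `<< 32` case concrete, here is a minimal sketch of what a "fully defined" shift might look like if the language required a result of 0 for oversized shift counts. The name `shl_defined` is hypothetical. The explicit comparison is the cost that every shift would carry on hardware whose native instruction behaves differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical "fully defined" left shift for 32-bit values:
// a count of 32 or more yields 0 instead of undefined behaviour.
std::uint32_t shl_defined(std::uint32_t value, unsigned count) {
    if (count >= 32) {   // the explicit check a defined result would need
        return 0;
    }
    return static_cast<std::uint32_t>(value << count);
}
```

A program that never shifts by 32 or more gains nothing from the branch, which is the "you don't pay for what you don't use" argument in miniature.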
Compilers and programming languages are among my favorite topics. In the past I did some compiler-related research, and I ran into undefined behavior many, many times.
C++ and Java are very popular. That does not mean they have a great design. They are widely used because they took risks, to the detriment of their design quality, just to gain acceptance. Java went for garbage collection, a virtual machine, and a pointer-free appearance. They were partly pioneers and could not learn from many previous projects.
In the case of C++, one of the main goals was to give object-oriented programming to C users. Even C programs were supposed to compile with a C++ compiler. That left a lot of nasty open points, and C already had many ambiguities. The emphasis of C++ was power and popularity, not integrity. Not many languages give you multiple inheritance; C++ does, although not in a very polished way. Undefined behavior will always be there to support its glory and backwards compatibility.
If you really want a robust and well-defined language, you must look somewhere else. Sadly, that is not the main concern of most people. Ada, for example, is a great language in which clear and defined behavior is important, but hardly anyone cares about it because of its narrow user base. I am biased here because I really like that language. I posted something on my blog, but if you want to learn more about how a language definition can help you have fewer bugs even before you compile, have a look at these slides.
I am not saying C++ is a bad language! It just has different goals, and I love working with it. You also have a large community, great tools, and much more great stuff such as the STL, Boost and Qt. But your doubt is also the root of becoming a great C++ programmer. If you want to be great with C++, this should be one of your concerns. I would encourage you to read the slides mentioned above and also this critique. It will help you a lot to understand those times when the language is not doing what you expect.
And by the way, undefined behavior goes totally against portability. In Ada, for example, you have control over the layout of data structures (in C and C++ it can change according to machine and compiler), and threads are part of the language. So porting C and C++ software will give you more pain than pleasure.
It's important to be clear on the differences between undefined behavior and implementation-defined behavior. Implementation defined behavior gives compiler writers the opportunities to add extensions to the language in order to leverage their platform. Such extensions are necessary in order to write code that works in the real world.
UB on the other hand exists in cases where it is difficult or impossible to engineer a solution without imposing major changes in the language or big differences from C. One example taken from a page where BS talks about this is:
The range error is UB. It is an error, but how precisely the platform should deal with this is undefined by the Standard because the Standard can't define it. Each platform is different. It can't be engineered to an error because this would necessitate including automatic range checking in the language, which would require a major change to the language's feature set. The `p[100] = 0` error is even more difficult for the language to generate a diagnostic for, either at compile time or at run time, because the compiler can't know what `p` really points to without run-time support.
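For contrast, the standard library does offer opt-in range checking on its containers: `std::vector::at` compares the index with `size()` and throws `std::out_of_range`. The sketch below wraps that in a hypothetical helper, `try_store`, to show the run-time support that a raw array or a bare pointer does not carry.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: writes through the checked accessor and reports
// whether the index was in range, instead of invoking UB.
bool try_store(std::vector<int>& values, std::size_t index, int value) {
    try {
        values.at(index) = value;   // at() compares index with size() first
        return true;
    } catch (const std::out_of_range&) {
        return false;               // out-of-range write was rejected
    }
}
```

The check is possible here only because the vector stores its own size; a plain `int*` has no such information attached, which is why the pointer case is the harder one.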
I asked myself that same question a few years ago. I stopped considering it right away, when I tried to provide a proper definition for the behavior of a function that writes to a null pointer.
Not all devices have a concept of protected memory, so you can't possibly rely on the system to protect you via a segfault or similar. Not all devices have read-only memory, so you can't possibly say that the write simply does nothing. The only other option I could think of is to require that the application raise an exception (or abort, or something) without help from the system. But in that case, the compiler has to insert code before every single memory write to check for null, unless it can guarantee that the pointer has not changed since the last memory write. That is clearly unacceptable.
So, leaving the behavior undefined was the only logical decision I could come to, without saying "Compliant C++ compilers can only be implemented on platforms with protected memory."
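As a rough sketch of the alternative rejected above, this is what a single store might look like if every write through a pointer had to be checked for null without help from the system. The function names are hypothetical, and the choice of `std::invalid_argument` is only an assumption for the illustration.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical checked write: what a compiler would have to emit before
// every store if writing through a null pointer had to raise an error.
void checked_store(int* target, int value) {
    if (target == nullptr) {   // extra compare and branch on each write
        throw std::invalid_argument("store through null pointer");
    }
    *target = value;
}

// Convenience wrapper for demonstration: true on success, false on null.
bool try_checked_store(int* target, int value) {
    try {
        checked_store(target, value);
        return true;
    } catch (const std::invalid_argument&) {
        return false;
    }
}
```

Multiplying that compare-and-branch by every memory write in a program gives a sense of why the answer calls the requirement unacceptable.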
Here's my favourite: after you've done `delete` on a non-null pointer, using it (not only dereferencing, but also casting, etc.) is UB (see this question).
How you can run into UB:
Now, on all architectures I know, the code above will run fine. Teaching the compiler or runtime to perform analysis of such situations is very hard and expensive. Don't forget that sometimes there might be millions of lines of code between `delete` and the use of the pointer. Setting pointers to null immediately after `delete` can be costly, so it's not a universal solution either.
That's why there's the concept of UB. You don't want UB in your code. Maybe it works, maybe not. It works on this implementation and breaks on another.
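For reference, the "set it to null after `delete`" idea could be sketched as follows. The helper name `destroy_and_null` is hypothetical. It costs one extra store per deletion, and it only protects the one pointer variable passed in; any other copy of the pointer elsewhere in the program still dangles.

```cpp
#include <cassert>

// Hypothetical helper: deletes the object and nulls the caller's pointer.
// This protects only this one pointer variable; any alias still dangles.
void destroy_and_null(int*& pointer) {
    delete pointer;      // deleting a null pointer is a no-op
    pointer = nullptr;   // extra store on every delete
}
```

Calling the helper twice on the same variable is harmless, since the second call deletes a null pointer, but nothing here helps with a second variable that held the same address.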
There are times when undefined behavior is good. Take a big int for example.
The spec says that if we last wrote to `Whole`, then reading from `Parts` is undefined.
Now, that's just a tad silly to me because if we couldn't touch any other parts of the union then there is no point in having the union in the first place, right?
But anyway, maybe some functions will take __int64 while other functions take the two separated ints. Rather than convert every time we can just use this union. Every compiler I know treats this undefined behavior in a pretty clear way. So in my opinion undefined behavior isn't so bad here.
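If relying on the union feels too risky, the same conversion can be written with shifts, which is fully defined and independent of byte order. This is only a sketch: `Halves`, `split` and `join` are hypothetical names, and fixed-width unsigned types stand in for `__int64` and the two ints.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the two halves of the 64-bit value.
struct Halves {
    std::uint32_t upper;
    std::uint32_t lower;
};

// Splits a 64-bit value into its upper and lower 32 bits.
Halves split(std::uint64_t whole) {
    return Halves{static_cast<std::uint32_t>(whole >> 32),
                  static_cast<std::uint32_t>(whole & 0xFFFFFFFFu)};
}

// Reassembles the 64-bit value from the two halves.
std::uint64_t join(Halves parts) {
    return (static_cast<std::uint64_t>(parts.upper) << 32) | parts.lower;
}
```

Unlike the union, this also removes the question of which half comes first in memory, at the price of an explicit conversion at each call site.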