Is there a C++ implementation that detects all undefined behavior?

Posted on 2024-12-02 07:10:15

A huge number of operations in C++ result in undefined behavior, where the spec is completely mute about what the program's behavior ought to be and allows for anything to happen. Because of this, there are all sorts of cases where people have code that works in debug but not release mode, or that works until a seemingly unrelated change is made, or that works on one machine but not another, etc.

My question is whether there is a utility that looks at the execution of C++ code and flags all instances where the program invokes undefined behavior. While it's nice that we have tools like valgrind and checked STL implementations, these aren't as strong as what I'm thinking about - valgrind can have false negatives if you trash memory that you still have allocated, for example, and checked STL implementations won't catch deleting through a base class pointer.

Does this tool exist? Or would it even be useful to have it lying around at all?

EDIT: I am aware that in general it is undecidable to statically check whether a C++ program may ever execute something that has undefined behavior. However, it is possible to determine whether a specific execution of a C++ program produced undefined behavior. One way to do this would be to make a C++ interpreter that steps through the code according to the definitions set out in the spec, at each point determining whether or not the code has undefined behavior. This won't detect undefined behavior that doesn't occur on a particular program execution, but it will find any undefined behavior that actually manifests itself in the program. This is related to how it is Turing-recognizable to determine if a TM accepts some input, even if it's still undecidable in general.
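To make the "check at each step" idea concrete, here is a minimal sketch of what a single checked step could look like for signed addition. The function name `checked_add` and its interface are hypothetical, chosen only for illustration; a real interpreter would need a check like this for every operation the spec leaves undefined.

```cpp
#include <limits>

// Hypothetical "checked step": decides whether a + b would overflow
// before performing the addition, the way a checking interpreter might.
// Returns false (and leaves `out` untouched) if the addition would be UB.
bool checked_add(int a, int b, int& out) {
    const int max = std::numeric_limits<int>::max();
    const int min = std::numeric_limits<int>::min();
    if (b > 0 && a > max - b) return false;  // would overflow upward
    if (b < 0 && a < min - b) return false;  // would overflow downward
    out = a + b;                             // safe: result fits in int
    return true;
}
```

The check itself is written so that it never overflows, which is the awkward part of doing this by hand: the detector must not invoke the behavior it is trying to detect.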

Thanks!

Comments (10)

无声无音无过去 2024-12-09 07:10:15

John Regehr, in "Finding Undefined Behavior Bugs by Finding Dead Code", points out a tool called STACK. I quote from the site (emphasis mine):

Optimization-unstable code (unstable code for short) is an emerging class of software bugs: code that is unexpectedly eliminated by compiler optimizations due to undefined behavior in the program. Unstable code is present in many systems, including the Linux kernel and the Postgres database server. The consequences of unstable code range from incorrect functionality to missing security checks.

STACK is a static checker that detects unstable code in C/C++ programs. Applying STACK to widely used systems has uncovered 160 new bugs that have been confirmed and fixed by developers.

Also, since C++11, undefined behavior that occurs while evaluating a constant expression (constexpr variables and functions) should be caught at compile time.
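As a minimal sketch of the constexpr point (the function name `add_one` is hypothetical, not something from the cited material): an expression whose evaluation overflows a signed int is not a constant expression, so the compiler has to reject it where a constant expression is required.

```cpp
#include <climits>

// Hypothetical helper for illustration; x + 1 overflows when x == INT_MAX.
constexpr int add_one(int x) { return x + 1; }

// Fine: evaluated at compile time, no undefined behavior involved.
constexpr int ok = add_one(41);
static_assert(ok == 42, "evaluated at compile time");

// Uncommenting the next line should make the build fail, because
// INT_MAX + 1 overflows and therefore is not a constant expression:
// constexpr int bad = add_one(INT_MAX);
```

This only covers code that is actually evaluated as a constant expression; the same `add_one(INT_MAX)` call made at run time is not diagnosed by this mechanism.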

We also have gcc ubsan:

GCC recently (version 4.9) gained Undefined Behavior Sanitizer
(ubsan), a run-time checker for the C and C++ languages. In order to
check your program with ubsan, compile and link the program with
-fsanitize=undefined option. Such instrumented binaries have to be executed; if ubsan detects any problem, it outputs a “runtime error:”
message, and in most cases continues executing the program.

There is also the Clang Static Analyzer, which includes many checks for undefined behavior, as well as Clang's -fsanitize checks, which include -fsanitize=undefined:

-fsanitize=undefined: Fast and compatible undefined behavior checker. Enables the undefined behavior checks that have small runtime cost and
no impact on address space layout or ABI. This includes all of the
checks listed below other than unsigned-integer-overflow.
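A small sketch of the kind of code these run-time checkers are aimed at. The function `add` is a hypothetical example, not taken from the GCC or Clang documentation; it is well defined for most inputs and undefined only when the sum does not fit in an int.

```cpp
#include <climits>

// Hypothetical function for illustration. The result is well defined only
// when x + y fits in an int; otherwise the signed addition overflows,
// which is undefined behavior.
int add(int x, int y) { return x + y; }

// A caller that stays inside the defined range.
int small_sum() { return add(2, 3); }
```

Assuming the program is compiled and linked with -fsanitize=undefined, a call such as `add(INT_MAX, 1)` should produce a "runtime error:" report when that call is actually executed. Inputs that never occur in a test run are never checked, which is the limitation the question already acknowledges.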

For C, we can look at Regehr's article "It’s Time to Get Serious About Exploiting Undefined Behavior", which says:

[..]I confess to not personally having the gumption necessary for cramming GCC or LLVM through the best available dynamic undefined behavior checkers: KCC and Frama-C.[...]

Here is a link to kcc and I quote:

[...]If you try to run a program that is undefined (or one for which we are missing semantics), the program will get stuck. The message should tell you where it got stuck and may give a hint as to why. If you want help deciphering the output, or help understanding why the program is undefined, please send your .kdump file to us.[...]

See also Frama-C, an article describing the first use of Frama-C as a C interpreter, and an addendum to that article.

人│生佛魔见 2024-12-09 07:10:15

This is a great question, but let me give an idea for why I think it might be impossible (or at least very hard) in general.

Presumably, such an implementation would almost be a C++ interpreter, or at least a compiler for something more like Lisp or Java. It would need to keep extra data for each pointer to ensure you did not perform arithmetic outside of an array or dereference something that was already freed or whatever.

Now, consider the following code:

int *p = new int;
delete p;
int *q = new int;

if (p == q)
    *p = 17;

Is the *p = 17 undefined behavior? On the one hand, it dereferences p after it has been freed. On the other hand, dereferencing q is fine and p == q...

But that is not really the point. The point is that whether the if evaluates to true at all depends on the details of the heap implementation, which can vary from implementation to implementation. So replace *p = 17 by some actual undefined behavior, and you have a program that might very well blow up on a normal compiler but run fine on your hypothetical "UB detector". (A typical C++ implementation will use a LIFO free list, so the pointers have a good chance of being equal. A hypothetical "UB detector" might work more like a garbage collected language in order to detect use-after-free problems.)

Put another way, the existence of merely implementation-defined behavior makes it impossible to write a "UB detector" that works for all programs, I suspect.

That said, a project to create an "uber-strict C++ compiler" would be very interesting. Let me know if you want to start one. :-)

岁月静好 2024-12-09 07:10:15

Clang has a suite of sanitizers that catch various forms of undefined behavior. Their eventual goal is to be able to catch all C++ core language undefined behavior, but checks for a few tricky forms of undefined behavior are missing right now.

For a decent set of sanitizers, try:

clang++ -fsanitize=undefined,address

-fsanitize=address checks for use of bad pointers (not pointing to valid memory), and -fsanitize=undefined enables a set of lightweight UB checks (integer overflow, bad shifts, misaligned pointers, ...).

-fsanitize=memory (for detecting uninitialized memory reads) and -fsanitize=thread (for detecting data races) are also useful, but neither of these can be combined with -fsanitize=address nor with each other because all three have an invasive impact on the program's address space.
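As a rough illustration of what -fsanitize=address is meant to catch, here is a sketch with a hypothetical helper, `sum_first`, that reads through a raw pointer. It is correct as long as the caller respects the precondition in the comment.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: sums the first n elements through a raw pointer.
// Well defined only if n <= v.size(); a larger n reads past the end of
// the vector's heap buffer.
int sum_first(const std::vector<int>& v, std::size_t n) {
    const int* p = v.data();
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += p[i];
    }
    return total;
}
```

Without a sanitizer, a call with n larger than v.size() may appear to work and return garbage. In a build with -fsanitize=undefined,address, the out-of-bounds read should instead be reported at the point where it happens.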

魔法唧唧 2024-12-09 07:10:15

Using g++

-Wall -Werror -pedantic-errors

(preferably with an appropriate -std argument as well) will pick up quite a few cases of UB.

Things that these options get you include:

-pedantic
Issue all the warnings demanded by strict ISO C and ISO C++; reject
all programs that use forbidden extensions, and some other programs
that do not follow ISO C and ISO C++. For ISO C, follows the
version of the ISO C standard specified by any -std option used.

-Winit-self (C, C++, Objective-C and Objective-C++ only)
Warn about uninitialized variables which are initialized with
themselves. Note this option can only be used with the
-Wuninitialized option, which in turn only works with -O1 and
above.

-Wuninitialized
Warn if an automatic variable is used without first being
initialized or if a variable may be clobbered by a "setjmp" call.

and various disallowed things you can do with specifiers to printf and scanf family functions.
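For the printf-family case, a minimal sketch (the helper name `format_size` is hypothetical): the format specifier has to match the argument type, and a mismatch is undefined behavior that -Wall (through -Wformat) can flag at compile time.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: formats a size with the matching %zu specifier.
std::string format_size(std::size_t n) {
    char buf[32];  // large enough for any 64-bit value plus the terminator
    std::snprintf(buf, sizeof buf, "%zu", n);
    return std::string(buf);

    // A mismatched specifier is the kind of line the warning targets:
    // std::snprintf(buf, sizeof buf, "%d", n);  // %d expects int, n is size_t
}
```

The warning only works when the format string is visible to the compiler as a literal; format strings built at run time are outside its reach.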

毁虫ゝ 2024-12-09 07:10:15

You might want to read about SAFECode.

This is a research project from the University of Illinois, the goal is stated on the front page (linked above):

The purpose of the SAFECode project is to enable program safety without garbage collection and with minimal run-time checks using static analysis when possible and run-time checks when necessary. SAFECode defines a code representation with minimal semantic restrictions designed to enable static enforcement of safety, using aggressive compiler techniques developed in this project.

What is really interesting to me is the elimination of the runtime checks whenever the program can be proved to be correct statically, for example:

int array[N];
for (i = 0; i != N; ++i) { array[i] = 0; }

This should not incur any more overhead than the regular version.

In a lighter fashion, Clang has some guarantees about undefined behavior too as far as I recall, but I cannot get my hands on it...

乖不如嘢 2024-12-09 07:10:15

The clang compiler can detect some undefined behaviors and warn against them. Probably not as complete as you want, but it's definitely a good start.

菩提树下叶撕阳。 2024-12-09 07:10:15

Unfortunately I'm not aware of any such tool. Typically UB is defined as such precisely because it would be hard or impossible for a compiler to diagnose it in all cases.

In fact your best tool is probably compiler warnings: They often warn about UB type items (for example, non-virtual destructor in base classes, abusing the strict-aliasing rules, etc).
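The non-virtual destructor case is the same one the question raises (deleting through a base class pointer). A minimal sketch of the well-defined version, with hypothetical types `Base` and `Derived`:

```cpp
#include <memory>

// Hypothetical types for illustration.
struct Base {
    virtual ~Base() = default;  // virtual, so deleting via Base* is well defined
};

struct Derived : Base {
    explicit Derived(int& counter) : counter_(counter) {}
    ~Derived() override { ++counter_; }  // records that the destructor ran
    int& counter_;
};

// Destroys a Derived through a pointer to Base and reports how many
// Derived destructors ran.
int destroy_through_base() {
    int destroyed = 0;
    {
        std::unique_ptr<Base> p(new Derived(destroyed));
    }  // p goes out of scope here, which deletes through a Base*
    return destroyed;
}
```

If `~Base()` were not virtual, the same deletion would be undefined behavior. As far as I know, GCC and Clang can warn about that pattern (for example via -Wdelete-non-virtual-dtor), but treat the exact flag as something to confirm against your compiler's documentation.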

Code review can also help catch cases where UB is relied upon.

Then you have to rely on valgrind to capture the remaining cases.

情仇皆在手 2024-12-09 07:10:15

Just as a side observation, according to the theory of computability, you cannot have a program that detects all possible undefined behaviours.

You can only have tools that use heuristics and detect some particular cases that follow certain patterns. Or you can in certain cases prove that a program behaves as you want. But you cannot detect undefined behaviour in general.

Edit

If a program does not terminate (hangs, loops forever) on a given input, then its output is undefined.

If you agree on this definition, then determining whether a program terminates is the well-known "Halting Problem", which has been proven to be undecidable, i.e. there exists no program (Turing Machine, C program, C++ program, Pascal program, in whatever language) that can solve this problem in general.

Simply put: there exists no program P that can take as input any program Q and input data I and print as output TRUE if Q(I) terminates, or else print FALSE if Q(I) does not terminate.

For more information you can look at http://en.wikipedia.org/wiki/Halting_problem.

扭转时空 2024-12-09 07:10:15

Undefined behaviour is undefined. The best you can do is conform to the standard pedantically, as others have suggested. However, you cannot test for what is undefined, because you don't know what it is. If you knew what it was and the standard specified it, it would not be undefined.

However, if for some reason you do rely on something the standard says is undefined, and it produces a particular result on your implementation, then you may choose to treat it as defined and write some unit tests to confirm that it behaves that way for your particular build. It is much better, however, to simply avoid undefined behaviour whenever possible.

孤独陪着我 2024-12-09 07:10:15

Take a look at PCLint; it's pretty decent at detecting a lot of bad things in C++.

Here's a subset of what it catches
