What levels should static analyzers analyze?

Posted on 2024-10-20 03:14:23

I've noticed that some static analyzers operate on source code, while others operate on bytecode (e.g., FindBugs). I'm sure there are even some that work on object code.

My question is a simple one: what are the advantages and disadvantages of writing static analyzers that work at each of these levels?

Under "static analyzers" I'm including linters, bug finders, and even full-blown verifiers. And by levels of analysis I would include source code, high-level IRs, low-level IRs, bytecode, object code, and compiler plugins that have access to all phases.

Comments (2)

半步萧音过轻尘 2024-10-27 03:14:23

Several considerations influence the level at which an analyzer is designed to work:

  1. Designing a static analyzer is a lot of work. It would be a shame not to share that work across the several languages that compile to the same bytecode, especially when the bytecode retains most of the structure of the source program: Java (FindBugs), .NET (various tools related to Code Contracts). In some cases, a common target language was invented purely for the purpose of analysis, even though the actual compilation scheme does not go through it.

  2. Related to 1, you may hope that your static analyzer will be a little less costly to write if it works on a normalized version of the program with a minimal number of constructs. When authoring a static analyzer, having to write the treatment for "repeat until" when you have already written the one for "while do" is a bother. You may structure your analyzer so that several functions are shared between these two cases, but the care-free way to handle this is to translate one into the other, or to translate the source to an intermediate language that only has one of them.

  3. On the other hand, as already pointed out in Flash Sheridan's answer, source code contains the most information. For instance, in languages with fuzzy semantics, bugs present at the source level may be removed by compilation. C and C++ have numerous "undefined behaviors" where the compiler is allowed to do anything, including generating a program that works by accident. Fine, you might think, if the bug is not in the executable it's not a problematic bug. But when you recompile the program for another architecture or with the next version of the compiler, the bug may appear again. This is one reason for not doing the analysis after any phase that might remove bugs.

  4. Some properties can only be checked with reasonable precision on compiled code. That includes the absence of compiler-introduced bugs, as pointed out again by Flash Sheridan, but also worst-case execution time. Similarly, many languages do not let you know precisely what floating-point code does unless you look at the assembly generated by the compiler (this is because existing hardware does not make it convenient for language definitions to guarantee more). The choice is then either to write an imprecise source-level analyzer that takes all possibilities into account, or to analyze precisely one particular compilation of a floating-point program, as long as it is understood that it is that exact assembly code that will be executed.
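
As a rough illustration of point 2, here is a minimal Python sketch of the "translate one into the other" approach. The toy AST (`Assign`, `While`, `RepeatUntil`) and the functions `desugar` and `assigned_vars` are hypothetical names invented for this example; they are not taken from any real analyzer. The analysis itself only has to know about assignments and "while" loops.

```python
from dataclasses import dataclass
from typing import List, Union


# Hypothetical toy AST, invented for this sketch.
@dataclass
class Assign:
    target: str
    expr: str  # expression kept as plain text; the sketch never evaluates it


@dataclass
class While:
    cond: str
    body: List["Stmt"]


@dataclass
class RepeatUntil:
    body: List["Stmt"]
    cond: str


Stmt = Union[Assign, While, RepeatUntil]


def desugar(stmts):
    """Return an equivalent program that contains no RepeatUntil node."""
    out = []
    for s in stmts:
        if isinstance(s, RepeatUntil):
            body = desugar(s.body)
            # repeat B until c   ==>   B; while not (c) do B
            out.extend(body)
            out.append(While(f"not ({s.cond})", list(body)))
        elif isinstance(s, While):
            out.append(While(s.cond, desugar(s.body)))
        else:
            out.append(s)
    return out


def assigned_vars(stmts):
    """Stand-in analysis: the set of variables that may be assigned.

    It deliberately handles only the normalized constructs.
    """
    result = set()
    for s in stmts:
        if isinstance(s, Assign):
            result.add(s.target)
        elif isinstance(s, While):
            result |= assigned_vars(s.body)
        else:
            raise TypeError(f"unexpected construct: {type(s).__name__}")
    return result


program = [
    Assign("i", "0"),
    RepeatUntil([Assign("i", "i + 1")], "i >= 3"),
]
core = desugar(program)
print(assigned_vars(core))
```

The price of this convenience is that the loop body is duplicated in the normalized program, so a real tool would have to avoid reporting the same finding twice and map results back to the original construct.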
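
To make the floating-point remark in point 4 more concrete: for an expression such as `a * b + c`, a compiler may, depending on the target and its settings, emit either a multiplication followed by an addition (two roundings) or a single fused multiply-add (one rounding). The sketch below only emulates the two outcomes in Python using exact rational arithmetic; the function names are made up for this example, and the inputs were chosen by hand so that the two results differ on IEEE 754 double precision.

```python
from fractions import Fraction


def mul_add_two_roundings(a, b, c):
    # Ordinary float arithmetic: a * b is rounded to a double,
    # then the addition is rounded again.
    return a * b + c


def mul_add_one_rounding(a, b, c):
    # Emulation of a fused multiply-add: compute the exact rational
    # result, then round once when converting back to float.
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


# Hand-picked inputs: the exact product is 1 - 2**-54, which lies
# exactly halfway between two adjacent doubles.
a = 1.0 + 2.0 ** -27
b = 1.0 - 2.0 ** -27
c = -1.0

unfused = mul_add_two_roundings(a, b, c)
fused = mul_add_one_rounding(a, b, c)
print(unfused, fused, unfused == fused)
```

A source-level analyzer that wants to be sound has to allow for both results; an analyzer working on the generated assembly sees which instruction was actually emitted.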

﹎☆浅夏丿初晴 2024-10-27 03:14:23

Source code analysis is the most generally useful, of course; sometimes heuristics even need to analyze comments or formatting. But you’re right that even object code analysis can be necessary, e.g., to detect bugs introduced by GCC misfeatures. Thomas Reps, head of GrammaTech and a Wisconsin professor, gave a good talk on this at Stanford a couple of years ago: http://pages.cs.wisc.edu/~reps/#TOPLAS-WYSINWYX.
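
As a small illustration of how much each level retains, the following sketch looks at one line of Python at three levels using only the standard library (`tokenize`, `ast`, `dis`). The `NOLINT` comment is just a hypothetical suppression marker chosen for the example, and the helper names are invented here.

```python
import ast
import dis
import io
import tokenize

SOURCE = "x = 2 * 3  # NOLINT: intentional magic number\n"


def comments_in_tokens(src):
    """Token level: comments (and layout) are still available."""
    tokens = tokenize.generate_tokens(io.StringIO(src).readline)
    return [t.string for t in tokens if t.type == tokenize.COMMENT]


def ast_node_names(src):
    """AST level: program structure is kept, comments are gone."""
    return [type(node).__name__ for node in ast.walk(ast.parse(src))]


def bytecode_opnames(src):
    """Bytecode level: only what the compiler chose to emit."""
    code = compile(src, "<example>", "exec")
    return [ins.opname for ins in dis.get_instructions(code)]


print(comments_in_tokens(SOURCE))
print(ast_node_names(SOURCE))
print(bytecode_opnames(SOURCE))
```

A heuristic that honors suppression comments can only work at the first level. The multiplication is still visible as a `BinOp` node in the AST, whereas at the bytecode level it may already have been folded into a constant, depending on the interpreter version.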
