What are the tradeoffs of performing static analysis on source code, bytecode, machine code, etc.?
What are the tradeoffs of performing static analysis at different levels of code? For instance, for Java, why would someone perform static analysis on Java source code vs. Jasmin code vs. Java bytecode? Does the choice restrict or expand the types of analyses that can be done? Does the choice influence the correctness of the analyses? Thanks.
From a user perspective, I'd say that, unless you have very specific, easy-to-formalize properties to analyze (such as pure safety properties), go with a tool that supports Java source code.
From a tool-developer perspective, it may be easier to work with one level or another. Here are the differences that come to mind. (Note that with a compiler and/or a decent decompiler, a tool can, for instance, operate on one layer and present the results on another.)
Pros for Java source code:
Pros for Bytecode:
Pros for machine code:
State-of-the-art tools such as Spec# (a formal-methods dialect of C#) usually go through an intermediate language specifically designed for formal analysis (BoogiePL in the Spec# case, which is neither MSIL nor C#).
In the end... no, not really. You face the same fundamental problems regardless of which (Turing complete) language you choose to analyze. Depending on what properties you analyze, YMMV though.
If you're into formal methods and thinking about implementing an analysis yourself, I suspect you'll find better tool support for bytecode. If you're a user or developer and want to analyze your own code base, I suspect you'll benefit more from tools operating at the Java source code level.
Depends on what you mean by correctness. A static analysis is most often "defensive" in the sense that you don't assume anything that you don't know is true. If you restrict your attention to sound verification systems, all of them will be "equally correct".
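To make "defensive" concrete, here is a minimal sketch of a sign analysis. The `SignAnalysis` class and its method names are made up for illustration, and it assumes unbounded mathematical integers; for Java's wrapping `int`, even POS + POS would have to be reported as UNKNOWN to stay sound. Whenever the analysis cannot prove a sign, it answers UNKNOWN rather than guessing.

```java
class SignAnalysis {
    // Abstract values: UNKNOWN means "could be anything".
    enum Sign { NEG, ZERO, POS, UNKNOWN }

    /** Abstraction of a concrete integer. */
    static Sign of(int n) {
        return n < 0 ? Sign.NEG : n == 0 ? Sign.ZERO : Sign.POS;
    }

    /** Abstract addition (assumption: no overflow). */
    static Sign add(Sign a, Sign b) {
        if (a == Sign.ZERO) return b;   // 0 + x has the sign of x
        if (b == Sign.ZERO) return a;   // x + 0 has the sign of x
        if (a == b) return a;           // POS+POS, NEG+NEG, UNKNOWN+UNKNOWN
        return Sign.UNKNOWN;            // POS+NEG, or anything mixed with UNKNOWN
    }

    /** True only if the value is provably non-zero, e.g. as a safe divisor. */
    static boolean provablyNonZero(Sign s) {
        return s == Sign.POS || s == Sign.NEG;
    }

    public static void main(String[] args) {
        Sign sum = add(of(5), of(-3));
        // The concrete result is 2, but the analysis only knows POS + NEG,
        // so it refuses to claim that dividing by the sum is safe.
        System.out.println(sum + " provably non-zero? " + provablyNonZero(sum));
    }
}
```

The same abstract domain could be run over source expressions or over bytecode instructions; the level changes how values are extracted, not what the domain can prove.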
IntelliJ has static analysis for things that are not available in the bytecode, such as comments (e.g. Javadoc) and parameter names, so it can flag spelling mistakes and name inconsistencies. Analysing the source also ensures you have the line number and the position within the line for any issue.
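As a rough illustration of a source-only check (this is not how IntelliJ implements it; `ParamTagCheck` and its regex are a toy made up for this example), the sketch below scans source text for Javadoc `@param` tags whose name is not a real parameter and reports line and column. A real tool would work on a parsed syntax tree rather than a regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ParamTagCheck {
    private static final Pattern PARAM_TAG = Pattern.compile("@param\\s+(\\w+)");

    /** Returns "line:column name" for each @param tag that names an unknown parameter. */
    static List<String> check(String source, Set<String> realParams) {
        List<String> findings = new ArrayList<>();
        String[] lines = source.split("\n", -1);
        for (int i = 0; i < lines.length; i++) {
            Matcher m = PARAM_TAG.matcher(lines[i]);
            while (m.find()) {
                String name = m.group(1);
                if (!realParams.contains(name)) {
                    // Line and column are both 1-based, as editors show them.
                    findings.add((i + 1) + ":" + (m.start(1) + 1) + " " + name);
                }
            }
        }
        return findings;
    }

    public static void main(String[] args) {
        String source = String.join("\n",
            "/**",
            " * @param widht the width in pixels",
            " */",
            "void resize(int width) {}");
        System.out.println(check(source, Set.of("width"))); // prints [2:11 widht]
    }
}
```

Neither the comment nor the exact column survives compilation, which is why this kind of check only makes sense at the source level.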
The benefit of analysing bytecode is that it's much simpler and may be all you need. You might have line numbers, but you won't have the position within the line. And you can analyse compiled code that you don't have the source for, e.g. libraries.
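A tiny example of working without source: every class file starts with the magic number `0xCAFEBABE`, followed by the minor and major version, so a check such as "which Java release was this library compiled for?" needs only the compiled bytes. This is a minimal sketch; the `ClassFileHeader` class is a name made up here, and real bytecode analyses would normally use a library such as ASM rather than read the file by hand.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Paths;

class ClassFileHeader {
    /** Returns the class-file major version, e.g. 52 for Java 8. */
    static int majorVersion(byte[] classFile) {
        ByteBuffer buf = ByteBuffer.wrap(classFile); // big-endian by default
        if (buf.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file");
        }
        buf.getShort();                  // minor_version, skipped
        return buf.getShort() & 0xFFFF;  // major_version as an unsigned 16-bit value
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ClassFileHeader <path-to-.class-file>");
            return;
        }
        System.out.println(majorVersion(Files.readAllBytes(Paths.get(args[0]))));
    }
}
```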
Think of it this way. If you get negative results (results indicating or suggesting a negative or detrimental attribute) from Jasmin or bytecode, what would you do about it? How would you go about addressing that in a manner that is timely and cost effective?
Now consider the scenario where static analysis on the source code (most likely your source code or code that you own) comes back reporting a negative/detrimental attribute that needs addressing.
Do you think you will have a harder time addressing this detrimental aspect that is being mapped to source code than doing the same to a detrimental aspect (possibly similar or related) but this time mapped to bytecode or Jasmin?
The thing is that 1) Jasmin is expected to be a one-to-one representation of legitimate bytecode, and 2) that bytecode has been generated by a bona fide compiler. With a well-behaved compiler, the chances that a problem in the bytecode maps directly to a problem introduced in the source code are minimal.
Regardless of whether problems detected at the bytecode level result from problems introduced at the source code level or from a faulty compiler/environment, these problems are typically not actionable. You typically cannot act on them, at least not directly.
Problems detected at the source code level, OTOH, are efficiently actionable. That is, you can get your hands on them and fix them (and, by inference, remove any bytecode problems derived from them).
There are things that can be detected at the bytecode level, in particular in the context of packaging (i.e. packaging unnecessary libraries). But you hardly ever need to do verification at the bytecode level.
Unless you are in the business of compiler and language design (in this case targeting the VM), for efficiency and practicality you 1) assume the compiler is correct, and 2) given the way the JVM is spec'ed, also assume the compiler performs verification at compile time and the JVM does verification at run time.
How do you define correctness? What is correctness in this context? And how could the choice affect it? Are we talking correctness at the type system level? Partial and/or total correctness? Correctness with respect to properties such as fairness or liveness? Correctness of the analysis process itself? Correctness with respect to meeting one or more requirements?
Define your terms dude :)
Regardless, you have to assume the compiler is doing a sufficiently correct translation of your code into the target instruction set (again, unless you are in the business of compiler/language design.)
If you work on the assumption that the "native" representation of your code is correct (that is, it "maps" to it according to a desired target platform and a type system), then you narrow your field of verification down to your source code for the attributes you want to verify.
Another consideration is that abstraction loses high-level information.
We do our analysis on source code (the high level) because we need to know where expressions occur in the source.
A source-to-binary mapping is very important in the area of source code visualization.