Finding unused functions in a C project through static analysis
I am trying to run static analysis on a C project to identify dead code, i.e. functions or lines of code that are never called. I can build this project with Visual Studio .NET on Windows or with gcc on Linux. I have been looking for a reasonable tool that can do this for me, but so far I have not succeeded. I have read related questions on Stack Overflow, i.e. this and this, and I have tried using -Wunreachable-code with gcc, but gcc's output is not very helpful. It has the following format:
/home/adnan/my_socket.c: In function ‘my_sockNtoH32’:
/home/adnan/my_socket.c:666: warning: will never be executed
but when I look at line 666 in my_socket.c, it is actually inside another function that is called from my_sockNtoH32(). It will not be executed for this specific instance, but it will be executed when called from some other functions.
What I need is to find the code that will never be executed. Can someone please help with this?
PS: I can't convince management to buy a tool for this task, so please stick to free/open source tools.
Comments (3)
If GCC isn't cutting it for you, try clang (or more accurately, its static analyzer). Generally (your mileage may vary, of course), it has much better static analysis than GCC and produces much better output. It's used in Apple's Xcode, but it's open source and can be used separately.
When GCC says "will never be executed", it means it. You may have a bug that, in fact, does make that dead code. For example, something like:
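As a hypothetical illustration (the function name and thresholds below are invented for this sketch, not taken from the asker's code), a misordered condition is enough to leave a branch that can never run:

```c
/* Hypothetical helper, not from the asker's code.
 * Any value above 200 is also above 100, so the first test always
 * catches it and the second branch can never be taken. */
int clamp_percent(int value)
{
    if (value > 100) {
        return 100;
    } else if (value > 200) {
        return 200;   /* dead: never executed */
    }
    return value;
}
```

Here the dead branch is a symptom of a real bug: the two tests were presumably meant to be in the opposite order.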
Without seeing the code it's not possible to be specific, of course.
Note that if there is a macro at line 666, it's possible GCC refers to a part of that macro as well.
GCC will help you find dead code within a compilation. I'd be surprised if it can find dead code across multiple compilation units. A file-level declaration of a function or variable in a compilation unit means that some other compilation unit might reference it. So anything declared at the top level of a file, GCC can't eliminate, as it arguably only sees one compilation unit at a time.
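A minimal sketch of that distinction, using made-up function names in a hypothetical file: a `static` function is visible only inside its own compilation unit, so GCC can tell when it is unused there (its `-Wunused-function` warning, enabled by `-Wall`, covers this case), while a function with external linkage has to be assumed reachable from elsewhere.

```c
/* Hypothetical file: util.c */

/* Internal linkage: only code in this file can call it. If nothing
 * here does, GCC can report it via -Wunused-function. */
static int helper_internal(int x)
{
    return x * 2;
}

/* External linkage: some other compilation unit might call it, so
 * GCC stays silent when compiling this file alone, even if no file
 * in the project ever references it. */
int exported_maybe_unused(int x)
{
    return x + 1;
}
```

Marking file-local functions `static` is therefore a cheap way to let the compiler do part of the dead-function search for you.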
The problem gets harder. Imagine that compilation unit A declares function a, and compilation unit B has a function b that calls a. Is a dead? On the face of it, no. But in fact, it depends: if b is dead, and the only reference to a is in b, then a is dead, too. We get the same problem if b merely takes &a and puts it into an array X. Now, to decide whether a is dead, we need a points-to analysis across the entire system to see if that pointer to a is used anywhere.
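The &a case can be sketched as follows. The names a, b and X mirror the description above; `handler_fn` and `dispatch` are hypothetical additions for the illustration.

```c
#include <stddef.h>

int a(int v) { return v + 1; }

typedef int (*handler_fn)(int);

/* X holds &a once b has run. No source line calls a() by name. */
static handler_fn X[1];

/* b only stores the address; it never invokes a. */
void b(void) { X[0] = &a; }

/* Whether a ever runs depends on whether b is reached and on whether
 * anything then calls through X. */
int dispatch(size_t slot, int v)
{
    return X[slot] != NULL ? X[slot](v) : -1;
}
```

A search for direct calls to `a` finds none, yet `a` is live whenever both `b` and `dispatch` are reachable. Deciding that requires knowing what `X[slot]` can point to, which is what a points-to analysis provides.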
To get this kind of accurate "dead" information, you need a global view of the entire set of compilation units, and need to compute a points-to analysis, followed by the construction of a call-graph based on that points-to analysis. Function a is dead only if the call graph (as a tree, with main as the root) doesn't reference it somewhere.
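Once such a call graph exists, the final step is a plain reachability walk from the root. The sketch below assumes a tiny hypothetical program of five functions whose call edges have already been resolved; it is not how any particular tool stores its graph.

```c
#include <stdbool.h>
#include <stddef.h>

#define NFUNC 5   /* hypothetical program: main, b, a, orphan, helper */

enum { FN_MAIN, FN_B, FN_A, FN_ORPHAN, FN_HELPER };

/* calls[i][j] is true when function i may call function j
 * (direct calls plus whatever a points-to analysis resolved). */
static const bool calls[NFUNC][NFUNC] = {
    [FN_MAIN]   = { [FN_B] = true },
    [FN_B]      = { [FN_A] = true },
    [FN_ORPHAN] = { [FN_HELPER] = true },
};

/* Depth-first marking of everything reachable from fn. */
static void mark(size_t fn, bool reached[NFUNC])
{
    if (reached[fn]) {
        return;
    }
    reached[fn] = true;
    for (size_t callee = 0; callee < NFUNC; callee++) {
        if (calls[fn][callee]) {
            mark(callee, reached);
        }
    }
}

/* Fills reached[] and returns how many functions the root cannot reach. */
size_t count_dead(size_t root, bool reached[NFUNC])
{
    size_t dead = 0;
    for (size_t i = 0; i < NFUNC; i++) {
        reached[i] = false;
    }
    mark(root, reached);
    for (size_t i = 0; i < NFUNC; i++) {
        if (!reached[i]) {
            dead++;
        }
    }
    return dead;
}
```

Note that `helper` has a caller (`orphan`) and is still dead, because that caller is itself unreachable from `main`. A simple "does anything reference this symbol" check would miss it.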
(Some caveats are necessary: whatever the analysis is, as a practical matter it must be conservative, so even a full points-to analysis may fail to identify a dead function as dead. You also have to worry about uses of a C artifact from outside the set of C functions, e.g., a call to a from some bit of assembler code.)
Threading makes this worse; each thread has some root function which is probably at the top of the call DAG. Since how a thread gets started isn't defined by C compilers, it should be clear that to determine if a multithreaded C application has dead code, somehow the analysis has to be told the thread root functions, or be told how to discover them by looking for thread-initialization primitives.
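One way to picture "discovering them by looking for thread-initialization primitives" is the sketch below. The `call_site` record and the list of starter names are assumptions made for the illustration; a real analysis would work on its own program representation rather than on strings.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record an analysis might keep for each call site. */
struct call_site {
    const char *callee;   /* function being called */
    const char *fn_arg;   /* function passed by address, or NULL */
};

/* Assumed configuration: primitives that start a new thread. */
static const char *const thread_starters[] = { "pthread_create", "CreateThread" };

static int is_thread_starter(const char *name)
{
    size_t count = sizeof thread_starters / sizeof thread_starters[0];
    for (size_t i = 0; i < count; i++) {
        if (strcmp(name, thread_starters[i]) == 0) {
            return 1;
        }
    }
    return 0;
}

/* Copies the entry function of every thread-start call into roots[]
 * and returns the number found (at most max_roots). */
size_t collect_thread_roots(const struct call_site *sites, size_t n,
                            const char **roots, size_t max_roots)
{
    size_t found = 0;
    for (size_t i = 0; i < n && found < max_roots; i++) {
        if (sites[i].fn_arg != NULL && is_thread_starter(sites[i].callee)) {
            roots[found++] = sites[i].fn_arg;
        }
    }
    return found;
}
```

Each name collected this way would be treated as an extra root, alongside `main`, when deciding which functions are reachable.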
You aren't getting a lot of responses on how to get a correct answer. While it isn't open source, our DMS Software Reengineering Toolkit with its C Front End has all the machinery to do this, including C parsers, control-flow and data-flow analysis, local and global points-to analysis, and global call graph construction. DMS is easily customized to include extra information such as external calls from assembler, and/or a list of thread roots or specific source patterns that are thread initialization calls, and we've actually done that (easily) for some large embedded engine controllers with millions of lines of code. DMS has been applied to systems as large as 26 million lines of code (some 18,000 compilation units) for the purpose of building such call graphs.
[An interesting aside: in processing individual compilation units, DMS, for scaling reasons, in effect deletes symbols and related code that aren't used in that compilation unit. Remarkably, this gets rid of about 95% of code by volume when you take into account the iceberg usually hiding in the include file nest. It says C software typically has poorly factored include files. I suspect you all know that already.]
Tools like GCC will remove dead code while compiling. That's helpful, but the dead code is still lying around in your compilation unit source code, using up developers' attention (they have to figure out if it is dead, too!). DMS in its program transformation mode can be configured, modulo some preprocessor issues, to actually remove that dead code from the source. On very large software systems, you don't really want to do this by hand.