How do I know which parts of the code are never used?
I have legacy C++ code that I'm supposed to remove unused code from. The problem is that the code base is large.
How can I find out which code is never called/never used?
There are two varieties of unused code: the local kind (variables or paths inside a function that are never used) and the global kind (functions that are never called, global objects that are never accessed).
For the first kind, a good compiler can help:
-Wunused (GCC, Clang) should warn about unused variables; Clang's unused-variable analysis has even been extended to warn about variables that are never read (even though they are written).
-Wunreachable-code (older GCC, removed in 2010) should warn about local blocks that are never reached (this happens with early returns or with conditions that always evaluate to true).
Unused catch blocks are generally not reported, because the compiler generally cannot prove that no exception will be thrown.
For the second kind, it's much more difficult. Statically, it requires whole-program analysis, and even though link-time optimization may actually remove dead code, in practice the program has been transformed so much by the time it is performed that it is nearly impossible to convey meaningful information to the user.
There are therefore two approaches:
The theoretical one is to use a static analyzer that looks at the whole program at once.
The pragmatic one is to use a heuristic: a code coverage tool (for example gcov; note that specific flags must be passed during compilation for it to work properly). You run the code coverage tool with a good set of varied inputs (your unit tests or non-regression tests); the dead code is necessarily within the unreached code, so you can start from there.
If you are extremely interested in the subject, and have the time and inclination to actually work out a tool by yourself, I would suggest using the Clang libraries to build such a tool.
Because Clang will parse the code for you and perform overload resolution, you won't have to deal with the C++ language rules, and you'll be able to concentrate on the problem at hand.
However, this kind of technique cannot identify virtual overrides that are unused, since they could be called by third-party code you cannot reason about.
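As a small illustration of the local kind, here is a sketch of a function that contains both an unused variable and an unreachable statement. The file name and the function are hypothetical, and whether each line is actually reported depends on the compiler and its version.

```cpp
// Hypothetical file: local_dead.cpp
// Build with warnings enabled, for example:
//   g++ -Wall -Wextra -c local_dead.cpp
//   clang++ -Wall -Wextra -Wunreachable-code -c local_dead.cpp

int clamp_to_byte(int value) {
    int scale = value * 2;   // never read: the kind of variable -Wunused is meant to report
    if (value < 0) return 0;
    if (value > 255) return 255;
    return value;
    value = 0;               // placed after an unconditional return: can never execute
}
```

Both flagged lines can be deleted without changing what the function returns, which is what makes this kind of unused code the easy case.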
For the case of unused whole functions (and unused global variables), GCC can actually do most of the work for you, provided that you're using GCC and GNU ld.
When compiling the source, use -ffunction-sections and -fdata-sections; then, when linking, use -Wl,--gc-sections,--print-gc-sections. The linker will now list all the functions that could be removed because they were never called and all the globals that were never referenced.
(Of course, you can also skip the --print-gc-sections part and let the linker remove the functions silently, but keep them in the source.)
Note: this will only find unused complete functions; it won't do anything about dead code within functions. Functions called from dead code in live functions will also be kept around.
Some C++-specific features will also cause problems, in particular virtual functions and global variables with constructors. In both cases, anything used by a virtual function or a global-variable constructor also has to be kept around.
An additional caveat is that if you're building a shared library, the default settings in GCC will export every function in the shared library, causing it to be "used" as far as the linker is concerned. To fix that, you need to set the default to hiding symbols instead of exporting them (using e.g. -fvisibility=hidden), and then explicitly select the functions that you need to export.
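A minimal sketch of what this looks like on a single translation unit follows. The file name, the function names, and the separate main.o are assumptions made for the illustration; only the compiler and linker flags come from the answer above.

```cpp
// Hypothetical file: sections_demo.cpp
//   g++ -ffunction-sections -fdata-sections -c sections_demo.cpp
//   g++ -Wl,--gc-sections,--print-gc-sections sections_demo.o main.o -o demo
// main.o is assumed to define main() and to call active_sum only.

// Referenced from main.o, so the section holding it is kept.
int active_sum(const int* values, int count) {
    int total = 0;
    for (int i = 0; i < count; ++i) total += values[i];
    return total;
}

// Never referenced anywhere: with the flags above, the linker is expected
// to list the section holding this function among the removed ones.
int legacy_checksum(const int* values, int count) {
    int sum = 0;
    for (int i = 0; i < count; ++i) sum = sum * 31 + values[i];
    return sum;
}

// Never referenced global: -fdata-sections puts it in its own section too.
int legacy_table[4] = {1, 2, 3, 5};
```

Each function and global lands in its own section, so the linker can drop them individually; the report names sections, and mapping those names back to source symbols is left to the reader.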
Well, if you are using g++, you can use the -Wunused flag.
According to the documentation:
http://docs.freebsd.org/info/gcc/gcc.info.Warning_Options.html
Edit: here is another useful flag: -Wunreachable-code.
Update: I found a similar topic: Dead code detection in legacy C/C++ project
I think you are looking for a code coverage tool. A code coverage tool will analyze your code as it is running, and it will let you know which lines of code were executed and how many times, as well as which ones were not.
You could try giving this open source code coverage tool a chance: TestCocoon - code coverage tool for C/C++ and C#.
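To make the coverage idea concrete, here is a sketch using gcov with GCC rather than TestCocoon. The file names, the function, and the exact command sequence are assumptions about a typical setup, not something the answer prescribes.

```cpp
// Hypothetical file: coverage_demo.cpp
// One possible gcov workflow with GCC:
//   g++ --coverage -O0 -c coverage_demo.cpp
//   g++ --coverage coverage_demo.o tests.o -o coverage_demo
//   ./coverage_demo
//   gcov coverage_demo.cpp      (writes coverage_demo.cpp.gcov)
// tests.o is assumed to define main() and to exercise describe().
#include <string>

std::string describe(int code) {
    if (code == 0) return "ok";
    if (code == 1) return "warning";
    // If no test or real input ever passes another code, this line is
    // reported as never executed: a candidate for review, not proof
    // that it is dead.
    return "legacy error " + std::to_string(code);
}
```

The report only says what the chosen inputs did not reach, so its usefulness depends entirely on how representative those inputs are.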
The real answer here is: You can never really know for sure.
At least, for nontrivial cases, you can't be sure you've gotten all of it. Consider the following from Wikipedia's article on unreachable code:
As Wikipedia correctly notes, a clever compiler may be able to catch something like this. But consider a modification:
Will the compiler catch this? Maybe. But to do that, it will need to do more than run sqrt against a constant scalar value. It will have to figure out that (double)y will always be an integer (easy), and then understand the mathematical range of sqrt for the set of integers (hard). A very sophisticated compiler might be able to do this for the sqrt function, or for every function in math.h, or for any fixed-input function whose domain it can figure out. This gets very, very complex, and the complexity is basically limitless. You can keep adding layers of sophistication to your compiler, but there will always be a way to sneak in some code that will be unreachable for any given set of inputs.
And then there are the input sets that simply never get entered. Input that would make no sense in real life, or get blocked by validation logic elsewhere. There's no way for the compiler to know about those.
The end result of this is that while the software tools others have mentioned are extremely useful, you're never going to know for sure that you caught everything unless you go through the code manually afterward. Even then, you'll never be certain that you didn't miss anything.
The only real solution, IMHO, is to be as vigilant as possible, use the automation at your disposal, refactor where you can, and constantly look for ways to improve your code. Of course, it's a good idea to do that anyway.
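The snippets this answer refers to are not reproduced in the text above. The following is only a sketch of the same idea, with hypothetical function names: one branch condition that a compiler can fold to a constant, and one that is false for every int but requires reasoning about the range of sqrt to prove.

```cpp
#include <cmath>

// Easy case: the argument is a constant, so sqrt(2.0) can be evaluated at
// compile time and the comparison is visibly always false.
bool constant_case_reaches_branch() {
    double x = std::sqrt(2.0);
    return x > 5;
}

// Hard case: y comes from outside. For an int y the square root is 0,
// at least 1, or NaN (negative y), so the condition is never true, but
// proving that means knowing the range of sqrt over the integers.
bool input_case_reaches_branch(int y) {
    double x = std::sqrt(static_cast<double>(y));
    return x != 0 && x < 1;
}
```

Any code guarded by the second condition is unreachable in practice, yet a tool would have to model the behavior of sqrt on integers to say so.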
I haven't used it myself, but cppcheck claims to find unused functions. It probably won't solve the complete problem, but it might be a start.
You could try using PC-lint/FlexeLint from Gimpel Software. It claims to find this kind of unused code.
I've used it for static analysis and found it very good, but I have to admit that I have not used it specifically to find dead code.
My normal approach to finding unused stuff relies on rebuilding continuously while I edit the code; watch "make 2>&1" tends to do the trick on Unix.
This is a somewhat lengthy process, but it does give good results.
Mark as many public functions and variables as private or protected as you can without causing compilation errors, and try to refactor the code while you do this. By making functions private (and, to some extent, protected), you reduce your search area, since private functions can only be called from the same class (unless there are stupid macros or other tricks to circumvent access restrictions, and if that's the case I'd recommend you find a new job). It is much easier to determine that you don't need a private function, since only the class you're currently working on can call it. This method is easier if your code base has small classes and is loosely coupled. If your code base does not have small classes or is very tightly coupled, I suggest cleaning those up first.
Next, mark all the remaining public functions and make a call graph to figure out the relationships between the classes. From this tree, try to figure out which branches look like they can be trimmed.
The advantage of this method is that you can do it on a per-module basis, so it is easy to keep your unit tests passing without long periods during which the code base is broken.
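A sketch of the first step, using a hypothetical ReportWriter class: two members have been moved from public to private, and the fact that the project still compiles shows that nothing outside the class calls them.

```cpp
#include <string>

class ReportWriter {
public:
    // Still called from other modules, so it stays public.
    std::string render(const std::string& title) const {
        return "== " + trim(title) + " ==";
    }

private:
    // Formerly public. The build still succeeds after the move, so every
    // caller lives inside this class and one file is enough to review it.
    static std::string trim(const std::string& text) {
        const auto first = text.find_first_not_of(' ');
        if (first == std::string::npos) return "";
        const auto last = text.find_last_not_of(' ');
        return text.substr(first, last - first + 1);
    }

    // Also moved to private; nothing in the class calls it, which makes it
    // a strong candidate for deletion.
    static std::string legacy_banner() { return "*****"; }
};
```

Had any other class still called trim or legacy_banner, the move would have produced a compilation error at the call site, which is exactly the information being looked for.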
If you are on Linux, you may want to look into callgrind, a C/C++ program analysis tool that is part of the valgrind suite, which also contains tools that check for memory leaks and other memory errors (which you should be using as well). It analyzes a running instance of your program and produces data about its call graph and about the performance costs of nodes on the call graph. It is usually used for performance analysis, but it also produces a call graph for your application, so you can see which functions are called, as well as their callers.
This is obviously complementary to the static methods mentioned elsewhere on the page, and it will only be helpful for eliminating wholly unused classes, methods, and functions; it will not help find dead code inside methods that are actually called.
I really haven't used any tool that does such a thing... But, as far as I've seen in all the answers, no one has said that this problem is uncomputable.
What do I mean by this? That this problem cannot be solved by any algorithm on a computer. This theorem (that such an algorithm doesn't exist) is a corollary of the undecidability of Turing's halting problem.
All the tools you will use are not exact algorithms but heuristics. They will not give you exactly all the code that's not used.
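One way to get a feel for this is a function whose only call site is guarded by a question nobody can currently answer. The sketch below uses the Collatz iteration as a stand-in; all names are hypothetical, and the step bound exists only so that the example terminates.

```cpp
#include <cstdint>

// Follows the Collatz iteration from n and reports whether it reached 1
// within max_steps steps.
bool reaches_one(std::uint64_t n, unsigned max_steps) {
    for (unsigned step = 0; step < max_steps && n != 1; ++step) {
        n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
    }
    return n == 1;
}

// Counts the starting values up to limit that did not reach 1 in time.
// The increment is "dead" only if every value reaches 1, which no tool
// can establish in general.
unsigned count_unresolved(std::uint64_t limit, unsigned max_steps) {
    unsigned unresolved = 0;
    for (std::uint64_t n = 1; n <= limit; ++n) {
        if (!reaches_one(n, max_steps)) ++unresolved;
    }
    return unresolved;
}
```

A coverage run over small inputs never executes the increment, and a static tool cannot prove that it never will, so the line stays a candidate rather than a certainty.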
One way is to use a debugger together with the compiler feature that eliminates unused machine code during compilation.
Once some machine code is eliminated, the debugger won't let you put a breakpoint on the corresponding line of source code. So you put breakpoints everywhere, start the program, and inspect the breakpoints: those which are in the "no code loaded for this source" state correspond to eliminated code. Either that code is never called or it has been inlined, and you have to perform some minimal analysis to find out which of the two happened.
At least that's how it works in Visual Studio, and I guess other toolsets can do that too.
That's a lot of work, but I guess it is faster than manually analyzing all the code.
CppDepend is a commercial tool which can detect unused types, methods and fields, and do much more. It is available for Windows and Linux (but currently has no 64-bit support), and comes with a 2-week trial.
Disclaimer: I don't work there, but I own a license for this tool (as well as NDepend, which is a more powerful alternative for .NET code).
For those who are curious, here is an example built-in (customizable) rule for detecting dead methods, written in CQLinq:
It depends on the platform you use to create your application.
For example, if you use Visual Studio, you could use a tool like .NET ANTS Profiler, which is able to parse and profile your code. This way, you should quickly know which parts of your code are actually used. Eclipse also has equivalent plugins.
Otherwise, if you need to know which functions of your application are actually used by your end users, and if you can release your application easily, you can use a log file for an audit.
For each main function, you can trace its usage, and after a few days or a week just get that log file and have a look at it.
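The logging idea can be sketched as a small usage counter. Everything here (the UsageLog class, the feature names, the single-threaded assumption) is hypothetical; a real application would more likely route this through its existing logging facility.

```cpp
#include <map>
#include <ostream>
#include <string>

// Minimal usage audit: counts how often each named feature is entered.
// Not thread-safe; assumed to be called from a single thread.
class UsageLog {
public:
    void record(const std::string& feature) { ++counts_[feature]; }

    int count(const std::string& feature) const {
        const auto it = counts_.find(feature);
        return it == counts_.end() ? 0 : it->second;
    }

    // Writes one "name count" line per feature, e.g. to a log file stream.
    void dump(std::ostream& out) const {
        for (const auto& entry : counts_) {
            out << entry.first << ' ' << entry.second << '\n';
        }
    }

private:
    std::map<std::string, int> counts_;
};

UsageLog& usage_log() {
    static UsageLog instance;
    return instance;
}

// Each main entry point records itself before doing its real work.
void export_report() { usage_log().record("export_report"); }
void import_legacy_format() { usage_log().record("import_legacy_format"); }
```

After the audit period, features that never appear in the dump are the ones to investigate first, keeping in mind that rarely used features (yearly reports, disaster recovery) can look dead over a short window.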
I had a friend ask me this very question today, and I looked around at some promising Clang developments, e.g. ASTMatchers and the Static Analyzer that might have sufficient visibility in the goings-on during compiling to determine the dead code sections, but then I found this:
https://blog.flameeyes.eu/2008/01/today-how-to-identify-unused-exported-functions-and-variables
It's pretty much a complete description of how to use a few GCC flags that are seemingly designed for the purpose of identifying unreferenced symbols!
I don't think it can be done automatically.
Even with code coverage tools, you need to provide sufficient input data to run.
Maybe a very complex and high-priced static analysis tool, such as those from Coverity or the LLVM compiler, could help.
But I'm not sure, and I would prefer manual code review.
UPDATED
Well... removing only unused variables and unused functions is not hard, though.
UPDATED
After reading the other answers and comments, I'm more strongly convinced that it can't be done.
You have to know the code to get a meaningful code coverage measure, and if you know it that well, manual editing will be faster than preparing, running, and reviewing coverage results.
The GNU linker has a --cref option which produces cross-reference information. You can pass this from the gcc command line via -Wl,--cref.
For instance, suppose that foo.o defines a symbol foo_sym which is also used in bar.o. Then in the output you will see foo_sym listed with foo.o, followed by bar.o.
If foo_sym is confined to foo.o, then you won't see any additional object files; it will be followed by another symbol.
Now, from this we do not know that foo_sym is not used. It's just a candidate: we know that it's defined in one file and not used in any others. foo_sym could be defined in foo.o and used there.
So, what you do with this information is filter out the symbols that are confined to a single object file and then review each one in the source: either it really is unused, or it is used only within its own file and should be declared static, like it should have been.
Of course, I'm ignoring the possibility that some of those symbols are unused on purpose, because they are exported for dynamic linkage (which can be the case even when an executable is linked); that's a more nuanced situation that you have to know about and deal with intelligently.
The general problem of whether some function will be called is undecidable. You cannot know in advance, in a general way, whether some function will be called, just as you cannot know whether a Turing machine will ever stop. You can find out whether there is some path (statically) that goes from main() to the function you have written, but that doesn't guarantee that it will ever be called. The set of decisions needed to determine whether the function will be called is undecidable in the general case.
A function can be referenced by other modules and be reachable from main(), but this doesn't mean it will actually be called. This means that you cannot decide whether the function will actually be called (this is the undecidable part of the problem) without executing it. The problem is similar to the Turing machine built by the diagonal argument: a program that executes the code calling this function only if the function (as defined in the program) is classified as never called. We can build such a function, making the program never stop with a result, which makes the problem undecidable... will the function be called in this case? I'm afraid you cannot say.