Using Python code coverage tools to understand and prune the source code of a large library
My project targets a low-cost, low-resource embedded device. It depends on a relatively large and sprawling Python code base, and my use of its APIs is quite specific.
I am keen to prune the code of this library back to its bare minimum, by executing my test suite under a coverage tool like Ned Batchelder's coverage or figleaf, then scripting the removal of unused code within the various modules/files. This will help not only with understanding the library's internals, but also make writing any patches easier. Ned actually refers to the use of coverage tools to "reverse engineer" complex code in one of his online talks.
My question to the SO community is whether people have experience of using coverage tools in this way that they wouldn't mind sharing. What are the pitfalls, if any? Is coverage a good choice, or would I be better off investing my time in figleaf?
The end-game is to be able to automatically generate a new source tree for the library, based on the original tree, but only including the code actually used when I run nosetests.
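For concreteness, the first step might look something like this: a minimal sketch, assuming coverage.py 5.x or later (the API differed in older versions) and a .coverage data file already produced by running the suite under coverage run; the paths involved are whatever the test run measured:

    # List executed line counts per measured file from a .coverage data file.
    from coverage import CoverageData

    data = CoverageData()   # defaults to the ".coverage" file in the cwd
    data.read()

    for path in sorted(data.measured_files()):
        executed = data.lines(path) or []   # line numbers seen at runtime
        print(f"{path}: {len(executed)} executed lines")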
If anyone has developed a tool that does a similar job for their Python applications and libraries, it would be terrific to get a baseline from which to start development.
Hopefully my description makes sense to readers...
3 Answers
What you want isn't "test coverage"; it is the transitive closure of "can call" from the root of the computation. (In threaded applications, you have to include "can fork".)
You want to designate some small set (perhaps only 1) of functions that make up the entry points of your application, and want to trace through all possible callees (conditional or unconditional) of that small set. This is the set of functions you must have.
Python makes this very hard in general (IIRC, I'm not a deep Python expert) because of dynamic dispatch, and especially because of "eval". Reasoning about which functions can get called can be pretty tricky for a static analyzer applied to a highly dynamic language.
One might use test coverage as a way to seed the "can call" relation with specific "did call" facts; that could catch a lot of dynamic dispatches (dependent on your test suite coverage). Then the result you want is the transitive closure of "can or did" call. This can still be erroneous, but is likely to be less so.
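To make that concrete, here is a toy sketch of the closure computation. The call-graph structure and all function names are hypothetical; in practice the static edges would come from an analyzer and the dynamic edges from coverage/tracing data:

    def reachable(call_graph, roots):
        """Transitive closure of "can or did call" from a set of roots.
        call_graph maps a function name to the set of names it may call."""
        keep, stack = set(roots), list(roots)
        while stack:
            fn = stack.pop()
            for callee in call_graph.get(fn, ()):
                if callee not in keep:
                    keep.add(callee)
                    stack.append(callee)
        return keep

    # Merge static "can call" edges with dynamic "did call" facts
    # observed under the test suite, then compute what must be kept.
    static_edges = {"main": {"parse", "run"}, "run": {"helper"}}
    dynamic_edges = {"run": {"plugin_hook"}}  # a dispatch only seen at runtime
    merged = {fn: static_edges.get(fn, set()) | dynamic_edges.get(fn, set())
              for fn in set(static_edges) | set(dynamic_edges)}
    print(reachable(merged, {"main"}))
    # keeps main, parse, run, helper, and plugin_hook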
Once you get a set of "necessary" functions, the next problem will be removing the unnecessary functions from the source files you have. If the number of files you start with is large, the manual effort to remove the dead stuff may be pretty high. Worse, you're likely to revise your application, and then the answer as to what to keep changes. So for every change (release), you need to reliably recompute this answer.
My company builds a tool that does this analysis for Java packages (with appropriate caveats regarding dynamic loads and reflection): the input is a set of Java files and (as above) a designated set of root functions. The tool computes the call graph, also finds all dead member variables, and produces two outputs: a) the list of purportedly dead methods and members, and b) a revised set of files with all the "dead" stuff removed. If you believe a), then you use b). If you think a) is wrong, then you add the elements listed in a) to the set of roots and repeat the analysis until you think a) is right. To do this, you need a static analysis tool that parses Java, computes the call graph, and then revises the code modules to remove the dead entries. The basic idea applies to any language.
You'd need a similar tool for Python, I'd expect.
Maybe you can stick to just dropping files that are completely unused, although that may still be a lot of work.
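As a sketch of that file-level pruning (all paths hypothetical; assumes coverage.py 5.x+ and a .coverage data file from a test run), one could copy only the files coverage actually measured into a fresh tree:

    import pathlib
    import shutil
    from coverage import CoverageData

    src = pathlib.Path("path/to/library").resolve()   # hypothetical tree
    dst = pathlib.Path("pruned_library")

    data = CoverageData()
    data.read()
    used = {pathlib.Path(f).resolve() for f in data.measured_files()}

    for py in src.rglob("*.py"):
        if py.resolve() in used:
            target = dst / py.relative_to(src)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(py, target)

Anything such a copy misses (modules loaded by name, data files, packages imported only for side effects) would have to be whitelisted back in by hand.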
As others have pointed out, coverage can tell you what code has been executed. The trick for you is to be sure that your test suite truly exercises the code fully. The failure case here is over-pruning because your tests skipped some code that will really be needed in production.
Be sure to get the latest version of coverage.py (v3.4): it adds a new feature to indicate files that are never executed at all.
BTW: for a first-cut prune, Python provides a neat trick: remove all the .pyc files in your source tree, then run your tests. Files that still have no .pyc file were clearly not executed!
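A sketch of that trick in script form (the tree path is hypothetical). One caveat: on Python 3 the compiled files land in __pycache__ rather than next to the source, so the check has to look in both places:

    import pathlib

    tree = pathlib.Path("path/to/library")   # hypothetical source tree
    for py in tree.rglob("*.py"):
        # Python 2 left module.pyc beside the source; Python 3 writes
        # __pycache__/module.cpython-XY.pyc instead -- check both.
        beside = py.with_suffix(".pyc").exists()
        cached = any((py.parent / "__pycache__").glob(py.stem + ".*.pyc"))
        if not (beside or cached):
            print("never imported:", py)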
I haven't used coverage for pruning, but it seems like it should do well. I've used the combination of nosetests + coverage, and it worked better for me than figleaf. In particular, I found the html report from nosetests+coverage helpful -- it should help you see where the unused portions of the library are.
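For reference, the same kind of HTML report can also be produced directly from coverage.py; a minimal sketch, assuming coverage.py 5.x+ and an existing .coverage data file from a test run:

    import coverage

    cov = coverage.Coverage()
    cov.load()   # read the existing .coverage data file
    cov.html_report(directory="htmlcov")
    # Open htmlcov/index.html: unexecuted lines are highlighted per file.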