I want to programmatically edit Python source code. Basically, I want to read a .py file, generate the AST, and then write back the modified Python source code (i.e., another .py file).
There are ways to parse/compile Python source code using standard Python modules, such as ast or compiler. However, I don't think any of them supports modifying the source code (e.g., deleting a function declaration) and then writing back the modified Python source code.
UPDATE: The reason I want to do this is that I'd like to write a mutation testing library for Python, mostly by deleting statements/expressions, rerunning the tests, and seeing what breaks.
The built-in ast module doesn't seem to have a method to convert back to source. However, the codegen module provides a pretty-printer for the AST that would enable you to do so.
eg.
This will print:
Note that you may lose the exact formatting and comments, as these are not preserved.
However, you may not need to regenerate the source at all. If all you require is to execute the modified AST, you can simply call compile() on the tree and exec the resulting code object.
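The compile-and-exec route can be sketched as follows. This is a minimal illustration using only the standard library; the sample source and the function names greet and unused are made up for the example.

```python
import ast

source = """
def greet(name):
    return "Hello, " + name

def unused():
    return 0
"""

tree = ast.parse(source)

# Drop the top-level function named "unused" (a hypothetical name) from the module.
tree.body = [
    node for node in tree.body
    if not (isinstance(node, ast.FunctionDef) and node.name == "unused")
]

# Compile the modified tree directly; no source text is regenerated.
code = compile(tree, filename="<ast>", mode="exec")
namespace = {}
exec(code, namespace)

print(namespace["greet"]("world"))
print("unused" in namespace)
```

Because the tree is compiled directly, formatting and comments never come into play; this only helps when you want to run the modified code rather than save it.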
It took a while, but Python 3.9 added this to the standard library as ast.unparse:
https://docs.python.org/3.9/whatsnew/3.9.html#ast
https://docs.python.org/3.9/library/ast.html#ast.unparse
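A minimal sketch of the round trip with ast.unparse, assuming Python 3.9 or later. The sample source and the function names add and sub are invented for illustration.

```python
import ast

source = (
    "def add(a, b):\n"
    "    # add two numbers\n"
    "    return a + b\n"
    "\n"
    "\n"
    "def sub(a, b):\n"
    "    return a - b\n"
)

tree = ast.parse(source)

# Delete the function declaration named "sub".
tree.body = [
    node for node in tree.body
    if not (isinstance(node, ast.FunctionDef) and node.name == "sub")
]

# Turn the modified tree back into source text (Python 3.9+).
new_source = ast.unparse(tree)
print(new_source)
```

Note that the comment inside add does not survive, since comments are not part of the AST.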
In a different answer I suggested using the astor package, but I have since found a more up-to-date AST un-parsing package called astunparse:
I have tested this on Python 3.5.
You might not need to regenerate source code. That's a bit dangerous for me to say, of course, since you have not actually explained why you think you need to generate a .py file full of code; but:
If you want to generate a .py file that people will actually use, maybe so that they can fill out a form and get a useful .py file to insert into their project, then you don't want to turn it into an AST and back, because you'll lose all formatting (think of the blank lines that make Python so readable by grouping related sets of lines together; ast nodes do have lineno and col_offset attributes) and all comments. Instead, you'll probably want to use a templating engine (the Django template language, for example, is designed to make templating even text files easy) to customize the .py file, or else use Rick Copeland's MetaPython extension.
If you are trying to make a change during compilation of a module, note that you don't have to go all the way back to text; you can just compile the AST directly instead of turning it back into a .py file.
But in almost any and every case, you are probably trying to do something dynamic that a language like Python actually makes very easy, without writing new .py files! If you expand your question to let us know what you actually want to accomplish, new .py files will probably not be involved in the answer at all; I have seen hundreds of Python projects doing hundreds of real-world things, and not a single one of them ever needed to write a .py file. So, I must admit, I'm a bit of a skeptic that you've found the first good use case. :-)
Update: now that you've explained what you're trying to do, I'd be tempted to just operate on the AST anyway. You will want to mutate by removing not lines of a file (which could result in half-statements that simply die with a SyntaxError) but whole statements, and what better place to do that than in the AST?
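Removing whole statements at the AST level could look roughly like this. It is a sketch under assumptions: statement_deletion_mutants is a hypothetical helper name, it only touches the top-level statements of each function body, and it relies on ast.unparse (Python 3.9+) to print each mutant.

```python
import ast
import copy


def statement_deletion_mutants(source):
    """Yield source variants, each with one statement removed from a function body."""
    tree = ast.parse(source)
    functions = [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]
    for f_index, func in enumerate(functions):
        for s_index in range(len(func.body)):
            # Work on a fresh copy so every mutant differs from the original
            # by exactly one deleted statement.
            mutant = copy.deepcopy(tree)
            target = [
                n for n in ast.walk(mutant) if isinstance(n, ast.FunctionDef)
            ][f_index]
            del target.body[s_index]
            if not target.body:
                # A function body may not be empty, so fall back to `pass`.
                target.body.append(ast.Pass())
            yield ast.unparse(mutant)


example = (
    "def clamp(x):\n"
    "    if x < 0:\n"
    "        x = 0\n"
    "    return x\n"
)

mutants = list(statement_deletion_mutants(example))
for text in mutants:
    print(text)
    print("---")
```

Each mutant is syntactically valid by construction, which is the advantage over deleting raw lines of text; a test run against each mutant then shows which deletions go unnoticed.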
Parsing and modifying the code structure is certainly possible with the help of the ast module, and I will show it in an example in a moment. However, writing back the modified source code is not possible with the ast module alone. There are other modules available for this job.
NOTE: The example below can be treated as an introductory tutorial on the usage of the ast module; a more comprehensive guide is available in the Green Tree Snakes tutorial and the official documentation of the ast module.
Introduction to ast:
You can parse Python code (represented as a string) by simply calling ast.parse(). This returns a handle to the abstract syntax tree (AST) structure. Interestingly, you can compile this structure back and execute it as shown above.
Another very useful API is ast.dump(), which dumps the whole AST in string form. It can be used to inspect the tree structure and is very helpful in debugging. For example,
On Python 2.7:
On Python 3.5:
Notice the difference in syntax for the print statement in Python 2.7 vs. Python 3.5 and the difference in the type of AST node in the respective trees.
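For reference, a minimal Python 3 sketch of ast.parse() and ast.dump(). The one-line input is made up for illustration, and the exact dump text varies between Python versions, so no expected output is shown.

```python
import ast

# Parse a one-line program; in Python 3, print is an ordinary function call.
tree = ast.parse("print('Hello Python')")

# Dump the whole tree as a string to inspect its structure.
dumped = ast.dump(tree)
print(dumped)
```

On Python 3 the dump contains a Call node whose func is a Name with id 'print', whereas Python 2.7 had a dedicated Print statement node.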
How to modify code using ast:
Now, let's have a look at an example of modifying Python code with the ast module. The main tool for modifying the AST structure is the ast.NodeTransformer class. Whenever you need to modify the AST, you subclass it and write the node transformation(s) accordingly.
For our example, let's write a simple utility that transforms Python 2 print statements into Python 3 function calls.
Print statement to function call converter utility: print2to3.py:
This utility can be tried on a small example file, such as the one below, and it should work fine.
Test input file: py2.py
Please note that the above transformation is for ast tutorial purposes only; in a real-world scenario one would have to handle all the different cases, such as print " x is %s" % ("Hello Python").
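The Python 2 Print node no longer exists in Python 3's ast module, so the sketch below illustrates the same ast.NodeTransformer pattern on a different, made-up task: renaming calls to one function so that they call another. The class name RenameCall and the target name log are hypothetical, and ast.unparse requires Python 3.9+.

```python
import ast


class RenameCall(ast.NodeTransformer):
    """Rewrite calls to `old` so that they call `new` instead."""

    def __init__(self, old, new):
        self.old = old
        self.new = new

    def visit_Call(self, node):
        # Visit children first so nested calls are transformed as well.
        self.generic_visit(node)
        if isinstance(node.func, ast.Name) and node.func.id == self.old:
            new_func = ast.Name(id=self.new, ctx=ast.Load())
            node.func = ast.copy_location(new_func, node.func)
        return node


tree = ast.parse("print(len(items))\nprint('done')")
tree = RenameCall("print", "log").visit(tree)
ast.fix_missing_locations(tree)

result = ast.unparse(tree)
print(result)
```

Returning the node from a visit_ method keeps it in the tree; returning None would remove it, and returning a different node would replace it.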
If you are looking at this in 2019, you can use the libcst package. It has a syntax similar to ast. It works like a charm and preserves the code structure. It is mainly helpful for projects where you have to preserve comments, whitespace, newlines, etc.
If you don't need to preserve comments, whitespace, and so on, then the combination of ast and astor works well.
I recently created a fairly stable (the core is really well tested) and extensible piece of code that generates code from an ast tree: https://github.com/paluh/code-formatter .
I'm using my project as the base for a small vim plugin (which I use every day), so my goal is to generate really nice and readable Python code.
P.S.
I tried to extend codegen, but its architecture is based on the ast.NodeVisitor interface, so the formatters (visitor_ methods) are just functions. I found this structure quite limiting and hard to optimize (for long, nested expressions it is easier to keep a tree of objects and cache some partial results; otherwise you can hit exponential complexity if you want to search for the best layout). BUT codegen, like every piece of mitsuhiko's work that I've read, is very well written and concise.
One of the other answers recommends codegen, which seems to have been superseded by astor. The version of astor on PyPI (version 0.5 as of this writing) seems to be a little outdated as well, so you can install the development version of astor as follows.
Then you can use astor.to_source to convert a Python AST to human-readable Python source code:
I have tested this on Python 3.5.
Unfortunately, none of the answers above actually met both of these conditions
I've recently written a small toolkit for purely AST-based refactorings, called refactor. For example, if you want to replace all placeholders with 42, you can simply write a rule like this:
It will find all acceptable nodes, replace them with the new nodes, and generate the final form:
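The refactor rule itself is not reproduced here. As a rough standard-library analogue of the same replacement (not the refactor API), the sketch below swaps every read of the name placeholder for the constant 42 using ast.NodeTransformer and ast.unparse (Python 3.9+). The class name and sample source are made up, and unlike a source-preserving tool this reprints the whole module.

```python
import ast


class ReplacePlaceholder(ast.NodeTransformer):
    """Replace every read of the name `placeholder` with the constant 42."""

    def visit_Name(self, node):
        if node.id == "placeholder" and isinstance(node.ctx, ast.Load):
            return ast.copy_location(ast.Constant(value=42), node)
        return node


source = "total = placeholder + offset\nprint(placeholder)"
tree = ReplacePlaceholder().visit(ast.parse(source))
ast.fix_missing_locations(tree)

result = ast.unparse(tree)
print(result)
```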
We had a similar need, which wasn't solved by the other answers here. So we created a library for this, ASTTokens, which takes an AST tree produced with the ast or astroid modules and marks it with the ranges of text in the original source code.
It doesn't modify code directly, but that's not hard to add on top, since it tells you the range of text you need to modify.
For example, this wraps a function call in WRAP(...), preserving comments and everything else:
Produces:
Hope this helps!
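The idea of editing by text range can be sketched with the standard library alone. This is not the ASTTokens API: it relies on the lineno, col_offset, end_lineno, and end_col_offset attributes that the ast module provides since Python 3.8, and wrap_calls is a hypothetical helper name.

```python
import ast


def wrap_calls(source, func_name, wrapper="WRAP"):
    """Wrap each call to `func_name` in `wrapper(...)` by editing the text directly."""
    data = source.encode("utf-8")

    # Byte offset at which each line starts (col_offset counts UTF-8 bytes).
    line_starts = [0]
    for line in data.splitlines(keepends=True):
        line_starts.append(line_starts[-1] + len(line))

    inserts = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == func_name
        ):
            start = line_starts[node.lineno - 1] + node.col_offset
            end = line_starts[node.end_lineno - 1] + node.end_col_offset
            inserts.append((start, wrapper.encode("utf-8") + b"("))
            inserts.append((end, b")"))

    # Apply insertions from the end backwards so earlier offsets stay valid.
    for offset, text in sorted(inserts, reverse=True):
        data = data[:offset] + text + data[offset:]
    return data.decode("utf-8")


source = "x = compute(1, 2)  # keep this comment\ny = other(3)\n"
wrapped = wrap_calls(source, "compute")
print(wrapped)
```

Only the bytes around the matched call change, so comments, spacing, and every other line are left exactly as written.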
A program transformation system is a tool that parses source text, builds ASTs, and allows you to modify them using source-to-source transformations ("if you see this pattern, replace it by that pattern"). Such tools are ideal for mutating existing source code, since a mutation is just "if you see this pattern, replace it by a pattern variant".
Of course, you need a program transformation engine that can parse the language of interest to you and still do the pattern-directed transformations. Our DMS Software Reengineering Toolkit is a system that can do that, and it handles Python and a variety of other languages.
See this SO answer for an example of a DMS-parsed AST for Python capturing comments accurately. DMS can make changes to the AST and regenerate valid text, including the comments. You can ask it to prettyprint the AST using its own formatting conventions (you can change these), or do "fidelity printing", which uses the original line and column information to maximally preserve the original layout (some change in layout where new code is inserted is unavoidable).
To implement a "mutation" rule for Python with DMS, you could write the following:
This rule replaces "+" with "-" in a syntactically correct way; it operates on the AST and thus won't touch strings or comments that happen to look right. The extra condition on "mutate_this_place" lets you control how often this occurs; you don't want to mutate every place in the program.
You'd obviously want a bunch more rules like this that detect various code structures and replace them with mutated versions. DMS is happy to apply a set of rules. The mutated AST is then prettyprinted.
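The DMS rule is not shown here. For comparison, the same "+" to "-" mutation can be sketched with Python's standard ast module. This is an analogue, not DMS syntax: the AddToSub class is hypothetical, and the mutate_this_place predicate borrows its name from the rule described above. It assumes Python 3.9+ for ast.unparse.

```python
import ast


class AddToSub(ast.NodeTransformer):
    """Replace `a + b` with `a - b` wherever the predicate allows it."""

    def __init__(self, mutate_this_place):
        self.mutate_this_place = mutate_this_place

    def visit_BinOp(self, node):
        self.generic_visit(node)
        if isinstance(node.op, ast.Add) and self.mutate_this_place(node):
            node.op = ast.Sub()
        return node


source = (
    'label = "a + b"  # the + inside this string is not an operator\n'
    "total = a + b\n"
    "other = c + d\n"
)

# Mutate only the addition on line 2; a real tool might pick sites at random.
tree = AddToSub(lambda node: node.lineno == 2).visit(ast.parse(source))
result = ast.unparse(tree)
print(result)
```

Because the match is on BinOp nodes, the "+" inside the string literal is never considered. Unlike fidelity printing, ast.unparse drops the comment and normalizes the layout.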
I have written several utilities to do this kind of thing, and in each case my tool of choice was libcst. Instagram created it for manipulating their Python code base, e.g., to insert type annotations. Admittedly it does not use an AST but a CST (concrete syntax tree), but the structure is quite similar, and it's easy to use.
I used to use baron for this, but switched to parso because it was up to date with modern Python. Unfortunately, Python has changed its parser quite a lot, and parso has not caught up as of 2024.
I also needed this for a mutation tester. It's really quite simple to make one with parso; check out my code at https://github.com/boxed/mutmut
Pythoscope does this to the test cases it automatically generates, as does the 2to3 tool for Python 2.6 (it converts Python 2.x source into Python 3.x source).
Both of these tools use the lib2to3 library, which is an implementation of the Python parser/compiler machinery that can preserve comments in source when it is round-tripped from source -> AST -> source.
The rope project may meet your needs if you want to do more refactoring-like transforms.
The ast module is your other option, and there's an older example of how to "unparse" syntax trees back into code (using the parser module). But the ast module is more useful when doing an AST transform on code that is then transformed into a code object.
The redbaron project may also be a good fit (h/t Xavier Combelle).