Translating Python to CIL (C Intermediate Language)
I have recently been working on static analysis of Python source code.
Our group already has a static analyzer for CIL (C Intermediate Language), written in OCaml. We want to reuse this analyzer, so our ideal approach is to translate Python to CIL.
Currently, I use Python's built-in ast module to parse Python source into a Python AST, and then translate the AST text printed by ast.dump into a C AST. Because the C-AST-to-CIL API and the static analyzer are both written in OCaml, I chose ocamllex and ocamlyacc to parse the Python AST into a C AST. However, there are some big problems.
The AST representation printed by ast.dump is hard to parse, which makes my parser difficult to implement. On the other hand, I can't access the internal structure of the Python ast from OCaml, and even if I could, its data structures are different from OCaml's.
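One way to sidestep parsing the ast.dump text might be to serialize the tree to JSON on the Python side and read it with an existing JSON library on the OCaml side. The following is only a minimal sketch under that assumption: the helper names `node_to_json` and `dump_json` and the `_type` key are made up for illustration, and only ast.parse, ast.iter_fields, and json.dumps come from the standard library.

```python
import ast
import json


def node_to_json(node):
    """Recursively convert an ast node into JSON-serializable data."""
    if isinstance(node, ast.AST):
        # "_type" is a hypothetical key chosen here to carry the node class name.
        result = {"_type": type(node).__name__}
        for field, value in ast.iter_fields(node):
            result[field] = node_to_json(value)
        # Keep the source line when the node has one (statements, expressions).
        if hasattr(node, "lineno"):
            result["lineno"] = node.lineno
        return result
    if isinstance(node, list):
        return [node_to_json(item) for item in node]
    # Plain constants that JSON can represent directly.
    if node is None or isinstance(node, (str, int, float, bool)):
        return node
    # Anything else (bytes, complex, Ellipsis) falls back to its repr.
    return repr(node)


def dump_json(source):
    """Parse Python source and return its AST as a JSON string."""
    return json.dumps(node_to_json(ast.parse(source)), indent=2)


if __name__ == "__main__":
    print(dump_json("x = 1 + 2"))
```

Every node becomes an object with its class name and fields, so the consumer only needs a generic JSON reader rather than a hand-written grammar for the dump format. This only removes the parsing obstacle; it does not address how Python semantics map onto C.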
I wonder whether I chose the wrong approach for translating Python code to a C AST in the first place.
Are there any existing tools or approaches that might meet my requirements?
If there is anything I have missed, please point it out; that would be a great help.
Thanks.
Comments (1)
I don't think this is going to work very well. CIL is essentially just the C language. For your trick to work, you have to translate Python completely to C... but the languages have very dissimilar concepts. How will you model Python objects? Continuations? Dynamic loading? Runtime typing? Infinite-precision arithmetic? I think your problems are not with the AST; rather, they are conceptual.
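To make a few of those mismatches concrete, here is a small illustrative sketch (the names `add`, `Point`, and `load` are invented for the example). Each line is ordinary Python, yet none of them has a direct counterpart in C without also modeling part of the Python runtime.

```python
import importlib


def add(a, b):
    # One function body, but what "+" means is decided at run time
    # from the types of the arguments.
    return a + b


class Point:
    pass


def load(name):
    # The module to load is only known at run time.
    return importlib.import_module(name)


big = 2 ** 64 + 1   # integers grow past any fixed machine width
p = Point()
p.x = 1             # attributes can be attached after construction
setattr(p, "y", 2)  # ...even under a name computed at run time

if __name__ == "__main__":
    print(add(1, 2), add("a", "b"), add([1], [2]))
    print(big)
    print(p.x + p.y)
    print(load("math").sqrt(16))
```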
If you could translate to CIL, you'd now have a new problem. Analyzers are easier to build when the constructs they need to find are easily detected. Once you translate your continuations to C, reasoning about interactions with continuations will be hard, because they won't be easy to recognize.
I think I'd spend my energy trying to build a Python static analyzer in which the Python concepts are easy to detect.
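As a rough sketch of what "easy to detect" can look like, the standard ast.NodeVisitor class lets an analyzer match Python constructs directly on the Python AST. The example below is an assumption-laden illustration, not a real analyzer: the class `DynamicFeatureFinder`, the function `find_dynamic_features`, and the choice of flagged names are all hypothetical, and generators (`yield`) stand in here as the closest everyday Python construct to continuations.

```python
import ast

# Hypothetical list of built-ins whose behavior depends on run-time values.
DYNAMIC_CALLS = {"eval", "exec", "__import__", "getattr", "setattr"}


class DynamicFeatureFinder(ast.NodeVisitor):
    """Collect (line, construct) pairs for dynamic Python features."""

    def __init__(self):
        self.findings = []

    def visit_Call(self, node):
        # Only direct calls by bare name are recognized, e.g. eval(...).
        if isinstance(node.func, ast.Name) and node.func.id in DYNAMIC_CALLS:
            self.findings.append((node.lineno, node.func.id))
        self.generic_visit(node)

    def visit_Yield(self, node):
        # A yield marks a generator, which suspends and resumes execution.
        self.findings.append((node.lineno, "yield"))
        self.generic_visit(node)


def find_dynamic_features(source):
    """Return the sorted findings for a piece of Python source code."""
    finder = DynamicFeatureFinder()
    finder.visit(ast.parse(source))
    return sorted(finder.findings)


if __name__ == "__main__":
    sample = (
        "def gen():\n"
        "    yield 1\n"
        "x = eval('1 + 1')\n"
        "m = __import__('math')\n"
    )
    for line, kind in find_dynamic_features(sample):
        print(line, kind)
```

At the Python level, each construct is a single node type, so the detection logic stays a few lines long. After a translation to C, the same constructs would be spread across generated runtime calls and would be much harder to recognize.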