AST for any arbitrary programming language or IR
Is it possible to create an AST for an arbitrary programming language or IR using C or C++ alone (without the help of tools like YACC and LEX)?
If so, how should the lexical and syntactic analysis be implemented?
If not, which tools have to be added to C or C++ to successfully create an AST?
I hope my question is clear. If it looks vague or out of context, please tell me what additional information is required.
P.S.: I am actually trying to create an AST for LLVM's .ll IR format. I do know that .ll is derived from an AST, but I am trying out static analysis practices, so I am looking at creating the AST.
Comments (2)
The most straightforward way to write a parser without a parser generator is recursive descent. It is very well documented; the standard book in the field is The Dragon Book.
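To make the recursive-descent idea concrete, here is a minimal sketch in plain C++ that builds an AST for integer arithmetic. The grammar, the `Node` layout, and the names `Parser` and `eval` are assumptions chosen for illustration; they are not taken from the question or from LLVM. Each grammar rule becomes one member function, and each function returns the subtree it recognized.

```cpp
#include <cassert>
#include <cctype>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// AST node: a number literal (op == 'n') or a binary operation.
struct Node {
    char op = 'n';                 // '+', '-', '*', '/', or 'n' for a number
    long value = 0;                // meaningful only when op == 'n'
    std::unique_ptr<Node> lhs, rhs;
};

class Parser {
public:
    explicit Parser(std::string text) : src_(std::move(text)) {}

    // Parses the whole input and returns the root of the AST.
    std::unique_ptr<Node> parse() {
        auto tree = expr();
        if (peek() != '\0') throw std::runtime_error("trailing input");
        return tree;
    }

private:
    // expr := term (('+' | '-') term)*
    std::unique_ptr<Node> expr() {
        auto left = term();
        while (peek() == '+' || peek() == '-') {
            char op = src_[pos_++];
            auto right = term();
            left = binary(op, std::move(left), std::move(right));
        }
        return left;
    }

    // term := factor (('*' | '/') factor)*
    std::unique_ptr<Node> term() {
        auto left = factor();
        while (peek() == '*' || peek() == '/') {
            char op = src_[pos_++];
            auto right = factor();
            left = binary(op, std::move(left), std::move(right));
        }
        return left;
    }

    // factor := NUMBER | '(' expr ')'
    std::unique_ptr<Node> factor() {
        char c = peek();
        if (c == '(') {
            ++pos_;
            auto inner = expr();
            if (peek() != ')') throw std::runtime_error("expected ')'");
            ++pos_;
            return inner;
        }
        if (!std::isdigit(static_cast<unsigned char>(c)))
            throw std::runtime_error("expected number");
        auto n = std::make_unique<Node>();
        while (pos_ < src_.size() && std::isdigit(static_cast<unsigned char>(src_[pos_])))
            n->value = n->value * 10 + (src_[pos_++] - '0');
        return n;
    }

    static std::unique_ptr<Node> binary(char op, std::unique_ptr<Node> l, std::unique_ptr<Node> r) {
        auto n = std::make_unique<Node>();
        n->op = op;
        n->lhs = std::move(l);
        n->rhs = std::move(r);
        return n;
    }

    // Skips whitespace and returns the next character, or '\0' at end of input.
    char peek() {
        while (pos_ < src_.size() && std::isspace(static_cast<unsigned char>(src_[pos_]))) ++pos_;
        return pos_ < src_.size() ? src_[pos_] : '\0';
    }

    std::string src_;
    std::size_t pos_ = 0;
};

// Walks the AST to show that it captured precedence and grouping.
long eval(const Node& n) {
    if (n.op == 'n') return n.value;
    assert(n.lhs && n.rhs);        // binary nodes always have two children
    long a = eval(*n.lhs), b = eval(*n.rhs);
    switch (n.op) {
        case '+': return a + b;
        case '-': return a - b;
        case '*': return a * b;
        default:
            if (b == 0) throw std::runtime_error("division by zero");
            return a / b;
    }
}
```

With this sketch, `Parser("1 + 2 * (3 - 1)").parse()` yields a tree whose root is the `+` node, because `term` binds tighter than `expr`. For brevity the parser reads characters directly; a real front end would normally consume tokens produced by a separate scanner.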
A scanner, which takes text as input and produces a sequence of tokens as output, can be written using standard string manipulation techniques.
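As an illustration of such a hand-written scanner, the sketch below tokenizes a small, simplified subset of .ll-style text using only the standard library. The token categories and the names `TokKind`, `Token`, and `scan` are hypothetical, and the rules are deliberately incomplete: quoted names, string constants, negative numbers, floating-point literals, and metadata are not handled.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

enum class TokKind { Word, LocalId, GlobalId, Integer, Punct };

struct Token {
    TokKind kind;
    std::string text;
};

// Characters allowed inside identifiers in this simplified sketch.
static bool isIdentChar(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_' || c == '.';
}

std::vector<Token> scan(const std::string& src) {
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < src.size()) {
        char c = src[i];
        if (std::isspace(static_cast<unsigned char>(c))) { ++i; continue; }
        if (c == ';') {                       // a comment runs to the end of the line
            while (i < src.size() && src[i] != '\n') ++i;
            continue;
        }
        std::size_t start = i;
        TokKind kind;
        if (c == '%' || c == '@') {           // %local or @global identifier
            kind = (c == '%') ? TokKind::LocalId : TokKind::GlobalId;
            ++i;
            while (i < src.size() && isIdentChar(src[i])) ++i;
        } else if (std::isdigit(static_cast<unsigned char>(c))) {
            kind = TokKind::Integer;
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) ++i;
        } else if (std::isalpha(static_cast<unsigned char>(c)) || c == '_') {
            kind = TokKind::Word;             // keywords, opcodes, and types such as i32
            while (i < src.size() && isIdentChar(src[i])) ++i;
        } else {
            kind = TokKind::Punct;            // any other single character
            ++i;
        }
        assert(i > start);                    // every branch consumes at least one character
        out.push_back({kind, src.substr(start, i - start)});
    }
    return out;
}
```

For the input `%sum = add i32 %a, 42 ; comment`, `scan` returns seven tokens: `%sum`, `=`, `add`, `i32`, `%a`, `,`, and `42`. Telling keywords apart from opcodes and types is left to the parser here, which keeps the scanner to a single pass over the string.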
I doubt there's a one-to-one mapping between your arbitrary language and LLVM's ASTs.
That means you probably want to do this in two stages:
1. Parse your 'arbitrary language' using the best parsing tools you can get, to simplify the problem of parsing it. Use them to build an AST for your language, following the standard methods for producing ASTs with parser generators. LEX/YACC are OK, but there are plenty of good alternatives out there. It's pretty likely you'll need to build a symbol table.
2. Walk the AST of your parsed language to build your LLVM AST. This won't be one-to-one, but the ability to look around the tree near a given node in your AST to collect the information needed to generate the LLVM code will likely be extremely helpful.
This is a classic style for a simple compiler.
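The second stage can be sketched as a tree walk that lowers a source-language AST into a flat list of instructions. Everything below is an assumption made for illustration: the `Expr` node layout, the `Lowering` class, the fixed `i32` type, and the `%t0`, `%t1` temporary naming. The output is text that resembles LLVM's `add` and `mul` instructions; the sketch does not use the LLVM API. The symbol table maps source variable names to the IR values that hold them.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Source-language AST: numbers, variables, and two binary operators.
struct Expr {
    enum Kind { Num, Var, Add, Mul } kind = Num;
    long value = 0;                 // used by Num
    std::string name;               // used by Var
    std::unique_ptr<Expr> lhs, rhs; // used by Add and Mul
};

std::unique_ptr<Expr> num(long v) {
    auto e = std::make_unique<Expr>();
    e->kind = Expr::Num;
    e->value = v;
    return e;
}

std::unique_ptr<Expr> var(std::string n) {
    auto e = std::make_unique<Expr>();
    e->kind = Expr::Var;
    e->name = std::move(n);
    return e;
}

std::unique_ptr<Expr> bin(Expr::Kind k, std::unique_ptr<Expr> l, std::unique_ptr<Expr> r) {
    auto e = std::make_unique<Expr>();
    e->kind = k;
    e->lhs = std::move(l);
    e->rhs = std::move(r);
    return e;
}

class Lowering {
public:
    explicit Lowering(std::map<std::string, std::string> symbols)
        : symbols_(std::move(symbols)) {}

    // Post-order walk: lowers the children first, then emits one instruction
    // for the current node. Returns the operand that names the node's result.
    std::string lower(const Expr& e) {
        switch (e.kind) {
            case Expr::Num:
                return std::to_string(e.value);
            case Expr::Var: {
                auto it = symbols_.find(e.name);
                if (it == symbols_.end())
                    throw std::runtime_error("undefined variable: " + e.name);
                return it->second;
            }
            case Expr::Add:
            case Expr::Mul: {
                assert(e.lhs && e.rhs);
                std::string a = lower(*e.lhs);
                std::string b = lower(*e.rhs);
                std::string tmp = "%t" + std::to_string(next_++);
                code.push_back(tmp + " = " + (e.kind == Expr::Add ? "add" : "mul") +
                               " i32 " + a + ", " + b);
                return tmp;
            }
        }
        throw std::logic_error("unknown node kind");
    }

    std::vector<std::string> code;  // emitted instructions, in order

private:
    std::map<std::string, std::string> symbols_;  // source name -> IR value
    int next_ = 0;                                // counter for fresh temporaries
};
```

Lowering the tree for `x * (y + 2)` with the symbols `x -> %x` and `y -> %y` emits `%t0 = add i32 %y, 2` followed by `%t1 = mul i32 %x, %t0`. The nested source tree becomes a sequence of instructions linked by temporaries, which is one reason the mapping between the two representations is not one-to-one.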
I suggest you read the material on syntax-directed translation in the Aho/Ullman Dragon Book. A day's worth of education will save you months of wasted engineering time.