如何将令牌流转换为解析树

发布于 2024-07-12 05:53:38 字数 1542 浏览 14 评论 0原文

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

奈何桥上唱咆哮 2024-07-19 05:53:38

我真的会推荐 http://www.antlr.org/,当然还有经典的 Dragon Compilers 书。

对于像 JavaScript 这样的简单语言,手动运行递归下降解析器并不难,但使用 yacc 或 antlr 这样的工具几乎总是更容易。

我认为回到你的问题的基础知识,你真的想研究 BNF 式语法并为你的目标选择一种语法。 如果你有这个,解析树应该会脱落,成为该语法的“实例”表现。

另外,不要尝试将解析树的创建转变为最终解决方案(例如生成代码或其他什么)。 这似乎是可行的,而且更有效; 但总有一天,您会真正希望自己拥有“原样”的解析树。

I would really recommend http://www.antlr.org/ and of course the classic Dragon Compilers book.

For an easy language like JavaScript it's not hard to hand roll a recursive descent parser, but it's almost always easier to use a tool like yacc or antlr.

I think to step back to the basics of your question, you really want to study up on BNF-esque grammar syntax and pick a syntax for your target. If you have that, the parse tree should sort of fall out, being the 'instance' manifestation of that grammar.

Also, don't try to turn the creation of your parse tree into your final solution (like generating code, or what-not). It might seem do-able and more effecient; but invariably there will come a time when you'll really wish you had that parse tree 'as is' laying around.

挽心 2024-07-19 05:53:38

您应该研究适合您平台的解析器生成器工具。 解析器生成器允许您为您的语言指定上下文无关语法。 该语言由许多规则组成,这些规则将一系列符号“简化”为新符号。 您通常还可以指定不同规则的优先级和关联性,以消除语言中的歧义。 例如,一个非常简单的计算器语言可能看起来像这样:

%left PLUS, MINUS           # low precedence, evaluated left-to-right
%left TIMES, DIV            # high precedence, left-to-right

expr ::= INT
| expr PLUS expr
| expr MINUS expr
| expr TIMES expr
| expr DIV expr
| LEFT_PAREN expr RIGHT_PAREN

通常,您可以将一些代码与每个规则相关联,以根据该规则中的其他符号构造一个新值(在本例中是一个表达式)。 解析器生成器将接收语法并用您的语言生成代码,将令牌流转换为解析树。

大多数解析器生成器都是特定于语言的。 ANTLR 众所周知,支持 C、C++、Objective C、Java 和 Python。 我听说它很难使用。 我使用过用于 C/C++ 的 bison、用于 Java 的 CUP 以及用于 OCaml 的 ocamlyacc,它们都非常好。 如果您已经在使用词法分析器生成器,则应该寻找与其专门兼容的解析器生成器。

You should investigate parser generator tools for your platform. A parser generator allows you to specify a context-free grammar for your language. The language consists of a number of rules which "reduce" a series of symbols into a new symbol. You can usually also specify precedence and associativity for different rules to eliminate ambiguity in the language. For instance, a very simple calculator language might look something like this:

%left PLUS, MINUS           # low precedence, evaluated left-to-right
%left TIMES, DIV            # high precedence, left-to-right

expr ::= INT
| expr PLUS expr
| expr MINUS expr
| expr TIMES expr
| expr DIV expr
| LEFT_PAREN expr RIGHT_PAREN

Usually, you can associate a bit of code with each rule to construct a new value (in this case an expression) from the other symbols in that rule. The parser generator will take in the grammar and produce code in your language that translates a token stream to a parse tree.

Most parser generators are language specific. ANTLR is well-known and supports C, C++, Objective C, Java, and Python. I've heard it's hard to use though. I've used bison for C/C++, CUP for Java, and ocamlyacc for OCaml, and they're all pretty good. If you are already using a lexer generator, you should look for a parser generator that is specifically compatible with it.

红衣飘飘貌似仙 2024-07-19 05:53:38

我相信一种常见的方法是使用有限状态机。 例如,如果您读取一个操作数,您将进入接下来需要运算符的状态,并且通常使用该运算符作为操作数的根节点,依此类推。

I believe a common a approach is to use a Finite State Machine. For example if you read an operand you go into a state where you next expect an operator, and you usually use the operator as the root node for the operands and so on.

蒲公英的约定 2024-07-19 05:53:38

正如 Marcos Marin 上面所描述的,如果您想自己做的话,使用 BNF 中的语言规则来解析标记列表的状态机就可以解决问题。 只是,正如 Paul Hollingsworth 在上面的评论中所说,更简单的方法是使用 Pushdown-Automaton它有一个简单的 FiFo 内存堆栈。
每个类别的标记在语法中都有下一个预期标记,该标记也在状态机中表示。 堆栈用于“记住”先前的标记类是什么,以减少所需的状态(可以在没有堆栈的情况下完成,但您需要为语法树中的每个类和子类拆分一个新的状态)。
接受状态将是(在自然语言和大多数编程语言中也是如此)起始状态,并且在特定情况下可能是其他一些状态。

如果你想使用一个工具(速度更快,范围更小),我的建议是 Antlr。 祝你好运!

As described above by Marcos Marin, a state machine that uses your language rules in BNF to parse your token list will do the trick if you want to do it yourself. Only, as said in above comment by Paul Hollingsworth, the easier way is to use a Pushdown-Automaton that has a simple FiFo memory stack.
Every class of token has a next expected token in your grammar, which also is represented in your state-machine. The stack is used to "remember" what the previous token class was, to reduce the required states (could be done without stack, but you would need a new state for every class and subclass split in the grammar tree).
The accepting state(s) would be (in natural languages and most programming languages too) the starting state, and maybe some other state in particular cases.

Antlr would be my suggestion if you want to use a tool (waaay faster and less extensive). Good luck!

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文