递归下降解析 - 从 LL(1) 开始
下面的简单“计算器表达式”语法 (BNF) 可以使用简单的递归下降解析器轻松解析,该解析器具有预测性 LL(1):
<expr> := <term> + <term>
| <term> - <term>
| <term>
<term> := <factor> * <factor>
<factor> / <factor>
<factor>
<factor> := <number>
| <id>
| ( <expr> )
<number> := \d+
<id> := [a-zA-Z_]\w+
因为看到下一个标记就足以了解要选择的规则。 但是,假设我添加以下规则:
<command> := <expr>
| <id> = <expr>
为了在命令行上与计算器进行交互,使用变量,如下所示:
calc> 5+5
=> 10
calc> x = 8
calc> 6 * x + 1
=> 49
我是否真的不能使用简单的 LL(1) 预测解析器来解析 <命令>
规则? 我尝试为其编写解析器,但似乎我需要了解更多的标记。 解决方案是使用回溯,还是我可以只实现 LL(2) 并始终向前查看两个标记?
RD 解析器生成器如何处理这个问题(例如 ANTLR)?
The following simple "calculator expression" grammar (BNF) can be easily parsed with the a trivial recursive-descent parser, which is predictive LL(1):
<expr> := <term> + <term>
| <term> - <term>
| <term>
<term> := <factor> * <factor>
<factor> / <factor>
<factor>
<factor> := <number>
| <id>
| ( <expr> )
<number> := \d+
<id> := [a-zA-Z_]\w+
Because it is always enough to see the next token in order to know the rule to pick. However, suppose that I add the following rule:
<command> := <expr>
| <id> = <expr>
For the purpose of interacting with the calculator on the command line, with variables, like this:
calc> 5+5
=> 10
calc> x = 8
calc> 6 * x + 1
=> 49
Is it true that I can not use a simple LL(1) predictive parser to parse <command>
rules ? I tried to write the parser for it, but it seems that I need to know more tokens forward. Is the solution to use backtracking, or can I just implement LL(2) and always look two tokens forward ?
How to RD parser generators handle this problem (ANTLR, for instance)?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
问题在于语法:
不是一个相互递归的过程。 对于递归体面的解析器,您将需要确定非递归等效项。
rdentato 帖子展示了如何解决这个问题,假设您可以使用语法。 此幻灯片更详细地说明了问题并展示了如何纠正它:
http://www.google.com/url?sa =t&source=web&ct=res&cd=7&url=http%3A%2F%2Fxml.cs.nccu.edu.tw%2Fcourses%2Fcompiler%2Fcp2006%2Fslides%2Flec3-Parsing%26TopDownParsing.ppt&ei =-YLaSPrWGaPwhAK5ydCqBQ&usg=AFQjCNGAFrODJxoxkgJEwDMQ8A8594vn0Q&sig2=nlYKQVfakmqy_57137XzrQ
The problem is that the grammar:
is not a mutually-recursive procedure. For a recursive decent parser you will need to determine a non-recursive equivalent.
rdentato post's shows how to fix this, assuming you can play with the grammar. This powerpoint spells out the problem in a bit more detail and shows how to correct it:
http://www.google.com/url?sa=t&source=web&ct=res&cd=7&url=http%3A%2F%2Fxml.cs.nccu.edu.tw%2Fcourses%2Fcompiler%2Fcp2006%2Fslides%2Flec3-Parsing%26TopDownParsing.ppt&ei=-YLaSPrWGaPwhAK5ydCqBQ&usg=AFQjCNGAFrODJxoxkgJEwDMQ8A8594vn0Q&sig2=nlYKQVfakmqy_57137XzrQ
问题在于
,当您“看到”
时,您无法判断它是作业的开始(第二条规则)还是“
”代码>”。 您只会知道何时读取下一个令牌。AFAIK ANTLR 是 LL(*) (如果我没记错的话,也能够生成rat-pack解析器),所以它可能会同时考虑两个标记来处理这个语法。
如果您可以使用语法,我建议为分配添加关键字(例如
let x = 8
):或使用
=
表示评估:THe problem with
is that when you "see"
<id>
you can't tell if it's the beginning of an assignement (second rule) or it's a "<factor>
". You will only know when you'll read the next token.AFAIK ANTLR is LL(*) (and is also able to generate rat-pack parsers if I'm not mistaken) so it will probably handle this grammare considering two tokens at once.
If you can play with the grammar I would suggest to either add a keyword for the assignment (e.g.
let x = 8
) :or use the
=
to signify evaluation:我认为有两种方法可以使用递归下降解析器来解决这个问题:要么使用(更多)前瞻,要么通过回溯。
前瞻
回溯
I think there are two ways to solve this with a recursive descent parser: either by using (more) lookahead or by backtracking.
Lookahead
Backtracking
ANTLR 3 使用“LL(*)”解析器而不是 LL (k) 解析器,因此如果必须的话,它将使用专门优化的确定性有限自动机 (DFA) 向前查找,直到到达输入末尾,而不进行回溯。
ANTLR 3 uses a "LL(*)" parser as opposed to a LL(k) parser, so it will look ahead until it reaches the end of the input if it has to, without backtracking, using a specially optimized determinstic finite automata (DFA).