递归下降解析 - 从 LL(1) 开始

发布于 2024-07-05 21:03:30 字数 951 浏览 12 评论 0原文

下面的简单“计算器表达式”语法 (BNF) 可以使用简单的递归下降解析器轻松解析，该解析器具有预测性 LL(1)：

<expr>      :=  <term> + <term>
            |   <term> - <term>
            |   <term>
<term>      :=  <factor> * <factor>
                <factor> / <factor>
                <factor>
<factor>    :=  <number>
            |   <id>
            |   ( <expr> )
<number>    :=  \d+
<id>        :=  [a-zA-Z_]\w+

因为看到下一个标记就足以了解要选择的规则。但是，假设我添加以下规则：

<command>   :=  <expr>
            |   <id> = <expr>

为了在命令行上与计算器进行交互，使用变量，如下所示：

calc> 5+5
=> 10
calc> x = 8
calc> 6 * x + 1
=> 49

我是否真的不能使用简单的 LL(1) 预测解析器来解析 <命令> 规则？我尝试为其编写解析器，但似乎我需要了解更多的标记。解决方案是使用回溯，还是我可以只实现 LL(2) 并始终向前查看两个标记？

RD 解析器生成器如何处理这个问题（例如 ANTLR）？

原文

The following simple "calculator expression" grammar (BNF) can be easily parsed with the a trivial recursive-descent parser, which is predictive LL(1):

<expr>      :=  <term> + <term>
            |   <term> - <term>
            |   <term>
<term>      :=  <factor> * <factor>
                <factor> / <factor>
                <factor>
<factor>    :=  <number>
            |   <id>
            |   ( <expr> )
<number>    :=  \d+
<id>        :=  [a-zA-Z_]\w+

Because it is always enough to see the next token in order to know the rule to pick. However, suppose that I add the following rule:

<command>   :=  <expr>
            |   <id> = <expr>

For the purpose of interacting with the calculator on the command line, with variables, like this:

calc> 5+5
=> 10
calc> x = 8
calc> 6 * x + 1
=> 49

Is it true that I can not use a simple LL(1) predictive parser to parse <command> rules ? I tried to write the parser for it, but it seems that I need to know more tokens forward. Is the solution to use backtracking, or can I just implement LL(2) and always look two tokens forward ?

How to RD parser generators handle this problem (ANTLR, for instance)?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

眉黛浅 2024-07-12 21:03:30

问题在于语法：


<command>   :=  <expr>
            |   <id> = <expr>

不是一个相互递归的过程。对于递归体面的解析器，您将需要确定非递归等效项。

rdentato 帖子展示了如何解决这个问题，假设您可以使用语法。此幻灯片更详细地说明了问题并展示了如何纠正它：
http://www.google.com/url?sa =t&source=web&ct=res&cd=7&url=http%3A%2F%2Fxml.cs.nccu.edu.tw%2Fcourses%2Fcompiler%2Fcp2006%2Fslides%2Flec3-Parsing%26TopDownParsing.ppt&ei =-YLaSPrWGaPwhAK5ydCqBQ&usg=AFQjCNGAFrODJxoxkgJEwDMQ8A8594vn0Q&sig2=nlYKQVfakmqy_57137XzrQ

The problem is that the grammar:


<command>   :=  <expr>
            |   <id> = <expr>

is not a mutually-recursive procedure. For a recursive decent parser you will need to determine a non-recursive equivalent.

rdentato post's shows how to fix this, assuming you can play with the grammar. This powerpoint spells out the problem in a bit more detail and shows how to correct it:
http://www.google.com/url?sa=t&source=web&ct=res&cd=7&url=http%3A%2F%2Fxml.cs.nccu.edu.tw%2Fcourses%2Fcompiler%2Fcp2006%2Fslides%2Flec3-Parsing%26TopDownParsing.ppt&ei=-YLaSPrWGaPwhAK5ydCqBQ&usg=AFQjCNGAFrODJxoxkgJEwDMQ8A8594vn0Q&sig2=nlYKQVfakmqy_57137XzrQ

回复收藏 0 原文

岛徒 2024-07-12 21:03:30

问题在于

<command>   :=  <expr>
            |   <id> = <expr>

，当您“看到”时，您无法判断它是作业的开始（第二条规则）还是“”代码>”。您只会知道何时读取下一个令牌。

AFAIK ANTLR 是 LL(*) （如果我没记错的话，也能够生成rat-pack解析器），所以它可能会同时考虑两个标记来处理这个语法。

如果您可以使用语法，我建议为分配添加关键字（例如 let x = 8）：

<command>   :=  <expr>
            |   "let" <id> "=" <expr>

或使用 = 表示评估：

<command>   :=  "=" <expr>
            |   <id> "=" <expr>

THe problem with

<command>   :=  <expr>
            |   <id> = <expr>

is that when you "see" <id> you can't tell if it's the beginning of an assignement (second rule) or it's a "<factor>". You will only know when you'll read the next token.

AFAIK ANTLR is LL(*) (and is also able to generate rat-pack parsers if I'm not mistaken) so it will probably handle this grammare considering two tokens at once.

If you can play with the grammar I would suggest to either add a keyword for the assignment (e.g. let x = 8) :

<command>   :=  <expr>
            |   "let" <id> "=" <expr>

or use the = to signify evaluation:

<command>   :=  "=" <expr>
            |   <id> "=" <expr>

回复收藏 0 原文

梦冥 2024-07-12 21:03:30

我认为有两种方法可以使用递归下降解析器来解决这个问题：要么使用（更多）前瞻，要么通过回溯。

前瞻

command() {
    if (currentToken() == id && lookaheadToken() == '=') {
        return assignment();
    } else {
        return expr();
    }
}

回溯

command() {
    savedLocation = scanLocation();
    if (accept( id )) {
         identifier = acceptedTokenValue();
         if (!accept( '=' )) {
             setScanLocation( savedLocation );
             return expr();
         }
         return new assignment( identifier, expr() );
    } else {
         return expr();
    }
}

I think there are two ways to solve this with a recursive descent parser: either by using (more) lookahead or by backtracking.

Lookahead

command() {
    if (currentToken() == id && lookaheadToken() == '=') {
        return assignment();
    } else {
        return expr();
    }
}

Backtracking

command() {
    savedLocation = scanLocation();
    if (accept( id )) {
         identifier = acceptedTokenValue();
         if (!accept( '=' )) {
             setScanLocation( savedLocation );
             return expr();
         }
         return new assignment( identifier, expr() );
    } else {
         return expr();
    }
}

回复收藏 0 原文