递归下降解析 - 从 LL(1) 开始

发布于 2024-07-05 21:03:30 字数 951 浏览 10 评论 0原文

下面的简单“计算器表达式”语法 (BNF) 可以使用简单的递归下降解析器轻松解析,该解析器具有预测性 LL(1):

<expr>      :=  <term> + <term>
            |   <term> - <term>
            |   <term>
<term>      :=  <factor> * <factor>
                <factor> / <factor>
                <factor>
<factor>    :=  <number>
            |   <id>
            |   ( <expr> )
<number>    :=  \d+
<id>        :=  [a-zA-Z_]\w+

因为看到下一个标记就足以了解要选择的规则。 但是,假设我添加以下规则:

<command>   :=  <expr>
            |   <id> = <expr>

为了在命令行上与计算器进行交互,使用变量,如下所示:

calc> 5+5
=> 10
calc> x = 8
calc> 6 * x + 1
=> 49

我是否真的不能使用简单的 LL(1) 预测解析器来解析 <命令> 规则? 我尝试为其编写解析器,但似乎我需要了解更多的标记。 解决方案是使用回溯,还是我可以只实现 LL(2) 并始终向前查看两个标记?

RD 解析器生成器如何处理这个问题(例如 ANTLR)?

The following simple "calculator expression" grammar (BNF) can be easily parsed with the a trivial recursive-descent parser, which is predictive LL(1):

<expr>      :=  <term> + <term>
            |   <term> - <term>
            |   <term>
<term>      :=  <factor> * <factor>
                <factor> / <factor>
                <factor>
<factor>    :=  <number>
            |   <id>
            |   ( <expr> )
<number>    :=  \d+
<id>        :=  [a-zA-Z_]\w+

Because it is always enough to see the next token in order to know the rule to pick. However, suppose that I add the following rule:

<command>   :=  <expr>
            |   <id> = <expr>

For the purpose of interacting with the calculator on the command line, with variables, like this:

calc> 5+5
=> 10
calc> x = 8
calc> 6 * x + 1
=> 49

Is it true that I can not use a simple LL(1) predictive parser to parse <command> rules ? I tried to write the parser for it, but it seems that I need to know more tokens forward. Is the solution to use backtracking, or can I just implement LL(2) and always look two tokens forward ?

How to RD parser generators handle this problem (ANTLR, for instance)?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

眉黛浅 2024-07-12 21:03:30

问题在于语法:


<command>   :=  <expr>
            |   <id> = <expr>

不是一个相互递归的过程。 对于递归体面的解析器,您将需要确定非递归等效项。

rdentato 帖子展示了如何解决这个问题,假设您可以使用语法。 此幻灯片更详细地说明了问题并展示了如何纠正它:
http://www.google.com/url?sa =t&source=web&ct=res&cd=7&url=http%3A%2F%2Fxml.cs.nccu.edu.tw%2Fcourses%2Fcompiler%2Fcp2006%2Fslides%2Flec3-Parsing%26TopDownParsing.ppt&ei =-YLaSPrWGaPwhAK5ydCqBQ&usg=AFQjCNGAFrODJxoxkgJEwDMQ8A8594vn0Q&sig2=nlYKQVfakmqy_57137XzrQ

The problem is that the grammar:


<command>   :=  <expr>
            |   <id> = <expr>

is not a mutually-recursive procedure. For a recursive decent parser you will need to determine a non-recursive equivalent.

rdentato post's shows how to fix this, assuming you can play with the grammar. This powerpoint spells out the problem in a bit more detail and shows how to correct it:
http://www.google.com/url?sa=t&source=web&ct=res&cd=7&url=http%3A%2F%2Fxml.cs.nccu.edu.tw%2Fcourses%2Fcompiler%2Fcp2006%2Fslides%2Flec3-Parsing%26TopDownParsing.ppt&ei=-YLaSPrWGaPwhAK5ydCqBQ&usg=AFQjCNGAFrODJxoxkgJEwDMQ8A8594vn0Q&sig2=nlYKQVfakmqy_57137XzrQ

岛徒 2024-07-12 21:03:30

问题在于

<command>   :=  <expr>
            |   <id> = <expr>

,当您“看到”时,您无法判断它是作业的开始(第二条规则)还是“”代码>”。 您只会知道何时读取下一个令牌。

AFAIK ANTLR 是 LL(*) (如果我没记错的话,也能够生成rat-pack解析器),所以它可能会同时考虑两个标记来处理这个语法。

如果您可以使用语法,我建议为分配添加关键字(例如 let x = 8):

<command>   :=  <expr>
            |   "let" <id> "=" <expr>

或使用 = 表示评估:

<command>   :=  "=" <expr>
            |   <id> "=" <expr>

THe problem with

<command>   :=  <expr>
            |   <id> = <expr>

is that when you "see" <id> you can't tell if it's the beginning of an assignement (second rule) or it's a "<factor>". You will only know when you'll read the next token.

AFAIK ANTLR is LL(*) (and is also able to generate rat-pack parsers if I'm not mistaken) so it will probably handle this grammare considering two tokens at once.

If you can play with the grammar I would suggest to either add a keyword for the assignment (e.g. let x = 8) :

<command>   :=  <expr>
            |   "let" <id> "=" <expr>

or use the = to signify evaluation:

<command>   :=  "=" <expr>
            |   <id> "=" <expr>
梦冥 2024-07-12 21:03:30

我认为有两种方法可以使用递归下降解析器来解决这个问题:要么使用(更多)前瞻,要么通过回溯。

前瞻

command() {
    if (currentToken() == id && lookaheadToken() == '=') {
        return assignment();
    } else {
        return expr();
    }
}

回溯

command() {
    savedLocation = scanLocation();
    if (accept( id )) {
         identifier = acceptedTokenValue();
         if (!accept( '=' )) {
             setScanLocation( savedLocation );
             return expr();
         }
         return new assignment( identifier, expr() );
    } else {
         return expr();
    }
}

I think there are two ways to solve this with a recursive descent parser: either by using (more) lookahead or by backtracking.

Lookahead

command() {
    if (currentToken() == id && lookaheadToken() == '=') {
        return assignment();
    } else {
        return expr();
    }
}

Backtracking

command() {
    savedLocation = scanLocation();
    if (accept( id )) {
         identifier = acceptedTokenValue();
         if (!accept( '=' )) {
             setScanLocation( savedLocation );
             return expr();
         }
         return new assignment( identifier, expr() );
    } else {
         return expr();
    }
}
玻璃人 2024-07-12 21:03:30

ANTLR 3 使用“LL(*)”解析器而不是 LL (k) 解析器,因此如果必须的话,它将使用专门优化的确定性有限自动机 (DFA) 向前查找,直到到达输入末尾,而不进行回溯。

ANTLR 3 uses a "LL(*)" parser as opposed to a LL(k) parser, so it will look ahead until it reaches the end of the input if it has to, without backtracking, using a specially optimized determinstic finite automata (DFA).

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文