移位和前瞻之间的区别

发布于 2024-12-20 19:11:45 字数 191 浏览 1 评论 0原文

给定一个简单的语法，例如

rule1
    := token1 token2 token3 token4
    || token1 token2 token3 token3;

移动前三个标记，然后查看第四个标记以查看要减少哪个规则，以及简单地执行三个标记的前瞻以查看要减少哪个规则之间有什么区别？

原文

Given a simple grammar, like

rule1
    := token1 token2 token3 token4
    || token1 token2 token3 token3;

What's the difference between shifting the first three tokens, then looking at the fourth to see which rule to reduce, and simply performing a lookahead of three tokens to see which rule to reduce?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

故事和酒 2024-12-27 19:11:45

在移位/归约解析器中，前瞻并不用于确定正在考虑哪个产生式，而是用于决定解析器是否应该移位下一个标记或采取某种归约操作。如果您有一个针对上述语法的移位/归约解析器，则解析器在决定是否归约之前总是会移位四个标记；请记住，在 LR 解析器中，仅当适当的符号系列位于解析堆栈顶部时才会执行归约。仅当解析器无法判断是否应该减少其拥有的四个标记，或者继续移动更多符号并稍后减少时，才需要先行查找。

具体来说，解析器可能会执行如下操作：

Stack                           Input                        Action
-------------------------------------------------------------------------------
                                token1 token2 token3 token4  Shift
token1                                 token2 token3 token4  Shift
token1 token2                                 token3 token4  Shift
token1 token2 token3                                 token4  Shift
token1 token2 token3 token4                                  Reduce, Option 1
rule1                                                        Accept

或者

Stack                           Input                        Action
-------------------------------------------------------------------------------
                                token1 token2 token3 token3  Shift
token1                                 token2 token3 token3  Shift
token1 token2                                 token3 token3  Shift
token1 token2 token3                                 token3  Shift
token1 token2 token3 token3                                  Reduce, Option 2
rule1                                                        Accept

请注意，这与自上而下的解析器（例如 LL(k) 解析器）形成对比，后者通过尝试预测要使用的产生式来工作。在这种情况下，将需要四个先行标记，因为解析器正在猜测产生式，然后检查其猜测（预测/匹配解析）。例如，在自上而下的解析器中（此处必须是 LL(4)），它将执行以下操作：

Stack                           Input                             Action
----------------------------------------------------------------------------------
rule1                           token1 token2 token3 token4 $$  Predict, Option 1
token1 token2 token3 token4     token1 token2 token3 token4 $$  Match
token2 token3 token4            token2 token3 token4 $$         Match
token3 token4                   token3 token4 $$                Match
token4                          token4 $$                       Match
                                $$                              Accept

或者

Stack                           Input                             Action
----------------------------------------------------------------------------------
rule1                           token1 token2 token3 token3 $$  Predict, Option 2
token1 token2 token3 token3     token1 token2 token3 token3 $$  Match
token2 token3 token3            token2 token3 token3 $$         Match
token3 token3                   token3 token3 $$                Match
token3                          token3 $$                       Match
                                $$                              Accept

注意如何需要先行来预测要使用哪个产生式，因此解析器必须具有四个标记向前看。在 LR 解析器中，解析器通过检查更多标记来工作，直到它满意地看到它正在寻找的内容，然后减少（移位/减少解析）。在这种情况下，根本不需要前瞻。在 LR 解析器中，仅需要先行判断解析器是否已看到句柄的末尾（要归约的字符串），或者它是否位于句柄的中间并且必须不断移位。这就是为什么，例如，一些有趣的语法可以显示为 LR(0)，但唯一的 LL(0) 语法是每个非终结符恰好有一个与之关联的产生式的语法；前瞻在自上而下和自下而上的解析中具有根本不同的用途。

一般来说，自上而下的解析器可以比自下而上的解析器处理更少的语法，并且事实上任何 LL(k) 语法都保证是 LR(k)，但反之则不然。

希望这有帮助！

In a shift/reduce parser, the lookahead is not used to determine which production is being considered, but rather to decide whether the parser should shift the next token or take some sort of reduce action. If you had a shift/reduce parser for the above grammar, the parser would always shift four tokens before deciding whether or not to reduce; remember that in LR parsers, reductions only are performed when the appropriate series of symbols is atop the parsing stack. Lookahead would only be necessary here if the parser couldn't tell whether it should reduce the four tokens it had or to keep shifting more symbols and reduce later on.

Specifically, the parser would probably do something like this:

Stack                           Input                        Action
-------------------------------------------------------------------------------
                                token1 token2 token3 token4  Shift
token1                                 token2 token3 token4  Shift
token1 token2                                 token3 token4  Shift
token1 token2 token3                                 token4  Shift
token1 token2 token3 token4                                  Reduce, Option 1
rule1                                                        Accept

Stack                           Input                        Action
-------------------------------------------------------------------------------
                                token1 token2 token3 token3  Shift
token1                                 token2 token3 token3  Shift
token1 token2                                 token3 token3  Shift
token1 token2 token3                                 token3  Shift
token1 token2 token3 token3                                  Reduce, Option 2
rule1                                                        Accept

Note that this contrasts with top-down parsers like LL(k) parsers, which work by trying to predict the production to use. In that case, four lookahead tokens would be required, because the parser is guessing the production and then checking its guess (predict/match parsing). For example, in a top-down parser (which would have to be LL(4) here), it would do the following:

Stack                           Input                             Action
----------------------------------------------------------------------------------
rule1                           token1 token2 token3 token4 $$  Predict, Option 1
token1 token2 token3 token4     token1 token2 token3 token4 $$  Match
token2 token3 token4            token2 token3 token4 $$         Match
token3 token4                   token3 token4 $$                Match
token4                          token4 $$                       Match
                                $$                              Accept

Stack                           Input                             Action
----------------------------------------------------------------------------------
rule1                           token1 token2 token3 token3 $$  Predict, Option 2
token1 token2 token3 token3     token1 token2 token3 token3 $$  Match
token2 token3 token3            token2 token3 token3 $$         Match
token3 token3                   token3 token3 $$                Match
token3                          token3 $$                       Match
                                $$                              Accept

Notice how the lookahead is needed to predict which production to use, so the parser must have four tokens of lookahead. In an LR parser, the parser works by inspecting more tokens until it's comfortable that it's seen what it's looking for, then reduces (shift/reduce parsing). In this case, no lookahead is required at all. Lookahead is only required in an LR parser to determine whether the parser has seen the end of the handle (the string to reduce), or whether it's in the middle of the handle and has to keep shifting. This is why, for example, some interesting grammars can be shown to be LR(0), but the only grammars that are LL(0) are grammars in which each nonterminal has exactly one production associated with it; the lookahead has fundamentally different uses in top-down versus bottom-up parsing.

Generally speaking, top-down parsers can handle fewer grammars than bottom-up parsers, and in fact any LL(k) grammar is guaranteed to be LR(k) but not the other way around.

Hope this helps!

回复收藏 0 原文

~没有更多了~