Shift-reduce: when to stop reducing?

Posted 2024-08-28 06:24:59

I'm trying to learn about shift-reduce parsing. Suppose we have the following grammar, using recursive rules that enforce order of operations, inspired by the ANSI C Yacc grammar:

S: A;

P
    : NUMBER
    | '(' S ')'
    ;

M
    : P
    | M '*' P
    | M '/' P
    ;

A
    : M
    | A '+' M
    | A '-' M
    ;

And we want to parse 1+2 using shift-reduce parsing. First, the 1 is shifted as a NUMBER. My question is, is it then reduced to P, then M, then A, then finally S? How does it know where to stop?

Suppose it does reduce all the way to S, then shifts '+'. We'd now have a stack containing:

S '+'

If we shift '2', the reductions might be:

S '+' NUMBER
S '+' P
S '+' M
S '+' A
S '+' S

Now, on either side of the last line, S could be P, M, A, or NUMBER, and it would still be valid in the sense that any combination would be a correct representation of the text. How does the parser "know" to make it

A '+' M

So that it can reduce the whole expression to A, then S? In other words, how does it know to stop reducing before shifting the next token? Is this a key difficulty in LR parser generation?

Edit: An addition to the question follows.

Now suppose we parse 1+2*3. Some shift/reduce operations are as follows:

Stack    | Input | Operation
---------+-------+----------------------------------------------
         | 1+2*3 | 
NUMBER   | +2*3  | Shift
A        | +2*3  | Reduce (looking ahead, we know to stop at A)
A+       | 2*3   | Shift
A+NUMBER | *3    | Shift (looking ahead, we know to stop at M)
A+M      | *3    | Reduce (looking ahead, we know to stop at M)

Is this correct (granted, it's not fully parsed yet)? Moreover, does lookahead by 1 symbol also tell us not to reduce A+M to A, as doing so would result in an inevitable syntax error after reading *3?

九厘米的零° 2024-09-04 06:24:59

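The problem you're describing is what you run into when building an LR(0) parser, that is, a bottom-up parser that does no lookahead beyond the symbol it is currently processing. The grammar you've given does not appear to be LR(0), which is why you get stuck when trying to parse it without lookahead. It does appear to be LR(1), though, so by looking 1 symbol ahead in the input you can easily decide whether to shift or reduce. Here, an LR(1) parser with the 1 on the stack would look ahead, see that the next symbol is +, and realize that it should not reduce past A (since A is the only thing it can reduce to that still matches a rule with + in the second position).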
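To make the lookahead decision concrete, here is a minimal Python sketch of a hand-written shift-reduce loop for the grammar in the question. It is not a generated LR(1) table: the names `tokenize` and `parse`, the `$` end marker, and the stack-plus-lookahead rules are illustrative assumptions, and the sketch assumes unsigned integer literals and single-character operators. The lines to watch are the ones that shift when M is followed by * or /, and the one that reduces A to S only when the lookahead is ) or end of input.

```python
import re

def tokenize(text):
    """Turn '1+2*3' into ['NUMBER', '+', 'NUMBER', '*', 'NUMBER', '$']."""
    raw = re.findall(r"\d+|\S", text)
    # '$' is an assumed end-of-input marker, not part of the grammar.
    return ["NUMBER" if t.isdigit() else t for t in raw] + ["$"]

def parse(text):
    """Return the list of shift/reduce actions, or raise ValueError."""
    tokens, pos = tokenize(text), 0
    stack, trace = [], []

    def shift():
        nonlocal pos
        stack.append(tokens[pos])
        trace.append(f"shift {tokens[pos]}")
        pos += 1

    def reduce(count, lhs):
        rhs = " ".join(stack[-count:])
        del stack[-count:]
        stack.append(lhs)
        trace.append(f"reduce {lhs} <- {rhs}")

    while True:
        look = tokens[pos]                  # one token of lookahead
        top = stack[-1] if stack else None
        under = stack[-3:-1]                # the two symbols below the top
        if top == "NUMBER":
            reduce(1, "P")
        elif top == ")" and under == ["(", "S"]:
            reduce(3, "P")
        elif top == "P":
            # After "M *" or "M /" this P completes a product.
            if under in (["M", "*"], ["M", "/"]):
                reduce(3, "M")
            else:
                reduce(1, "M")
        elif top == "M" and look in ("*", "/"):
            shift()                         # product continues: do not reduce yet
        elif top == "M":
            if under in (["A", "+"], ["A", "-"]):
                reduce(3, "A")
            else:
                reduce(1, "A")
        elif top == "A" and look in ("+", "-"):
            shift()                         # sum continues: stop at A
        elif top == "A" and look in (")", "$"):
            reduce(1, "S")                  # only now is the sum finished
        elif top == "S" and look == ")":
            shift()
        elif stack == ["S"] and look == "$":
            return trace                    # accept
        elif top in (None, "(", "+", "-", "*", "/") and look in ("NUMBER", "("):
            shift()
        else:
            raise ValueError(f"syntax error at token {pos}: {look!r}")

if __name__ == "__main__":
    for action in parse("1+2*3"):
        print(action)
```

For 1+2*3, the trace reduces the 1 up to A and then shifts +, and after the 2 has become M it shifts * instead of reducing A + M, which is the behaviour the question's table is asking about.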

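An interesting property of LR grammars is that for any grammar that is LR(k) for some k>1, it is possible to construct an equivalent LR(1) grammar. The same does not extend all the way down to LR(0), however: there are many grammars that cannot be converted to LR(0).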
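The claim that the question's grammar itself is not LR(0) can be checked mechanically. The sketch below builds the standard LR(0) item sets for it and reports every state that holds a completed item next to another completed item or an item that wants to shift a terminal. The augmented start symbol `S'`, the `(production index, dot position)` item encoding, and all function names are assumptions made for this illustration.

```python
GRAMMAR = [
    ("S'", ("S",)),            # assumed augmented start production
    ("S", ("A",)),
    ("P", ("NUMBER",)),
    ("P", ("(", "S", ")")),
    ("M", ("P",)),
    ("M", ("M", "*", "P")),
    ("M", ("M", "/", "P")),
    ("A", ("M",)),
    ("A", ("A", "+", "M")),
    ("A", ("A", "-", "M")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}

def closure(items):
    """Add X -> . rhs for every nonterminal X that appears right after a dot."""
    items = set(items)
    work = list(items)
    while work:
        prod, dot = work.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for i, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (i, 0) not in items:
                    items.add((i, 0))
                    work.append((i, 0))
    return frozenset(items)

def goto(state, symbol):
    """Move the dot over `symbol` in every item that allows it."""
    moved = {(p, d + 1) for p, d in state
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return closure(moved)

def lr0_states():
    start = closure({(0, 0)})
    states, work = {start}, [start]
    while work:
        state = work.pop()
        symbols = {GRAMMAR[p][1][d] for p, d in state if d < len(GRAMMAR[p][1])}
        for symbol in symbols:
            target = goto(state, symbol)
            if target not in states:
                states.add(target)
                work.append(target)
    return states

def show(item):
    lhs, rhs = GRAMMAR[item[0]]
    rhs = list(rhs)
    rhs.insert(item[1], ".")
    return f"{lhs} -> {' '.join(rhs)}"

def lr0_conflicts():
    """Return the states where LR(0) cannot choose an action without lookahead."""
    conflicts = []
    for state in lr0_states():
        complete = [(p, d) for p, d in state if d == len(GRAMMAR[p][1])]
        shiftable = [(p, d) for p, d in state
                     if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] not in NONTERMINALS]
        if len(complete) > 1 or (complete and shiftable):
            conflicts.append(sorted(show(item) for item in state))
    return sorted(conflicts)

if __name__ == "__main__":
    for state in lr0_conflicts():
        print(state)
```

One of the reported states contains `S -> A .` together with `A -> A . + M`: without lookahead the parser cannot tell whether to reduce A to S or to shift the +, which is exactly the "where do I stop reducing?" situation from the question.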

For more details on LR(k)-ness, see:

http://en.wikipedia.org/wiki/LR_parser