Shift-reduce:什么时候停止减少?
我正在尝试学习移位归约解析。假设我们有以下语法,使用强制执行操作顺序的递归规则,灵感来自 ANSI C Yacc 语法:
S: A;
P
: NUMBER
| '(' S ')'
;
M
: P
| M '*' P
| M '/' P
;
A
: M
| A '+' M
| A '-' M
;
我们想要使用shift-reduce 解析来解析1+2。首先,1 被移位为数字。我的问题是,然后它会减少到P,然后是M,然后是A,最后是S吗?它怎么知道在哪里停下来?
假设它确实一直减少到 S,然后移动“+”。我们现在有一个堆栈,其中包含:
S '+'
如果我们移动 '2',则减少可能是:
S '+' NUMBER
S '+' P
S '+' M
S '+' A
S '+' S
现在,在最后一行的任一侧,S 可以是 P、M、A 或 NUMBER,并且它在任何组合都是文本的正确表示的感觉。解析器如何“知道”使其
A '+' M
能够将整个表达式简化为 A,然后是 S?换句话说,它如何知道在移动下一个令牌之前停止减少?这是 LR 解析器生成的一个关键困难吗?
编辑:对问题进行补充。
现在假设我们解析1+2*3
。一些移位/归约操作如下:
Stack | Input | Operation
---------+-------+----------------------------------------------
| 1+2*3 |
NUMBER | +2*3 | Shift
A | +2*3 | Reduce (looking ahead, we know to stop at A)
A+ | 2*3 | Shift
A+NUMBER | *3 | Shift (looking ahead, we know to stop at M)
A+M | *3 | Reduce (looking ahead, we know to stop at M)
这是正确的(当然,它还没有完全解析)?此外,向前看 1 个符号是否也告诉我们不要将 A+M
简化为 A
,因为这样做会导致读取 *3 后不可避免的语法错误?
I'm trying to learn about shift-reduce parsing. Suppose we have the following grammar, using recursive rules that enforce order of operations, inspired by the ANSI C Yacc grammar:
S: A;
P
: NUMBER
| '(' S ')'
;
M
: P
| M '*' P
| M '/' P
;
A
: M
| A '+' M
| A '-' M
;
And we want to parse 1+2 using shift-reduce parsing. First, the 1 is shifted as a NUMBER. My question is, is it then reduced to P, then M, then A, then finally S? How does it know where to stop?
Suppose it does reduce all the way to S, then shifts '+'. We'd now have a stack containing:
S '+'
If we shift '2', the reductions might be:
S '+' NUMBER
S '+' P
S '+' M
S '+' A
S '+' S
Now, on either side of the last line, S could be P, M, A, or NUMBER, and it would still be valid in the sense that any combination would be a correct representation of the text. How does the parser "know" to make it
A '+' M
So that it can reduce the whole expression to A, then S? In other words, how does it know to stop reducing before shifting the next token? Is this a key difficulty in LR parser generation?
Edit: An addition to the question follows.
Now suppose we parse 1+2*3
. Some shift/reduce operations are as follows:
Stack | Input | Operation
---------+-------+----------------------------------------------
| 1+2*3 |
NUMBER | +2*3 | Shift
A | +2*3 | Reduce (looking ahead, we know to stop at A)
A+ | 2*3 | Shift
A+NUMBER | *3 | Shift (looking ahead, we know to stop at M)
A+M | *3 | Reduce (looking ahead, we know to stop at M)
Is this correct (granted, it's not fully parsed yet)? Moreover, does lookahead by 1 symbol also tell us not to reduce A+M
to A
, as doing so would result in an inevitable syntax error after reading *3
?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
您描述的问题是创建
LR(0)
解析器的问题 - 也就是说,自下而上的解析器不会对它们正在解析的当前符号之外的符号进行任何前瞻。您描述的语法似乎不是LR(0)
语法,这就是为什么您在尝试不先行解析它时遇到麻烦的原因。然而,它确实看起来是LR(1)
,因此通过在输入中向前查看 1 个符号,您可以轻松确定是移位还是归约。在这种情况下,LR(1)
解析器会在堆栈上有1
时向前查看,查看下一个符号是+
>,并意识到它不应该减少过去的A
(因为这是它唯一可以减少到仍然与第二个位置具有+
的规则匹配的东西) 。LR
语法的一个有趣特性是,对于任何k>1
为LR(k)
的语法,可以构造一个 < code>LR(1) 语法是等价的。然而,同样的情况并没有一直延伸到LR(0)
- 有许多语法无法转换为LR(0)
。有关 LR(k)-ness 的更多详细信息,请参阅此处:
http:// en.wikipedia.org/wiki/LR_parser
The problem you're describing is an issue with creating
LR(0)
parsers - that is, bottom-up parsers that don't do any lookahead to symbols beyond the current one they are parsing. The grammar you've described doesn't appear to be anLR(0)
grammar, which is why you run into trouble when trying to parse it w/o lookahead. It does appear to beLR(1)
, however, so by looking 1 symbol ahead in the input you could easily determine whether to shift or reduce. In this case, anLR(1)
parser would look ahead when it had the1
on the stack, see that the next symbol is a+
, and realize that it shouldn't reduce pastA
(since that is the only thing it could reduce to that would still match a rule with+
in the second position).An interesting property of
LR
grammars is that for any grammar which isLR(k)
fork>1
, it is possible to construct anLR(1)
grammar which is equivalent. However, the same does not extend all the way down toLR(0)
- there are many grammars which cannot be converted toLR(0)
.See here for more details on
LR(k)
-ness:http://en.wikipedia.org/wiki/LR_parser
我不太确定 Yacc / Bison 解析算法以及它何时更喜欢转移而不是减少,但我知道 Bison 支持 LR(1) 解析,这意味着它有一个前瞻标记。这意味着令牌不会立即传递到堆栈。相反,他们会等到不再有削减的情况发生。然后,如果移动下一个标记有意义,它将应用该操作。
首先,在您的情况下,如果您正在评估 1 + 2,它将移位 1。它将将该标记减少为 A,因为“+”先行标记表明它是唯一有效的过程。由于没有更多的减少,它将把“+”标记移到堆栈上并保留 2 作为前瞻。它将移动 2 并减少为 M,因为 A + M 生成 A 并且表达式已完成。
I'm not exactly sure of the Yacc / Bison parsing algorithm and when it prefers shifting over reducing, however I know that Bison supports LR(1) parsing which means it has a lookahead token. This means that tokens aren't passed to the stack immediately. Rather they wait until no more reductions can happen. Then, if shifting the next token makes sense it applies that operation.
First of all, in your case, if you're evaluating 1 + 2, it will shift 1. It will reduce that token to an A because the '+' lookahead token indicates that its the only valid course. Since there are no more reductions, it will shift the '+' token onto the stack and hold 2 as the lookahead. It will shift the 2 and reduce to an M since A + M produces an A and the expression is complete.