Python/YACC:解决移位/归约冲突
我正在使用 PLY。这是我来自 parser.out 的状态之一:
state 3
(5) course_data -> course .
(6) course_data -> course . course_list_tail
(3) or_phrase -> course . OR_CONJ COURSE_NUMBER
(7) course_list_tail -> . , COURSE_NUMBER
(8) course_list_tail -> . , COURSE_NUMBER course_list_tail
! shift/reduce conflict for OR_CONJ resolved as shift
$end reduce using rule 5 (course_data -> course .)
OR_CONJ shift and go to state 7
, shift and go to state 8
! OR_CONJ [ reduce using rule 5 (course_data -> course .) ]
course_list_tail shift and go to state 9
我想将其解决为:
if OR_CONJ is followed by COURSE_NUMBER:
shift and go to state 7
else:
reduce using rule 5 (course_data -> course .)
如何修复我的解析器文件以反映这一点?我是否需要通过回溯并尝试不同的规则来处理语法错误?
文档说:
然后使用这些值附加一个 数值优先值和 每个的关联方向 语法规则。这总是 通过查看确定 最右边终端的优先级 符号。
如果规则没有终端怎么办?
更新:完整语法:
Grammar
Rule 0 S' -> statement
Rule 1 statement -> course_data
Rule 2 or_phrase -> statement OR_CONJ statement
Rule 3 or_phrase -> course OR_CONJ COURSE_NUMBER
Rule 4 statement -> or_phrase
Rule 5 course_data -> course
Rule 6 course_data -> course course_list_tail
Rule 7 course_list_tail -> , COURSE_NUMBER
Rule 8 course_list_tail -> , COURSE_NUMBER course_list_tail
Rule 9 course -> DEPT_CODE COURSE_NUMBER
I'm using PLY. Here is one of my states from parser.out:
state 3
(5) course_data -> course .
(6) course_data -> course . course_list_tail
(3) or_phrase -> course . OR_CONJ COURSE_NUMBER
(7) course_list_tail -> . , COURSE_NUMBER
(8) course_list_tail -> . , COURSE_NUMBER course_list_tail
! shift/reduce conflict for OR_CONJ resolved as shift
$end reduce using rule 5 (course_data -> course .)
OR_CONJ shift and go to state 7
, shift and go to state 8
! OR_CONJ [ reduce using rule 5 (course_data -> course .) ]
course_list_tail shift and go to state 9
I want to resolve this as:
if OR_CONJ is followed by COURSE_NUMBER:
shift and go to state 7
else:
reduce using rule 5 (course_data -> course .)
How can I fix my parser file to reflect this? Do I need to handle a syntax error by backtracking and trying a different rule?
The documentation says:
These values are then used to attach a
numerical precedence value and
associativity direction to each
grammar rule. This is always
determined by looking at the
precedence of the right-most terminal
symbol.
What if the rule has no terminals?
UPDATE: The complete grammar:
Grammar
Rule 0 S' -> statement
Rule 1 statement -> course_data
Rule 2 or_phrase -> statement OR_CONJ statement
Rule 3 or_phrase -> course OR_CONJ COURSE_NUMBER
Rule 4 statement -> or_phrase
Rule 5 course_data -> course
Rule 6 course_data -> course course_list_tail
Rule 7 course_list_tail -> , COURSE_NUMBER
Rule 8 course_list_tail -> , COURSE_NUMBER course_list_tail
Rule 9 course -> DEPT_CODE COURSE_NUMBER
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
您的基本问题是,您需要两个前瞻令牌才能完成您想要的操作 - 当到目前为止看到的输入是一个
course
而前瞻是一个OR_CONJ
时,您不需要不知道是否将course
减少为course_data
或在不向前查看两个标记的情况下转移到OR_CONJ
之后的标记。有多种方法可以处理这个问题使用 LR(2) 或 LR(k) 或 GLR 解析器生成器——任何方法都可以处理这个问题。
使用词法分析器进行前瞻 - 基本上让词法分析器返回两个不同的
OR_CONJ
标记,具体取决于以下标记是否是COURSE_NUMBER
。分解语法以消除冲突,这可能会导致语法解析的内容与您想要的略有不同(需要一些额外的解析后检查以拒绝一些无效的构造),并且通常会使语法变得更加困难理解。
请注意,您给出的语法对于三个或更多课程在单个语句关联中的连接方式也是不明确的。通过将语法重写为更清晰的左递归形式,可以轻松解决此问题:
对于 LL 解析器生成器,也可以将其重写为右递归形式,但它仍然存在 2 令牌先行问题。重构它以消除这种情况的一种方法是将
COURSE_NUMBER
本身设置为有效的课程
,并将其与之前的课程
重新组合在一个post-pass(或者如果它是statement
中的第一个course
,则给出错误)。那么规则 4 就变成了:你们之间就没有冲突了。
Your basic problem is that you need two tokens of lookahead to do what you want -- when the input seen so far is a
course
and the lookahead is aOR_CONJ
you don't know whether to reduce thecourse
to acourse_data
or shift without looking ahead two tokens to the token after theOR_CONJ
. There are a number of ways you can deal with thisuse an LR(2) or LR(k) or GLR parser generator -- any can deal with this.
use a lexer hack to do the lookahead -- basically have the lexer return two different
OR_CONJ
tokens depending on whether the following token is aCOURSE_NUMBER
or not.factor the grammar to get rid of the conflict, which may result in a grammar that parses something slightly different from what you want (need some extra post-parse checks to reject some invalid constructs) and will generally make the grammar much harder to understand.
Note that your grammar as given is also ambiguous related to which way three or more courses connected in a single statement associate. This is easily fixed by rewriting the grammar into a clearer left-recursive form:
This could also be rewritten as right-recursive for an LL parser generator, but it still has the 2-token lookahead problem. One way of refactoring it to make that go away would be to make
COURSE_NUMBER
by itself a validcourse
and recombine it with the previouscourse
in a post-pass (or give an error if its the firstcourse
in astatement
). Then rule 4 becomes:and you have no conflicts.