Python: Problem with YACC
I'm using PLY to parse sentences like:
"CS 2310 or equivalent experience"
The desired output:
[[("CS", 2310)], ["equivalent experience"]]
Lexer (ply.lex) token definitions:
```python
import re

tokens = [
    'DEPT_CODE',
    'COURSE_NUMBER',
    'OR_CONJ',
    'MISC_TEXT',
]

t_DEPT_CODE = r'[A-Z]{2,}'
t_COURSE_NUMBER = r'[0-9]{4}'
t_OR_CONJ = r'or'
t_ignore = ' \t'

terms = {'DEPT_CODE': t_DEPT_CODE,
         'COURSE_NUMBER': t_COURSE_NUMBER,
         'OR_CONJ': t_OR_CONJ}
# Anchor each pattern so that it has to match the whole word.
for name, regex in terms.items():
    terms[name] = "^%s$" % regex

def t_MISC_TEXT(t):
    r'\S+'
    # Grab any run of non-whitespace, then reclassify it if it matches
    # one of the more specific token patterns.
    for name, regex in terms.items():
        # print("trying to match %s with regex %s" % (t.value, regex))
        if re.match(regex, t.value):
            t.type = name
            return t
    return t
```
(MISC_TEXT is meant to match anything not caught by the other terms.)
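The reclassification done in `t_MISC_TEXT` can be checked on its own, without PLY. The sketch below uses two hypothetical helpers, `classify` and `classify_sentence` (not part of PLY or of the original code), that apply the same three patterns with `re.fullmatch` instead of adding `^...$` anchors, and keep the patterns in a list so the order in which they are tried is explicit:

```python
import re

# Same patterns as t_DEPT_CODE, t_COURSE_NUMBER and t_OR_CONJ above.
TERMS = [
    ("DEPT_CODE", r"[A-Z]{2,}"),
    ("COURSE_NUMBER", r"[0-9]{4}"),
    ("OR_CONJ", r"or"),
]


def classify(word):
    """Return the token type t_MISC_TEXT would assign to one whitespace-free word."""
    for name, regex in TERMS:
        # fullmatch requires the whole word to match, like the ^...$ anchors.
        if re.fullmatch(regex, word):
            return name
    return "MISC_TEXT"


def classify_sentence(text):
    # Split into runs of non-whitespace, like \S+ with spaces/tabs ignored.
    return [(classify(w), w) for w in re.findall(r"\S+", text)]
```

For "CS 2110 or equivalent experience" this gives the same token types as the `token_list` output shown below: `DEPT_CODE`, `COURSE_NUMBER`, `OR_CONJ`, then `MISC_TEXT` twice.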
Some relevant rules from the parser:
```python
precedence = (
    ('left', 'MISC_TEXT'),
)

def p_statement_course_data(p):
    'statement : course_data'
    p[0] = p[1]

def p_course_data(p):
    'course_data : course'
    p[0] = p[1]

def p_course(p):
    'course : DEPT_CODE COURSE_NUMBER'
    p[0] = make_course(p[1], int(p[2]))

def p_or_phrase(p):
    'or_phrase : statement OR_CONJ statement'
    p[0] = [[p[1]], [p[3]]]

def p_misc_text(p):
    '''text_aggregate : MISC_TEXT MISC_TEXT
                      | MISC_TEXT text_aggregate
                      | text_aggregate MISC_TEXT'''
    # Every alternative has two right-hand-side symbols: join their values.
    p[0] = "%s %s" % (p[1], p[2])

def p_text_aggregate_statement(p):
    'statement : text_aggregate'
    p[0] = p[1]
```
Unfortunately, this fails:
```
# works as it should
>>> token_list("CS 2110 or equivalent experience")
[LexToken(DEPT_CODE,'CS',1,0), LexToken(COURSE_NUMBER,'2110',1,3), LexToken(OR_CONJ,'or',1,8), LexToken(MISC_TEXT,'equivalent',1,11), LexToken(MISC_TEXT,'experience',1,22)]
# fails. bummer.
>>> parser.parse("CS 2110 or equivalent experience")
Syntax error in input: LexToken(MISC_TEXT,'equivalent',1,11)
```
What am I doing wrong? I don't fully understand how to set precedence rules.
Also, this is my error function:

```python
def p_error(p):
    print("Syntax error in input: %s" % p)
```
Is there a way to see which rule the parser was trying when it failed? Or some other way to make the parser print which rules it's trying?
UPDATE: `token_list()` is just a helper function:

```python
def token_list(string):
    lexer.input(string)
    result = []
    for tok in lexer:
        result.append(tok)
    return result
```
UPDATE 2: Here is the parsing that I want to happen:
| Symbol stack | Input tokens | Action |
|---|---|---|
| | DEPT_CODE COURSE_NUMBER OR_CONJ MISC_TEXT MISC_TEXT | |
| DEPT_CODE | COURSE_NUMBER OR_CONJ MISC_TEXT MISC_TEXT | Shift DEPT_CODE |
| DEPT_CODE COURSE_NUMBER | OR_CONJ MISC_TEXT MISC_TEXT | Shift COURSE_NUMBER |
| course | OR_CONJ MISC_TEXT MISC_TEXT | Reduce course : DEPT_CODE COURSE_NUMBER |
| course_data | OR_CONJ MISC_TEXT MISC_TEXT | Reduce course_data : course |
| statement | OR_CONJ MISC_TEXT MISC_TEXT | Reduce statement : course_data |
| statement OR_CONJ | MISC_TEXT MISC_TEXT | Shift OR_CONJ |
| statement OR_CONJ MISC_TEXT | MISC_TEXT | Shift MISC_TEXT |
| statement OR_CONJ text_aggregate | MISC_TEXT | Reduce text_aggregate : MISC_TEXT |
| statement OR_CONJ text_aggregate MISC_TEXT | | Shift MISC_TEXT |
| statement OR_CONJ text_aggregate | | Reduce text_aggregate : text_aggregate MISC_TEXT |
| statement OR_CONJ statement | | Reduce statement : text_aggregate |
| or_phrase | | Reduce or_phrase : statement OR_CONJ statement |
| statement | | Reduce statement : or_phrase |
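The intended derivation in the table can be replayed with a small hand-written shift-reduce loop, which makes the one lookahead-dependent decision visible: `text_aggregate` may only be reduced to `statement` once no further `MISC_TEXT` follows. This is a sketch, not how PLY works internally (PLY generates LALR(1) tables from the grammar; here every decision is hard-coded for this one grammar). It assumes the grammar also has the rules `text_aggregate : MISC_TEXT` and `statement : or_phrase` that the trace uses, and it assumes `make_course` returns a `(dept, number)` tuple; `replay_parse` is a hypothetical name.

```python
def replay_parse(tokens):
    """Replay the intended derivation over a list of (type, value) token pairs.

    Returns (semantic value, trace), where trace lists the Shift/Reduce actions.
    """
    stack = []  # (symbol, semantic value) pairs
    trace = []
    pos = 0

    def top(*symbols):
        # True if the topmost stack symbols are exactly `symbols`.
        n = len(symbols)
        return len(stack) >= n and tuple(s for s, _ in stack[-n:]) == symbols

    def do_reduce(rule, value):
        # Pop one entry per right-hand-side symbol, push the left-hand side.
        lhs, rhs = rule.split(" : ")
        del stack[-len(rhs.split()):]
        stack.append((lhs, value))
        trace.append("Reduce " + rule)

    while True:
        lookahead = tokens[pos][0] if pos < len(tokens) else None
        vals = [v for _, v in stack]
        if top("DEPT_CODE", "COURSE_NUMBER"):
            # Assumption: make_course(dept, number) returns a (dept, number) tuple.
            do_reduce("course : DEPT_CODE COURSE_NUMBER",
                      (vals[-2], int(vals[-1])))
        elif top("course"):
            do_reduce("course_data : course", vals[-1])
        elif top("course_data"):
            do_reduce("statement : course_data", vals[-1])
        elif top("text_aggregate", "MISC_TEXT"):
            do_reduce("text_aggregate : text_aggregate MISC_TEXT",
                      "%s %s" % (vals[-2], vals[-1]))
        elif top("MISC_TEXT"):
            do_reduce("text_aggregate : MISC_TEXT", vals[-1])
        elif top("text_aggregate") and lookahead != "MISC_TEXT":
            # Keep collecting words while another MISC_TEXT follows.
            do_reduce("statement : text_aggregate", vals[-1])
        elif top("statement", "OR_CONJ", "statement"):
            do_reduce("or_phrase : statement OR_CONJ statement",
                      [[vals[-3]], [vals[-1]]])
        elif top("or_phrase"):
            do_reduce("statement : or_phrase", vals[-1])
        elif lookahead is not None:
            stack.append(tokens[pos])
            trace.append("Shift " + lookahead)
            pos += 1
        elif len(stack) == 1 and top("statement"):
            return stack[0][1], trace
        else:
            raise SyntaxError("no action for stack %r" % [s for s, _ in stack])
```

Fed the five tokens from the `token_list` output above, it returns `[[("CS", 2110)], ["equivalent experience"]]` together with the 13 actions listed in the table, in the same order.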
I added this parsing action:
```python
def p_misc_text_singleton(p):
    'text_aggregate : MISC_TEXT'
    p[0] = p[1]
```
When I try to build the parser, I get this output:
```
Generating LALR tables
WARNING: 2 shift/reduce conflicts
WARNING: 3 reduce/reduce conflicts
WARNING: reduce/reduce conflict in state 8 resolved using rule (text_aggregate -> MISC_TEXT MISC_TEXT)
WARNING: rejected rule (text_aggregate -> MISC_TEXT) in state 8
```
Parsing still fails on a syntax error, as above.
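The reduce/reduce warnings point at ambiguity in the `text_aggregate` rules: once `text_aggregate : MISC_TEXT` is added, the same run of words can be grouped into a `text_aggregate` in several ways. The stand-alone sketch below (plain Python, no PLY; `count_trees` and the rule lists are hypothetical names) counts the distinct parse trees each version of the rule allows for a run of n `MISC_TEXT` tokens. An unambiguous rule gives exactly one.

```python
from functools import lru_cache

# Right-hand sides of the text_aggregate alternatives, as tuples of symbols.
# The question's rules, including the later p_misc_text_singleton rule:
ORIGINAL_RULES = [
    ("MISC_TEXT", "MISC_TEXT"),
    ("MISC_TEXT", "text_aggregate"),
    ("text_aggregate", "MISC_TEXT"),
    ("MISC_TEXT",),
]
# A left-recursive alternative with a single base case:
LEFT_RECURSIVE_RULES = [
    ("MISC_TEXT",),
    ("text_aggregate", "MISC_TEXT"),
]


def count_trees(rules, n):
    """Count distinct parse trees deriving n MISC_TEXT tokens from text_aggregate."""

    @lru_cache(maxsize=None)
    def ways(symbols, k):
        # Number of ways the symbol sequence `symbols` derives exactly k tokens.
        if not symbols:
            return 1 if k == 0 else 0
        first, rest = symbols[0], symbols[1:]
        if first == "MISC_TEXT":
            return ways(rest, k - 1) if k >= 1 else 0
        # first is text_aggregate. Every symbol derives at least one token,
        # so leave at least len(rest) tokens for the symbols after it.
        return sum(agg(i) * ways(rest, k - i)
                   for i in range(1, k - len(rest) + 1))

    @lru_cache(maxsize=None)
    def agg(k):
        # Number of ways text_aggregate derives exactly k tokens.
        return sum(ways(rhs, k) for rhs in rules)

    return agg(n)
```

With the question's rules, two words already have three parse trees and the count doubles with every further word (1, 3, 6, 12 for one to four words); with the left-recursive rules there is exactly one tree for any length.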
Answer:
I can't reproduce your error; instead I get a syntax error on "or". You did not include a rule that uses `or_phrase`. When I include one, I get no errors.

I don't think it's a precedence issue. It would help to set up logging so you can see the steps PLY is taking and compare them to what you want to happen. To do this, pass `debug=1` to the parse function (you might also have to pass it to `yacc`, which then writes a `parser.out` file describing the parser states and where the conflicts are). Look at PLY's `yacc.py` if you can't get the debugging working.

The reduce/reduce conflict happens because it is ambiguous whether `MISC_TEXT MISC_TEXT` should be reduced via `text_aggregate : MISC_TEXT` (leaving a `text_aggregate` next to a `MISC_TEXT`) or directly to `text_aggregate` via `text_aggregate : MISC_TEXT MISC_TEXT`.

Without being able to reproduce the problem, my best guess at what would fix your error is to change the `p_misc_text` rule to:

```python
'''text_aggregate : MISC_TEXT
                  | text_aggregate MISC_TEXT'''
```
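With the rule changed this way, a single action that always formats two values no longer fits, because the first alternative has only one symbol on its right-hand side. Below is a sketch of matching actions, together with a rule that uses `or_phrase` as described in the first paragraph. The function names are hypothetical. `statement` stays the start symbol only as long as `p_statement_course_data` is still the first rule in the module (PLY uses the first rule unless a `start` variable names another symbol), and `p_misc_text_singleton` should be dropped so that `text_aggregate : MISC_TEXT` is not defined twice.

```python
def p_text_aggregate(p):
    '''text_aggregate : MISC_TEXT
                      | text_aggregate MISC_TEXT'''
    if len(p) == 2:
        # text_aggregate : MISC_TEXT  (first word)
        p[0] = p[1]
    else:
        # text_aggregate : text_aggregate MISC_TEXT  (append one more word)
        p[0] = "%s %s" % (p[1], p[2])


def p_statement_or_phrase(p):
    'statement : or_phrase'
    p[0] = p[1]
```

PLY's production object supports `len(p)` and indexing, so these actions can also be exercised with plain lists: after `p = [None, "equivalent", "experience"]` and `p_text_aggregate(p)`, `p[0]` is `"equivalent experience"`.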
I think you can also delete the `precedence` tuple.