Python: Problems with YACC

Posted 2024-09-03 10:14:50

I'm using PLY to parse sentences like:

"CS 2310 or equivalent experience"

The desired output:

[[("CS", 2310)], ["equivalent experience"]]

Lexer token definitions:

import re

tokens = [
    'DEPT_CODE',
    'COURSE_NUMBER',
    'OR_CONJ',
    'MISC_TEXT',
]

t_DEPT_CODE = r'[A-Z]{2,}'
t_COURSE_NUMBER  = r'[0-9]{4}'

t_OR_CONJ = r'or'

t_ignore = ' \t'

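# Anchored copies of the specific token patterns, used to reclassify words.
terms = {'DEPT_CODE': t_DEPT_CODE,
         'COURSE_NUMBER': t_COURSE_NUMBER,
         'OR_CONJ': t_OR_CONJ}

for name, regex in terms.items():
    terms[name] = "^%s$" % regex

def t_MISC_TEXT(t):
    r'\S+'
    # PLY tries function rules before string rules, so every word lands
    # here first; retype it if it fully matches a more specific pattern.
    for name, regex in terms.items():
        if re.match(regex, t.value):
            t.type = name
            return t

    # No specific pattern matched: keep the token as MISC_TEXT.
    return t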

(MISC_TEXT is meant to match anything not caught by the other terms.)
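To make the intended classification concrete, here is a minimal standalone sketch that uses only `re`, not PLY. The `SPECIFIC` table and the `classify` function are names made up for this illustration; it only approximates the lexer above by splitting on whitespace and trying each specific pattern before falling back to MISC_TEXT.

```python
import re

# Specific patterns, copied from the lexer definitions in the question.
SPECIFIC = {
    'DEPT_CODE': r'[A-Z]{2,}',
    'COURSE_NUMBER': r'[0-9]{4}',
    'OR_CONJ': r'or',
}

def classify(text):
    """Return a (type, value) pair for each whitespace-separated word."""
    result = []
    for word in text.split():
        for name, pattern in SPECIFIC.items():
            # fullmatch plays the role of the ^...$ anchoring in the question.
            if re.fullmatch(pattern, word):
                result.append((name, word))
                break
        else:
            # Nothing specific matched, so the word is miscellaneous text.
            result.append(('MISC_TEXT', word))
    return result

print(classify("CS 2110 or equivalent experience"))
```

This should give the same token types as the token_list() output shown further down, without the line and position information.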

Some relevant rules from the parser:

precedence = (
    ('left', 'MISC_TEXT'),
)


def p_statement_course_data(p):
    'statement : course_data'
    p[0] = p[1]

def p_course_data(p):
    'course_data : course'
    p[0] = p[1]


def p_course(p):
    'course : DEPT_CODE COURSE_NUMBER'
    p[0] = make_course(p[1], int(p[2]))


def p_or_phrase(p):
    'or_phrase : statement OR_CONJ statement'
    p[0] = [[p[1]], [p[3]]] 


def p_misc_text(p):
    '''text_aggregate : MISC_TEXT MISC_TEXT
                      | MISC_TEXT text_aggregate
                      | text_aggregate MISC_TEXT '''
    p[0] = "%s %s" % (p[1], p[2])

def p_text_aggregate_statement(p):
    'statement : text_aggregate'
    p[0] = p[1]

Unfortunately, this fails:

# works as it should
>>> token_list("CS 2110 or equivalent experience")
[LexToken(DEPT_CODE,'CS',1,0), LexToken(COURSE_NUMBER,'2110',1,3), LexToken(OR_CONJ,'or',1,8), LexToken(MISC_TEXT,'equivalent',1,11), LexToken(MISC_TEXT,'experience',1,22)]

# fails. bummer.
>>> parser.parse("CS 2110 or equivalent experience")
Syntax error in input: LexToken(MISC_TEXT,'equivalent',1,11)

What am I doing wrong? I don't fully understand how to set precedence rules.

Also, this is my error function:

def p_error(p):
    print("Syntax error in input: %s" % p)

Is there a way to see which rule the parser was trying when it failed? Or some other way to make the parser print which rules it's trying?

UPDATE: token_list() is just a helper function:

def token_list(string):
    lexer.input(string)
    result = []
    for tok in lexer:
        result.append(tok)
    return result

UPDATE 2: Here is the parsing that I want to happen:

Symbol Stack                                Input Tokens                                                Action
                                            DEPT_CODE COURSE_NUMBER OR_CONJ MISC_TEXT MISC_TEXT
DEPT_CODE                                   COURSE_NUMBER OR_CONJ MISC_TEXT MISC_TEXT                   Shift DEPT_CODE
DEPT_CODE COURSE_NUMBER                     OR_CONJ MISC_TEXT MISC_TEXT                                 Shift COURSE_NUMBER
course                                      OR_CONJ MISC_TEXT MISC_TEXT                                 Reduce course : DEPT_CODE COURSE_NUMBER
course_data                                 OR_CONJ MISC_TEXT MISC_TEXT                                 Reduce course_data : course
statement                                   OR_CONJ MISC_TEXT MISC_TEXT                                 Reduce statement : course_data
statement OR_CONJ                           MISC_TEXT MISC_TEXT                                         Shift OR_CONJ

statement OR_CONJ MISC_TEXT                 MISC_TEXT                                                   Shift MISC_TEXT
statement OR_CONJ text_aggregate            MISC_TEXT                                                   Reduce text_aggregate : MISC_TEXT
statement OR_CONJ text_aggregate MISC_TEXT                                                              Shift MISC_TEXT
statement OR_CONJ text_aggregate                                                                        Reduce text_aggregate : text_aggregate MISC_TEXT

statement OR_CONJ statement                                                                             Reduce statement : text_aggregate
or_phrase                                                                                               Reduce or_phrase : statement OR_CONJ statement
statement                                                                                               Reduce statement : or_phrase

I added this parsing action:

def p_misc_text_singleton(p):
    'text_aggregate : MISC_TEXT'
    p[0] = p[1]

When I try to build the parser, I get this output:

Generating LALR tables
WARNING: 2 shift/reduce conflicts
WARNING: 3 reduce/reduce conflicts
WARNING: reduce/reduce conflict in state 8 resolved using rule (text_aggregate -> MISC_TEXT MISC_TEXT)
WARNING: rejected rule (text_aggregate -> MISC_TEXT) in state 8

Parsing still fails on a syntax error, as above.

Comments (1)

廻憶裏菂餘溫 2024-09-10 10:14:50

I can't reproduce your error; instead, I get a syntax error on "or". You did not include a rule that uses or_phrase. When I include one, I get no errors.

I don't think it's a precedence issue. It would help to turn on debug output so you can see the steps PLY is taking and compare them with what you want to happen. To do this, pass debug=1 to the parse function (you might also have to pass it to yacc). Look at PLY's yacc.py if you can't get the debugging working.

The reduce/reduce conflict happens because it is ambiguous whether it should reduce MISC_TEXT MISC_TEXT to text_aggregate MISC_TEXT or if it should reduce MISC_TEXT MISC_TEXT to text_aggregate.
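One way to see the ambiguity is to count how many distinct parse trees a run of MISC_TEXT tokens has. The sketch below is plain Python, not PLY, and its function names are invented for this illustration. It assumes the question's grammar contains all four text_aggregate alternatives (including the singleton rule added in the update) and compares it with a grammar that has only a singleton rule and a left-recursive rule.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def count_original(n):
    """Parse trees for n MISC_TEXT tokens under
    text_aggregate : MISC_TEXT
                   | MISC_TEXT MISC_TEXT
                   | MISC_TEXT text_aggregate
                   | text_aggregate MISC_TEXT
    """
    if n < 1:
        return 0
    total = 0
    if n == 1:
        total += 1                    # text_aggregate : MISC_TEXT
    if n == 2:
        total += 1                    # text_aggregate : MISC_TEXT MISC_TEXT
    total += count_original(n - 1)    # MISC_TEXT text_aggregate
    total += count_original(n - 1)    # text_aggregate MISC_TEXT
    return total

@lru_cache(maxsize=None)
def count_left_recursive(n):
    """Parse trees for n MISC_TEXT tokens under
    text_aggregate : MISC_TEXT
                   | text_aggregate MISC_TEXT
    """
    if n < 1:
        return 0
    if n == 1:
        return 1
    return count_left_recursive(n - 1)

for n in (1, 2, 3):
    print(n, count_original(n), count_left_recursive(n))
```

Any count above one means the parser generator has to pick between competing reductions somewhere, which is what the reduce/reduce warnings report.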

Without being able to reproduce the problem, my best guess at what would fix your error is to change the p_misc_text rule to:


'''text_aggregate : MISC_TEXT
| text_aggregate MISC_TEXT'''

I think you can also delete the precedence tuple.
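For reference, here is a hand-written sketch of what that grammar is meant to accept, in plain Python rather than PLY. The left-recursive text_aggregate rule turns into a simple loop. Everything here is an assumption for illustration: the function names are made up, `make_course` is a stand-in for the question's helper (which was not shown), and tokens are given as (type, value) pairs.

```python
def make_course(dept, number):
    # Hypothetical stand-in for the question's make_course helper.
    return (dept, number)

def parse_statement(tokens, pos):
    """statement : course | text_aggregate. Returns (value, next_pos)."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    kind, value = tokens[pos]
    if kind == 'DEPT_CODE':
        # course : DEPT_CODE COURSE_NUMBER
        if pos + 1 >= len(tokens) or tokens[pos + 1][0] != 'COURSE_NUMBER':
            raise SyntaxError("expected COURSE_NUMBER after %r" % value)
        return make_course(value, int(tokens[pos + 1][1])), pos + 2
    if kind == 'MISC_TEXT':
        # text_aggregate : MISC_TEXT | text_aggregate MISC_TEXT
        words = []
        while pos < len(tokens) and tokens[pos][0] == 'MISC_TEXT':
            words.append(tokens[pos][1])
            pos += 1
        return " ".join(words), pos
    raise SyntaxError("unexpected token %r" % (tokens[pos],))

def parse_or_phrase(tokens):
    """or_phrase : statement OR_CONJ statement"""
    left, pos = parse_statement(tokens, 0)
    if pos >= len(tokens) or tokens[pos][0] != 'OR_CONJ':
        raise SyntaxError("expected OR_CONJ")
    right, pos = parse_statement(tokens, pos + 1)
    if pos != len(tokens):
        raise SyntaxError("trailing input after or_phrase")
    return [[left], [right]]

example = [
    ('DEPT_CODE', 'CS'), ('COURSE_NUMBER', '2110'), ('OR_CONJ', 'or'),
    ('MISC_TEXT', 'equivalent'), ('MISC_TEXT', 'experience'),
]
print(parse_or_phrase(example))
```

This is only a way to check the expected result shape; in PLY the same structure comes from the two-alternative text_aggregate rule plus a rule that actually uses or_phrase.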
