Problem with forward declarations and multi-line nested structures in pyparsing

Posted on 2025-01-20 05:58:13

My goal is to parse every character in the following string with the patterns I have created with pyparsing. I am trying to parse two nested structures, the control structure and the macro structure, both of which span multiple lines.

    """
    ; Macros to verify assumptions about the data or code

    table_width: MACRO
    CURRENT_TABLE_WIDTH = \\1
    if _NARG == 2
    REDEF CURRENT_TABLE_START EQUS "\\2"
    else
    REDEF CURRENT_TABLE_START EQUS "._table_width\@"
    {CURRENT_TABLE_START}:
    endc
    ENDM
    """

These are my parsers. They work fine for parsing lines from a multi-file project up until I start trying to parse nested control and macro structures.

comment_parser = (Literal(";") + SkipTo(LineEnd()))

charmap_parser = CaselessKeyword("charmap") + QuotedString("\"") + \
                 Literal(",").suppress() + Word(hexnums + "$") + Opt(comment_parser)

expression = infix_notation(Word(printables, exclude_chars="() ** ~ + - * / % & | ^ != == <= >= < > !"),
                            [
                                ("()", 2, OpAssoc.LEFT),
                                ("**", 2, OpAssoc.LEFT),
                                (one_of("~ + -"), 1, OpAssoc.RIGHT),
                                (one_of("* / %"), 2, OpAssoc.LEFT),
                                (one_of("<< >>"), 2, OpAssoc.LEFT),
                                (one_of("& | ^"), 2, OpAssoc.LEFT),
                                ("+ -", 2, OpAssoc.LEFT),
                                ("!= == <= >= < >", 2, OpAssoc.LEFT),
                                ("&& ||", 2, OpAssoc.LEFT),
                                ("!", 1, OpAssoc.RIGHT),
                            ])

elif_parser = CaselessKeyword("elif") + expression

if_parser = CaselessKeyword("if") + expression

include_parser = CaselessLiteral("include") + QuotedString("\"") + Opt(comment_parser)
include_parser.add_parse_action(parse_include)

label = Word(printables, excludeChars=":") + Literal(":")

newcharmap_parser = CaselessKeyword("newcharmap") + Word(printables) + Opt(comment_parser)

numeric_assignment = Word(printables) + Literal("=") + Word(printables)

popc = CaselessKeyword("popc") + Opt(comment_parser)

pushc = CaselessKeyword("pushc") + Opt(comment_parser)

redef = CaselessKeyword("redef") + Word(printables) + \
        (CaselessKeyword("equ") ^ CaselessKeyword("equs")) + \
        QuotedString("\"")

all_rgbasm_parsers = Forward()

control = Forward()

macro_parser = Forward()

all_rgbasm_parsers <<= (charmap_parser ^ comment_parser ^ include_parser ^ newcharmap_parser ^
                        numeric_assignment ^ popc ^ pushc ^ redef ^ control ^ macro_parser ^ label)

control <<= if_parser + OneOrMore(all_rgbasm_parsers) + Opt(elif_parser ^ CaselessKeyword("else")) + \
    ZeroOrMore(all_rgbasm_parsers) + CaselessKeyword("endc")


macro_parser <<= Word(printables, excludeChars=":") + Literal(":").suppress() + CaselessLiteral("macro") + \
               OneOrMore(all_rgbasm_parsers) + FollowedBy(CaselessKeyword("endm"))

I expect the macro_parser to return a nested list of results from parsing the above string.

The problem is that macro_parser does not work. I end up with "Expected end of text, found 'MACRO'", a very unhelpful error message.

If I remove label from all_rgbasm_parsers, I get an even worse message: "Expected end of text, found 'table'". I get the same error message when trying to parse with this:

((Word(printables, excludeChars=":") + Literal(":").suppress() + CaselessLiteral("macro") +
               OneOrMore(all_rgbasm_parsers) + FollowedBy(CaselessKeyword("endm"))) ^ comment_parser)

I see nowhere in the expression above where it would expect a newline at the start of a line. I may be overlooking something. It appears that Word(printables, excludeChars=":") does not include the character _ when it parses despite the fact that string.printable includes it.

I am testing the parser with this:


    test = """
    ; Macros to verify assumptions about the data or code

    table_width: MACRO
    CURRENT_TABLE_WIDTH = \\1
    if _NARG == 2
    REDEF CURRENT_TABLE_START EQUS "\\2"
    else
    REDEF CURRENT_TABLE_START EQUS "._table_width\@"
    {CURRENT_TABLE_START}:
    endc
    ENDM
    """
    from rgbasm_parsers import all_rgbasm_parsers
    all_parsers = OneOrMore(Group(all_rgbasm_parsers))
    print(all_parsers.parse_string(test, parseAll=True))

I have tested OneOrMore(Group(all_rgbasm_parsers)) with files that include no nested structures, and that gives me the correct results, so I do not think that that code is the problem, though I may be wrong.

It may be that part of the problem is that the nested structures span multiple lines, but "Expected end of text, found 'table'" makes me think otherwise.

I think I might be using Forward wrong.

Any ideas?

Comments (1)

累赘 2025-01-27 05:58:14

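I found two things wrong.

First, several operator entries in the infix_notation call were missing one_of, so a string such as "+ -" was treated as one literal rather than as a set of alternative operators.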

expression = infix_notation(Word(
    printables,
    exclude_chars=" ** ~ + - * / % & | ^ != == <= >= < > ! , += -= *= /= %= <<= >>= &= |= ^="
),
    [
        ("**", 2, OpAssoc.LEFT),
        (one_of("~ + -"), 1, OpAssoc.RIGHT),
        (one_of("* / % *= /= %="), 2, OpAssoc.LEFT),
        (one_of("<< >> <<= >>="), 2, OpAssoc.LEFT),
        (one_of("& | ^ &= |= ^="), 2, OpAssoc.LEFT),
        (one_of("+ - += -="), 2, OpAssoc.LEFT),
        (one_of("!= == <= >= < >"), 2, OpAssoc.LEFT),
        (one_of("&& ||"), 2, OpAssoc.LEFT),
        ("!", 1, OpAssoc.RIGHT),
    ])
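
To illustrate why the missing one_of matters, here is a minimal sketch, assuming pyparsing 3.x. The names `operand`, `broken` and `fixed` are made up for this demonstration and are not part of the original grammar. A bare string in an operator tuple is turned into a single Literal, so "!= == <= >= < >" only matches that exact text, while one_of splits it into alternatives.

```python
import pyparsing as pp

# Simplified operand, only for this illustration.
operand = pp.Word(pp.alphanums + "_")

# Bare string: becomes one Literal that must match "!= == <= >= < >" verbatim.
broken = pp.infix_notation(
    operand,
    [("!= == <= >= < >", 2, pp.OpAssoc.LEFT)],
)

# one_of: any single operator from the space-separated list may match.
fixed = pp.infix_notation(
    operand,
    [(pp.one_of("!= == <= >= < >"), 2, pp.OpAssoc.LEFT)],
)

line = "_NARG == 2"
print(broken.matches(line))  # False: only "_NARG" is parsed, "== 2" is left over
print(fixed.matches(line))   # True
print(fixed.parse_string(line, parse_all=True).as_list())
```

With the bare string, the expression stops after `_NARG`, which is likely why the enclosing `if` block and macro could not be parsed as a whole.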

Second, the "endm" keyword was never consumed, because FollowedBy only looks ahead without advancing the parse position, and that resulted in a ParseException.

macro_parser <<= (Word(printables, excludeChars=":") + Literal(":").suppress() + CaselessLiteral("macro") + 
                  OneOrMore(all_rgbasm_parsers) +
                  FollowedBy(CaselessKeyword("endm"))) + CaselessKeyword("endm")
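
The lookahead behaviour can be checked in isolation. This is a small sketch under the assumption of pyparsing 3.x. The grammar is a toy (a body of integers closed by ENDM), and the names `body`, `lookahead_only` and `consuming` are hypothetical.

```python
import pyparsing as pp

endm = pp.CaselessKeyword("endm")
body = pp.OneOrMore(pp.Word(pp.nums))

# FollowedBy checks that "endm" comes next but does not consume it.
lookahead_only = body + pp.FollowedBy(endm)

# Here the closing keyword is actually consumed (and dropped from the results).
consuming = body + endm.suppress()

text = "1 2 ENDM"

# matches() parses with parse_all=True, so leftover input counts as a failure.
print(lookahead_only.matches(text))  # False: "ENDM" is still unparsed
print(consuming.matches(text))       # True

# Without parse_all the lookahead version succeeds but stops before "ENDM".
print(lookahead_only.parse_string(text).as_list())  # ['1', '2']
```

In the fix above, FollowedBy is kept and the keyword is added after it. The lookahead is then redundant, so dropping it and keeping only the consuming keyword should behave the same way.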