Can I improve this GOLD Parser grammar?

Posted on 2024-09-11 15:15:11

I have to parse a file that looks like this:

versioninfo
{
    "editorversion" "400"
    "editorbuild" "4715"
}
visgroups
{
}
world
{
    "id" "1"
    "mapversion" "525"
    "classname" "worldspawn"
    solid
    {
        "id" "2"
        side
        {
            "id" "1"
            "plane" "(-544 -400 0) (-544 -240 0) (-272 -240 0)"
        }
        side
        {
            "id" "2"
            "plane" "(-544 -240 -16) (-544 -400 -16) (-272 -400 -16)"
        }
    }
}

I have a parser written from scratch, but it has a few bugs that I can't track down, and I imagine it will be difficult to maintain if the format changes in the future. I decided to use the GOLD Parsing System to generate a parser instead. My grammar looks like this:

"Start Symbol" = <SectionList>

! SETS

{Section Chars} = {AlphaNumeric} + [_]
{Property Chars} = {Printable} - ["]

! TERMINALS

SectionName = {Section Chars}+ 
PropertyPart = '"' {Property Chars}* '"'

! RULES

<SectionList> ::= <Section>
               |  <Section> <SectionList>

<SectionBody> ::= <PropertyList>
               |  <SectionList>
               |  <PropertyList> <SectionList>

<Section> ::= SectionName '{' '}'
           |  SectionName '{' <SectionBody> '}'

<PropertyList> ::= <Property>
                |  <Property> <PropertyList>

<Property> ::= PropertyPart PropertyPart

There are no errors and it parses my 2000-line test file just fine. However, this is my first time writing a custom grammar, so I'm not sure if I'm doing it correctly.

Are there any improvements I could make to the grammar above?
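
To make it easier to see what the two terminals match, here is a minimal Python sketch that approximates them with regular expressions. It is not GOLD output: the names `TOKEN_RE` and `tokenize` are made up for illustration, and `{Printable}` is approximated as "any character except a double quote or a line break".

```python
import re

# Rough regex equivalents of the grammar's terminals (an approximation, not GOLD itself):
#   SectionName  = {Section Chars}+            -> [A-Za-z0-9_]+
#   PropertyPart = '"' {Property Chars}* '"'   -> "[^"\n]*"
TOKEN_RE = re.compile(r'''
    (?P<PropertyPart>"[^"\n]*")
  | (?P<SectionName>[A-Za-z0-9_]+)
  | (?P<Brace>[{}])
  | (?P<Whitespace>\s+)
''', re.VERBOSE)


def tokenize(text):
    """Return (kind, lexeme) pairs for the input, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        if match.lastgroup != "Whitespace":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


sample = 'versioninfo\n{\n    "editorversion" "400"\n}\n'
for kind, lexeme in tokenize(sample):
    print(kind, lexeme)
```

Because a `PropertyPart` is delimited by quotes, a value such as `"(-544 -400 0) (-544 -240 0) (-272 -240 0)"` comes out as a single token, spaces and parentheses included.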

Comments (1)

飘落散花 2024-09-18 15:15:11

Below are some changes I would suggest for better performance:

1) Make the list rules left-recursive. This suits GOLD better, because GOLD generates a shift-reduce (LALR) parser: with left recursion it can reduce after each list item instead of stacking up the whole list before the first reduction.

<SectionList> ::= <Section>
               |  <SectionList> <Section>

<PropertyList> ::= <Property>
                |  <PropertyList> <Property>

2) The third alternative in the rule below forces a <PropertyList> to appear only before a <SectionList>, never between different <Section>s. Make sure that matches your requirements.

<SectionBody> ::= <PropertyList>
               |  <SectionList>
               |  <PropertyList> <SectionList>
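
To illustrate point 1, here is a small hand-written Python simulation of the shift and reduce steps for a list of n items under each rule shape. It is only a sketch of how a shift-reduce parser behaves, not GOLD's engine, and the function name `max_stack_depth` is made up for this example.

```python
def max_stack_depth(item_count, left_recursive):
    """Simulate a shift-reduce parse of a list; return the peak stack depth."""
    if item_count < 1:
        raise ValueError("both list rules require at least one item")
    stack = []
    peak = 0
    for _ in range(item_count):
        stack.append("Item")  # shift the next item
        peak = max(peak, len(stack))
        if left_recursive:
            # List ::= Item | List Item
            # A reduction is possible as soon as an item is on top of the stack.
            if stack[-2:] == ["List", "Item"]:
                stack[-2:] = ["List"]
            else:
                stack[-1:] = ["List"]
    if not left_recursive:
        # List ::= Item | Item List
        # Nothing can be reduced until the last item has been shifted.
        stack[-1:] = ["List"]
        while stack[-2:] == ["Item", "List"]:
            stack[-2:] = ["List"]
    assert stack == ["List"]
    return peak


for n in (1, 10, 1000):
    print(n, max_stack_depth(n, True), max_stack_depth(n, False))
```

In this simulation the left-recursive form never needs more than two stack entries, while the right-recursive form needs one entry per list item. That is the practical reason for preferring left recursion in an LR grammar with long lists.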

I can help more if you describe the language in terms of "it should accept this, it shouldn't accept that" rather than with a sample input, which doesn't give a complete picture of your language. Alternatively, tell me about the bugs you ran into, and we can work out the language description from those.
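
As an example of that kind of accept/reject description, the Python sketch below spells out what the SectionBody rule from point 2 allows. Each body is written as a string with one letter per item, `P` for a property and `S` for a nested section. `RELAXED_BODY` is a hypothetical alternative that permits any mix; whether the real file format needs it is an assumption, not something the question states.

```python
import re

# One character per body item: "P" for a <Property>, "S" for a <Section>.
# Mirrors: <SectionBody> ::= <PropertyList> | <SectionList> | <PropertyList> <SectionList>
ORIGINAL_BODY = re.compile(r"P+|S+|P+S+")

# Hypothetical relaxed body: any non-empty mix of properties and sections.
RELAXED_BODY = re.compile(r"[PS]+")


def accepts(pattern, items):
    """True if the whole item sequence matches the given body pattern."""
    return pattern.fullmatch(items) is not None


for items in ("PP", "SS", "PPSS", "PSP", "SP"):
    print(items, accepts(ORIGINAL_BODY, items), accepts(RELAXED_BODY, items))
```

The original rule rejects `PSP` and `SP`, that is, any property that follows a nested section. Empty bodies are not covered by either pattern because the question's grammar handles them with the separate `SectionName '{' '}'` alternative.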

Regards,
V M Rakesh
