Treetop SGF parsing

Posted 2024-07-13 22:20:57

I am currently trying to write a Treetop grammar to parse Smart Game Format (SGF) files, and I have it mostly working so far. However, a few questions have come up.

  1. I am unsure how to actually access the structure Treetop generates after a parse.
  2. Is there a better way to handle capturing all characters than my chars rule?
  3. There is a case for comments that I can't seem to write correctly.

    C[player1 [4k\]: hi player2 [3k\]: hi!]

I can't wrap my head around how to deal with the nested structure of a C[] node that has [] pairs inside it.

The following is my current progress.

sgf-grammar.treetop

grammar SgfGrammar
rule node
    '(' chunk* ')' {
        def value
            text_value
        end
    }
end

rule chunk
    ';' property_set* {
        def value
            text_value
        end
    }
end

rule property_set
    # "+" requires at least one [value]; with "*" the first alternative always succeeded
    property ('[' property_data ']')+ {
        def value
            text_value
        end
    }
end

rule property_data
    chars '[' (!'\]' . )* '\]' chars / chars / empty {
        def value
            text_value
        end
    }
end

rule property
    [A-Z]+ {
        def value
            text_value
        end
    }
end

rule chars
    [a-zA-Z0-9_/\-:;|'"\\<>(){}!@#$%^&\*\+\-,\.\?!= \r\n\t]*
end

rule empty
    ''
end
end

And my test case, which currently excludes C[] nodes with the nested-bracket problem mentioned above:

example.rb

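require 'rubygems'
require 'treetop'

# Compile sgf-grammar.treetop; this defines SgfGrammarParser
Treetop.load 'sgf-grammar'

parser = SgfGrammarParser.new
result = parser.parse("(;GM[1]FF[4]CA[UTF-8]AP[CGoban:3]ST[2]
RU[Japanese]SZ[19]KM[0.50]TM[1800]OT[5x30 byo-yomi]
PW[stoic]PB[bojo]WR[3k]BR[4k]DT[2008-11-30]RE[B+2.50])")

# parse returns nil on failure; failure_reason says where it stopped
puts parser.failure_reason if result.nil?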

Answer by 你是我的挚爱i, 2024-07-20 22:20:57
  1. The structure comes back to you as a tree of SyntaxNodes (if the result is nil, check parser.failure_reason). You can walk this tree or (and this is recommended) you can augment it with functions that do what you want and just call your main function on the root.

If what you mean is "how do you access the components from within a node function?", there are several ways. You can get at them with the elements[x] notation or by rule name:

rule url_prefix
    protocol "://" host_name {
        def example
            # elements[0] is the same node as protocol, elements[2] the same as host_name
            raise "unexpected node" unless elements[0] == protocol
            raise "unexpected node" unless elements[2] == host_name
            unless protocol.text_value == "http"
                print "#{protocol.text_value} not supported"
            end
        end
    }
end

You can also name them like so:

rule phone_number
    "(" area_code:( digit digit digit ) ")" ...

and then refer to them by name.
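Walking the tree from point 1 is a plain recursion over each node's children. Treetop's `SyntaxNode` exposes its children through `elements` (`nil` on terminal nodes) and its matched text through `text_value`. The sketch below is plain Ruby so it runs without the treetop gem: `FakeNode` is a hypothetical stand-in that only mimics those two methods, and `walk` and `leaf_texts` are illustrative helper names.

```ruby
# Hypothetical stand-in for a Treetop SyntaxNode: only the two methods
# used below (text_value, elements) are mimicked.
FakeNode = Struct.new(:text_value, :elements)

# Depth-first walk that yields every node, parents before children.
def walk(node, depth = 0, &block)
  yield node, depth
  (node.elements || []).each { |child| walk(child, depth + 1, &block) }
end

# Collect the text of terminal nodes (those without elements).
def leaf_texts(root)
  texts = []
  walk(root) { |node, _depth| texts << node.text_value if node.elements.nil? }
  texts
end

# A tiny tree shaped roughly like GM[1]: the property name, then "[", "1", "]".
tree = FakeNode.new("GM[1]", [
  FakeNode.new("GM", nil),
  FakeNode.new("[1]", [
    FakeNode.new("[", nil),
    FakeNode.new("1", nil),
    FakeNode.new("]", nil)
  ])
])

# Print an indented outline of the tree.
walk(tree) { |node, depth| puts "#{'  ' * depth}#{node.text_value.inspect}" }
p leaf_texts(tree) # => ["GM", "[", "1", "]"]
```

With a real parse you would pass the object returned by `parser.parse` instead of `tree`; as the answer notes, defining methods on the nodes inside the grammar is usually the cleaner route.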

  2. Your chars rule looks fine if you only want to match those characters. If you want to match any character you can just use a dot (.) like in a regular expression.

  3. I'm not familiar with the language you are trying to parse, but the rule you are looking for may be something like:

rule comment
    "C" balanced_square_bracket_string
    end
rule balanced_square_bracket_string
    "[" ( [^\[\]]  / balanced_square_bracket_string )* "]"
    end

The middle part of the second rule matches anything that isn't a square bracket, or a nested string with balanced square brackets.
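The `balanced_square_bracket_string` rule is recursive descent: each nested `[` starts a fresh instance of the same rule. The plain-Ruby sketch below mirrors that rule one-to-one so you can check what it accepts without Treetop; `match_balanced` and `comment?` are hypothetical names. Like the rule, it treats `\` as an ordinary character, so an escaped `\]` still closes a bracket. That happens to work for the example comment, but a value containing a lone `\]` is cut short.

```ruby
# Mirrors the Treetop rule
#   "[" ( [^\[\]] / balanced_square_bracket_string )* "]"
# Returns the index just past the matching "]", or nil if the rule fails.
def match_balanced(str, pos)
  return nil unless str[pos] == "["
  i = pos + 1
  loop do
    ch = str[i]
    return nil if ch.nil?          # ran out of input before the closing "]"
    case ch
    when "]" then return i + 1     # closing bracket ends this level
    when "["                       # nested bracketed string: recurse
      i = match_balanced(str, i)
      return nil if i.nil?
    else
      i += 1                       # [^\[\]] : any non-bracket character
    end
  end
end

# Mirrors rule comment: "C" balanced_square_bracket_string,
# but requires the match to cover the whole string.
def comment?(str)
  str.start_with?("C") && match_balanced(str, 1) == str.length
end

puts comment?('C[player1 [4k\]: hi player2 [3k\]: hi!]') # true
puts comment?('C[unbalanced [bracket]')                   # false
puts comment?('C[hello \] world]')                        # false: "\]" closes early
```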

P.S. There is a fairly active Google group, with archives online & searchable.
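Since the question is about SGF specifically, one more detail may help: the SGF FF[4] specification escapes `]` and `\` inside text values with a backslash, which is what the `\]` sequences in the example comment are. In PEG form that is roughly `( '\\' . / !']' . )*`, combining the "any character" dot from point 2 with an escape branch. The plain-Ruby sketch below hand-codes that loop so it runs without the treetop gem; `read_text_value` and `unescape` are hypothetical helper names, not part of Treetop or of the original grammar.

```ruby
# Reads one SGF-style text value starting at the "[" at index pos.
# Mirrors the PEG loop ( '\\' . / !']' . )*: a backslash consumes the next
# character verbatim, and any other character is accepted unless it is "]".
# Returns [raw_value, index_just_past_closing_bracket].
def read_text_value(str, pos)
  raise ArgumentError, "expected '[' at #{pos}" unless str[pos] == "["
  i = pos + 1
  while i < str.length
    ch = str[i]
    if ch == "\\"
      i += 2           # escape: skip the backslash and the escaped character
    elsif ch == "]"
      return [str[(pos + 1)...i], i + 1]
    else
      i += 1           # the PEG "." : any other character
    end
  end
  raise ArgumentError, "unterminated value starting at #{pos}"
end

# Removes the escaping backslashes: "\]" becomes "]" and "\\" becomes "\".
def unescape(raw)
  raw.gsub(/\\(.)/m) { Regexp.last_match(1) }
end

comment = 'C[player1 [4k\]: hi player2 [3k\]: hi!]'
raw, after = read_text_value(comment, 1)
puts raw           # player1 [4k\]: hi player2 [3k\]: hi!
puts unescape(raw) # player1 [4k]: hi player2 [3k]: hi!
puts after == comment.length
```

With escapes handled this way, the `[` characters inside the comment need no nesting logic at all.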
