Treetop Ruby parser: unable to parse ordered choice

Posted on 2024-10-08 14:07:56

I have defined a simple grammar for parsing strings and numbers using Treetop, as shown below.

grammar Simple
    rule value
        number / string
    end 

    rule string
        word space string
        /
        word
    end

    rule word
        [0-9a-zA-Z]+
    end

    rule number
        [1-9] [0-9]*
    end

    rule space
        ' '+
    end
end

Ruby:

parser = SimpleParser.new
parser.parse('123abc wer') # => nil

I expect the parser to return a string node, but it looks like the parser cannot understand the input. Any ideas would be appreciated.

梦断已成空 2024-10-15 14:07:56

In Treetop (and PEGs in general, actually) the choice operator is ordered, unlike most other parsing formalisms.

So, in

rule value
  number / string
end

you are telling Treetop that you prefer number over string.

Your input starts with 1, which matches both number and string (through word), but you told Treetop to prefer the number interpretation, so it parses 123 as a number and commits to that choice. When it reaches the a in the input, number cannot continue and there is no rule left to apply to the remaining text. Treetop treats a parse that does not consume the entire input stream as an error, so parse returns nothing (nil).
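To make the commit-then-fail behaviour concrete, here is a minimal sketch that imitates ordered choice with Ruby's standard StringScanner. It is not Treetop's generated code: the regular expressions stand in for the number and string rules, and ordered_choice and parse_value are hypothetical helper names used only for illustration.

```ruby
require 'strscan'

# Hand-written stand-ins for the grammar's rules (assumptions, not Treetop output).
NUMBER = /[1-9][0-9]*/
STRING = /[0-9a-zA-Z]+(?: +[0-9a-zA-Z]+)*/

# PEG ordered choice: try the alternatives in order and commit to the first
# one that matches. Later alternatives are never tried once one succeeds.
def ordered_choice(scanner, alternatives)
  alternatives.each do |name, pattern|
    text = scanner.scan(pattern)
    return [name, text] if text
  end
  nil
end

# Mimic a full-input parse: the result only counts if nothing is left over.
def parse_value(input, alternatives)
  scanner = StringScanner.new(input)
  result = ordered_choice(scanner, alternatives)
  result if result && scanner.eos?
end

number_first = [[:number, NUMBER], [:string, STRING]]
string_first = [[:string, STRING], [:number, NUMBER]]

p parse_value('123abc wer', number_first) # prints nil: "123" matched, "abc wer" left over
p parse_value('123abc wer', string_first) # prints [:string, "123abc wer"]
```

With number listed first, the choice succeeds on "123" and never revisits string, so the leftover input turns the whole parse into a failure.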

If you simply reverse the order of the choice, the entire input will be interpreted as a string instead of a number:

SyntaxNode+String0 offset=0, "123abc wer" (word,space,string):
  SyntaxNode offset=0, "123abc":
    SyntaxNode offset=0, "1"
    SyntaxNode offset=1, "2"
    SyntaxNode offset=2, "3"
    SyntaxNode offset=3, "a"
    SyntaxNode offset=4, "b"
    SyntaxNode offset=5, "c"
  SyntaxNode offset=6, " ":
    SyntaxNode offset=6, " "
  SyntaxNode offset=7, "wer":
    SyntaxNode offset=7, "w"
    SyntaxNode offset=8, "e"
    SyntaxNode offset=9, "r"

Or, you could keep the order as it is, but allow the value rule to be matched multiple times. Either insert a new top-level rule like this (placed before the other rules, since Treetop uses the first rule in the grammar as the root):

rule values
  value+
end

or modify the value rule like this:

rule value
  (number / string)+
end

Either change will give you an AST roughly like this:

SyntaxNode offset=0, "123abc wer":
  SyntaxNode+Number0 offset=0, "123":
    SyntaxNode offset=0, "1"
    SyntaxNode offset=1, "23":
      SyntaxNode offset=1, "2"
      SyntaxNode offset=2, "3"
  SyntaxNode+String0 offset=3, "abc wer" (word,space,string):
    SyntaxNode offset=3, "abc":
      SyntaxNode offset=3, "a"
      SyntaxNode offset=4, "b"
      SyntaxNode offset=5, "c"
    SyntaxNode offset=6, " ":
      SyntaxNode offset=6, " "
    SyntaxNode offset=7, "wer":
      SyntaxNode offset=7, "w"
      SyntaxNode offset=8, "e"
      SyntaxNode offset=9, "r"
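
The repeated form can be imitated in plain Ruby as well. The following is only a sketch under the same assumptions as a hand-rolled emulation: it uses StringScanner rather than Treetop, the regular expressions approximate the number and string rules, and VALUE_ALTERNATIVES and parse_values are hypothetical names.

```ruby
require 'strscan'

# Illustrative stand-ins for the grammar's rules, listed in grammar order.
VALUE_ALTERNATIVES = [
  [:number, /[1-9][0-9]*/],
  [:string, /[0-9a-zA-Z]+(?: +[0-9a-zA-Z]+)*/]
].freeze

# Emulates (number / string)+ : apply the ordered choice repeatedly until no
# alternative matches, then require at least one match and no leftover input.
def parse_values(input)
  scanner = StringScanner.new(input)
  matches = []
  loop do
    # First alternative that matches at the current position wins.
    name, pattern = VALUE_ALTERNATIVES.find { |_, pat| scanner.match?(pat) }
    break unless name
    matches << [name, scanner.scan(pattern)]
  end
  matches if !matches.empty? && scanner.eos?
end

p parse_values('123abc wer') # prints [[:number, "123"], [:string, "abc wer"]]
```

The number alternative still wins at offset 0, but the repetition gives string a chance to pick up at offset 3, which mirrors the two child nodes in the tree above.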
