I believe this should be one rule in Treetop

Posted 2024-11-19 22:57:35

I have this working pair of rules in Treetop that the perfectionist in me believes should be one and only one rule, or maybe something more beautiful at least:

rule _
  crap
  /
  " "*
end

rule crap
  " "* "\\x0D\\x0A"* " "*
end

I'm parsing some expressions that every now and then end with "\x0D\x0A". Yes, not "\r\n" but the literal text "\x0D\x0A": something was double-escaped at some point. Long story.
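To make the distinction concrete, here is a small plain-Ruby sketch comparing the eight characters of the double-escaped text with the two control characters it was presumably meant to be. The normalize_crlf helper is hypothetical and not part of the original code; it only illustrates turning the escaped text back into a real CR LF before parsing, as an alternative to matching it in the grammar.

```ruby
# Single quotes keep the backslashes literal: this is the double-escaped
# text, eight characters long: \ x 0 D \ x 0 A
escaped = '\x0D\x0A'

# Double quotes interpret the escapes: two control characters, CR and LF
crlf = "\x0D\x0A"

puts escaped.length  # 8
puts crlf.length     # 2
puts crlf == "\r\n"  # true

# Hypothetical helper: replace every literal "\x0D\x0A" with a real CR LF
def normalize_crlf(text)
  text.gsub('\x0D\x0A', "\r\n")
end

p normalize_crlf('a\x0D\x0Ab')  # "a\r\nb"
```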

That rule works, but it's ugly and it bothers me. I tried this:

rule _
  " "* "\\x0D\\x0A"* " "*
  /
  " "*
end

which caused

SyntaxError: (eval):1276:in `load_from_string': compile error
(eval):1161: class/module name must be CONSTANT
    from /.../gems/treetop-1.4.9/lib/treetop/compiler/grammar_compiler.rb:42:in `load_from_string'
    from /.../gems/treetop-1.4.9/lib/treetop/compiler/grammar_compiler.rb:35:in `load'
    from /.../gems/treetop-1.4.9/lib/treetop/compiler/grammar_compiler.rb:32:in `open'
    from /.../gems/treetop-1.4.9/lib/treetop/compiler/grammar_compiler.rb:32:in `load'

Ideally I would like to actually write something like:

rule _
  (" " | "\\x0D\\x0A")*
end

but that doesn't work, and while we are at it, I also discovered that you can't have only one * per rule:

rule _
  " "*
  /
  "\n"*
end

that will match " ", but never \n.


深巷少女 2024-11-26 22:57:36

I see you're using three different "OR" characters: /, | and \ (of which only the first, /, means OR in Treetop, where it is the ordered-choice operator).

This works fine:

grammar Language

  rule crap
    (" " / "\\x0D\\x0A")* {
      def value
        text_value    
      end
    }
  end

end

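#!/usr/bin/env ruby

require 'treetop'
require 'polyglot'  # lets `require` find .treetop grammar files

# Make language.treetop, kept next to this script, visible to `require`
$LOAD_PATH.unshift(File.dirname(__FILE__))
require 'language'  # compiles the Language grammar into LanguageParser

parser = LanguageParser.new
tree = parser.parse(' \\x0D\\x0A   \\x0D\\x0A   ')
# parse returns nil when the input doesn't match the grammar
abort(parser.failure_reason || 'parse failed') if tree.nil?
print '>' + tree.value + '<'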

prints:

> \x0D\x0A   \x0D\x0A   <

浪漫人生路 2024-11-26 22:57:36

You said "I also discovered that you can't have only one * per rule" (you mean: you CAN have), "that will match " ", but never \n".

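Of course; " "* succeeds even when it matches zero space characters, and because / is an ordered choice that commits to the first alternative that succeeds, "\n"* is never tried. You could just use a + instead: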

rule _
  " "+
  /
  "\n"*
end

If you want to match any mix of spaces and newlines, you could also put the alternatives in parentheses and repeat the group:

rule _
  (" " / "\n")*
end
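The behaviour described above follows from PEG semantics: / tries its alternatives left to right and commits to the first one that succeeds, and a starred expression succeeds even when it consumes nothing. Below is a toy model of those semantics in plain Ruby, applied to the rules from this thread. The helpers lit, star, plus, choice and matched are hypothetical; they are not Treetop's API or its implementation, just a sketch of how ordered choice and repetition interact.

```ruby
# Toy PEG combinators: each parser is a lambda taking (input, pos) and
# returning the new position on success, or nil on failure.

def lit(str)
  ->(input, pos) { input[pos, str.length] == str ? pos + str.length : nil }
end

# e*: apply e as often as possible; always succeeds, possibly matching nothing.
# The `nxt > pos` guard stops the loop if e ever matches the empty string.
def star(e)
  lambda do |input, pos|
    while (nxt = e.call(input, pos)) && nxt > pos
      pos = nxt
    end
    pos
  end
end

# e+: like e*, but e must match at least once
def plus(e)
  lambda do |input, pos|
    first = e.call(input, pos)
    first && star(e).call(input, first)
  end
end

# a / b: ordered choice; the first alternative that succeeds wins and the
# later ones are never tried, even if the winner matched the empty string
def choice(*alts)
  lambda do |input, pos|
    alts.each do |alt|
      res = alt.call(input, pos)
      return res if res
    end
    nil
  end
end

# The prefix of input that parser p matches from position 0 (nil on failure)
def matched(p, input)
  stop = p.call(input, 0)
  stop && input[0...stop]
end

star_first = choice(star(lit(" ")), star(lit("\n")))  # " "* / "\n"*
plus_first = choice(plus(lit(" ")), star(lit("\n")))  # " "+ / "\n"*
grouped    = star(choice(lit(" "), lit("\n")))         # (" " / "\n")*

p matched(star_first, "\n\n")   # "" : " "* succeeded on zero spaces
p matched(plus_first, "\n\n")   # "\n\n"
p matched(grouped, " \n \n")    # " \n \n"

# The asker's input, with the literal (double-escaped) text as a terminal
asker = star(choice(lit(" "), lit('\x0D\x0A')))
p matched(asker, ' \x0D\x0A   \x0D\x0A   ')  # the whole string
```

Note that " "+ / "\n"* still commits to one alternative for the whole match: on " \n" it stops after the space. Only the grouped, repeated form consumes an arbitrary mix, which is the same shape as the (" " / "\\x0D\\x0A")* rule in the other answer.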

Your error "class/module name must be CONSTANT" is because the rule name is used as the prefix of a module name to contain any methods attached to your rule. In Ruby a module name is a constant and must begin with an uppercase letter, so it may not begin with an underscore; therefore you can't use methods in a rule whose name begins with an underscore.
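The underlying Ruby restriction can be checked on its own. This plain-Ruby sketch tries to create a module under a name starting with an uppercase letter and under one starting with an underscore. The names Crap0 and _0 are only assumed to resemble what Treetop generates from the rules crap and _; the exact NameError and SyntaxError messages may vary between Ruby versions, so the sketch only checks which exception is raised.

```ruby
# A name starting with an uppercase letter is a valid constant...
ok = begin
  Object.const_set(:Crap0, Module.new)
  true
rescue NameError
  false
end

# ...but a name starting with an underscore is not.
bad = begin
  Object.const_set(:_0, Module.new)
  true
rescue NameError => e
  warn e.message  # e.g. "wrong constant name _0"
  false
end

# Written as a `module` definition, the same name is rejected by the parser,
# the kind of compile error reported when the generated grammar code is eval'd.
syntax_error = begin
  eval("module _0; end")
  false
rescue SyntaxError
  true
end

puts ok            # true
puts bad           # false
puts syntax_error  # true
```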
