I believe this should be one rule in Treetop
I have this working pair of rules in Treetop that the perfectionist in me believes should be one and only one rule, or maybe something more beautiful at least:
rule _
  crap
  /
  " "*
end

rule crap
  " "* "\\x0D\\x0A"* " "*
end
I'm parsing some expressions that every now and then end up with a literal "\x0D\x0A" in them. Yeah, not "\r\n" but the eight characters "\x0D\x0A". Something was double escaped at some point. Long story.
That rule works, but it's ugly and it bothers me. I tried this:
rule _
  " "* "\\x0D\\x0A"* " "*
  /
  " "*
end
which caused
SyntaxError: (eval):1276:in `load_from_string': compile error
(eval):1161: class/module name must be CONSTANT
from /.../gems/treetop-1.4.9/lib/treetop/compiler/grammar_compiler.rb:42:in `load_from_string'
from /.../gems/treetop-1.4.9/lib/treetop/compiler/grammar_compiler.rb:35:in `load'
from /.../gems/treetop-1.4.9/lib/treetop/compiler/grammar_compiler.rb:32:in `open'
from /.../gems/treetop-1.4.9/lib/treetop/compiler/grammar_compiler.rb:32:in `load'
Ideally I would like to actually write something like:
rule _
  (" " | "\\x0D\\x0A")*
end
but that doesn't work, and while we are at it, I also discovered that you can't have only one * per rule:
rule _
  " "*
  /
  "\n"*
end
that will match " ", but never \n.
Comments (2)
I see you're using three different OR characters: /, | and \ (of which only the first means OR). This works fine:
[grammar not preserved in this copy]
prints:
[output not preserved in this copy]
You said "I also discovered that you can't have only one * per rule" (you mean: you CAN have), "that will match " ", but never \n".
Of course; " "* succeeds even when it matches zero space characters, so the first alternative always wins and "\n"* is never tried. You could just use a + instead:
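The grammar that followed this sentence is missing from this copy. It presumably put the + on the first alternative, as in the comment at the top of the block below; that shape is an assumption. The Ruby underneath is only a hand-rolled sketch of PEG ordered choice, not Treetop's generated parser, and the helper names (repeat, first_match, star_rule, plus_rule) are made up for illustration.

```ruby
# Presumed shape of the suggested grammar (an assumption, the original
# snippet is missing):
#
#   rule _
#     " "+ / "\n"*
#   end
#
# Hand-rolled sketch of PEG semantics, NOT Treetop's implementation.

# Match `str` repeatedly at the start of `input`. Returns the number of
# characters consumed, or nil if fewer than `min` repetitions matched.
def repeat(input, str, min)
  count = 0
  count += 1 while input[count * str.length, str.length] == str
  count >= min ? count * str.length : nil
end

# Ordered choice: return the result of the first alternative that succeeds.
def first_match(*alternatives)
  alternatives.each do |alt|
    consumed = alt.call
    return consumed unless consumed.nil?
  end
  nil
end

# " "* / "\n"* : the star alternative succeeds with zero characters,
# so the newline alternative is never reached.
def star_rule(input)
  first_match(-> { repeat(input, " ", 0) }, -> { repeat(input, "\n", 0) })
end

# " "+ / "\n"* : the plus alternative fails on "\n", so the second one runs.
def plus_rule(input)
  first_match(-> { repeat(input, " ", 1) }, -> { repeat(input, "\n", 0) })
end

puts star_rule("\n\n")  # 0 characters consumed
puts plus_rule("\n\n")  # 2 characters consumed
```

With the star version the rule "succeeds" on a newline by consuming nothing, which is why it looks as if \n never matches.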
You could also parenthesise the alternatives and repeat the whole group if you want to match any number of space-or-newline characters:
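The snippet for this suggestion is also missing here. The comment at the top of the block shows the shape it presumably had (an assumption), and the Ruby below is a small hand-written sketch of what a repeated group of alternatives does. The method name repeated_choice is hypothetical and is not part of Treetop.

```ruby
# Presumed shape of the suggested grammar (an assumption, the original
# snippet is missing):
#
#   rule _
#     (" " / "\n")*
#   end
#
# Hand-rolled sketch of the same semantics, NOT Treetop's implementation.

# Repeat an ordered choice between the literals in `alternatives`, starting
# at the beginning of `input`. Returns how many characters were consumed.
def repeated_choice(input, alternatives)
  pos = 0
  loop do
    hit = alternatives.find { |lit| input[pos, lit.length] == lit }
    break if hit.nil?
    pos += hit.length
  end
  pos
end

whitespace = [" ", "\n"]
puts repeated_choice(" \n  \nfoo", whitespace)  # 5
puts repeated_choice("foo", whitespace)         # 0

# The same idea with the literal backslash sequence from the question
# (single quotes keep the backslashes as ordinary characters).
puts repeated_choice(' \x0D\x0A ', [" ", '\x0D\x0A'])  # 10
```

Because the choice sits inside the repetition, each iteration is free to pick a different alternative, so spaces and line breaks can be mixed in any order.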
Your error "class/module name must be CONSTANT" is because the rule name is used as the prefix of a module name to contain any methods attached to your rule. A module name may not begin with an underscore, so you can't use methods in a rule whose name begins with an underscore.
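The underlying Ruby restriction can be checked without Treetop. The sketch below asks Ruby whether a given string is acceptable as a module name. Whitespace0 and _0 are hypothetical stand-ins for names a compiler might derive from rules called whitespace and _; the exact names Treetop generates are an assumption here, and module_name_ok? is a made-up helper.

```ruby
# Returns true if Ruby accepts `name` in a `module ... end` definition,
# false if parsing it raises SyntaxError. The definition is evaluated inside
# a throwaway anonymous module so no top-level constants are created.
def module_name_ok?(name)
  Module.new.module_eval("module #{name}; end")
  true
rescue SyntaxError
  # SyntaxError is a ScriptError, so a bare `rescue` would not catch it.
  false
end

puts module_name_ok?("Whitespace0")  # true
puts module_name_ok?("_0")           # false
```

So renaming the rule to something that starts with a letter, or keeping the underscore rule free of attached methods as the answer describes, avoids the compile error.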