Understanding the tilde in Scala's parser combinators
I'm fairly new to Scala, and while reading about parser combinators (The Magic Behind Parser Combinators, Domain-Specific Languages in Scala) I came across method definitions like this:
def classPrefix = "class" ~ ID ~ "(" ~ formals ~ ")"
I've been reading through the API doc of scala.util.parsing.Parsers, which defines a method named ~ (tilde), but I still don't really understand its usage in the example above.
In that example, ~ (tilde) is a method called on a java.lang.String, which doesn't have that method and should therefore cause the compiler to fail.
I know that ~ (tilde) is also defined as
case class ~ [+a, +b] (_1: a, _2: b)
but how does this help in the example above?
I'd be happy if someone could give me a hint to understand what's going on here.
Thank you very much in advance!
Jan
Comments (3)
The ~ method on Parser combines two parsers into one, which applies the two original parsers in succession and returns both results. It could simply be a method on Parser[T] that takes a Parser[U] and returns a Parser[(T, U)].
If you never combined more than two parsers, that would be fine. However, if you chain three of them, p1, p2, p3, with result types T1, T2, T3, then p1 ~ p2 ~ p3, which means p1.~(p2).~(p3), is of type Parser[((T1, T2), T3)]. And if you combine five of them as in your example, that would be Parser[((((T1, T2), T3), T4), T5)]. Then, when you pattern match on the result, you would have all those parentheses too, which is quite uncomfortable.
Then comes a clever syntactic trick. When a case class has two parameters, it can appear in infix rather than prefix position in a pattern. That is, if you have
case class X(a: A, b: B)
you can pattern match with case X(a, b), but also with case a X b. (That is what happens with the pattern x::xs to match a non-empty List: :: is a case class.) When you write case a ~ b ~ c, it means case ~(~(a,b), c), but it is much more pleasant, and more pleasant than case ((a,b), c) too, which is tricky to get right.
So the ~ method in Parser returns a Parser[~[T,U]] instead of a Parser[(T,U)], so that you can easily pattern match on the result of multiple ~. Besides that, ~[T,U] and (T,U) are pretty much the same thing, as isomorphic as you can get.
The same name is chosen for the combining method in Parser and for the result type because the resulting code is natural to read: one sees immediately how each part of the result processing relates to the items of the grammar rule.
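To make the infix-pattern trick concrete, here is a minimal sketch that needs no parser library at all. It declares a local stand-in for the ~ case class (the real one lives in the parser combinator library) and matches a left-nested value both ways. The object name TildePatternDemo and the sample values are made up for illustration.

```scala
object TildePatternDemo {
  // Local stand-in with the same shape as the library's result type.
  case class ~[+A, +B](_1: A, _2: B)

  // The shape that p1 ~ p2 ~ p3 would produce: nested to the left.
  // `new` is used because a bare ~(...) in an expression reads as a prefix operator.
  val result: String ~ Int ~ Boolean = new ~(new ~("id", 42), true)

  // Infix pattern: mirrors the way the grammar rule is written.
  def describe(r: String ~ Int ~ Boolean): String = r match {
    case name ~ count ~ flag => s"$name/$count/$flag"
  }

  // The same match in prefix form, which is what the infix pattern means.
  def describePrefix(r: String ~ Int ~ Boolean): String = r match {
    case ~(~(name, count), flag) => s"$name/$count/$flag"
  }
}
```

Both functions bind the same three values; only the notation differs.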
Tilde is chosen because its precedence (it binds tightly) plays nicely with the other operators on Parser.
One last point: there are auxiliary operators ~> and <~ which discard the result of one of the operands, typically the constant parts of the rule, which carry no useful data. So one would rather write the rule with ~> and <~ around the literal tokens and get only the values of ID and formals in the result.
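As a rough illustration of how ~> and <~ drop the constant parts, the sketch below uses a toy parser written from scratch, not the real library: Parser, literal, ID and formals here are simplified, hypothetical stand-ins (formals is assumed to be a single identifier). The operator semantics are meant to mirror the library's, but treat the details as assumptions.

```scala
import scala.language.implicitConversions

object DiscardDemo {
  // Local stand-in for the library's result type.
  case class ~[+A, +B](_1: A, _2: B)

  // Toy parser (hypothetical): on success, returns the value and the unconsumed input.
  final case class Parser[+T](run: String => Option[(T, String)]) {
    // Sequence and keep both results.
    def ~[U](q: => Parser[U]): Parser[T ~ U] = Parser { in =>
      run(in).flatMap { case (a, r1) =>
        q.run(r1).map { case (b, r2) => (new ~(a, b), r2) }
      }
    }
    // Sequence and keep only the right-hand result.
    def ~>[U](q: => Parser[U]): Parser[U] = Parser { in =>
      run(in).flatMap { case (_, r1) => q.run(r1) }
    }
    // Sequence and keep only the left-hand result.
    def <~[U](q: => Parser[U]): Parser[T] = Parser { in =>
      run(in).flatMap { case (a, r1) =>
        q.run(r1).map { case (_, r2) => (a, r2) }
      }
    }
  }

  // Turns a string into a parser for that literal, skipping leading whitespace.
  implicit def literal(s: String): Parser[String] = Parser { in =>
    val t = in.dropWhile(_.isWhitespace)
    if (t.startsWith(s)) Some((s, t.drop(s.length))) else None
  }

  // Identifier: a non-empty run of letters (a simplification).
  val ID: Parser[String] = Parser { in =>
    val t = in.dropWhile(_.isWhitespace)
    val name = t.takeWhile(_.isLetter)
    if (name.nonEmpty) Some((name, t.drop(name.length))) else None
  }

  // Assumption: a single identifier stands in for the formal parameter list.
  val formals: Parser[String] = ID

  // Only ID and formals survive; the literals are matched and then dropped.
  val classPrefix: Parser[String ~ String] =
    ("class" ~> ID <~ "(") ~ (formals <~ ")")

  def parseClassPrefix(in: String): Option[(String, String)] =
    classPrefix.run(in).map { case (name ~ args, _) => (name, args) }
}
```

The parentheses in classPrefix are deliberate. Scala derives operator precedence from the first character, so <~ binds less tightly than ~ and ~>; without explicit grouping, a chain that mixes them can end up discarding more than intended.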
You should check out Parsers.Parser. Scala sometimes defines a method and a case class with the same name to aid pattern matching etc., and it's a little confusing if you're reading the Scaladoc.
In particular, "class" ~ ID is the same as "class".~(ID). ~ is a method that combines the parser with another parser sequentially.
There's an implicit conversion defined in RegexParsers that automatically creates a parser from a String value. So "class" automatically becomes an instance of Parser[String].
RegexParsers also defines another implicit conversion that automatically creates a parser from a Regex value. So ID, if it is defined as a Regex, automatically becomes an instance of Parser[String] too.
By combining the two parsers, "class" ~ ID returns a Parser[~[String, String]] that matches the literal "class" followed by the regular expression ID. There are other methods like | and |||. For more info, read Programming in Scala.
The structure here is a little bit tricky. First, notice that you always define these things inside a subclass of some parser, e.g. class MyParser extends RegexParsers. Now, you may note two implicit definitions inside RegexParsers, literal (for String) and regex (for Regex). What these do is take any string or regex and convert it into a parser that matches that string or regex as a token. They're implicit, so they'll be applied any time they're needed (e.g. if you call a method that Parser[String] has but String (or Regex) does not).
But what is this Parser thing? It's an inner class defined inside Parsers, the supertrait of RegexParsers. Its declaration extends Input => ParseResult[T], so it looks like a function that takes input and maps it to a result. Well, that makes sense! It is documented in the Scaladoc for Parsers.Parser.
Now we can just look up the ~ method: it takes a Parser[U] and returns a Parser[~[T, U]]. So, if we see something like seaFacts defined as "fish" ~ "swim", what happens is this: first, "fish" does not have the ~ method, so it's implicitly converted to Parser[String], which does. The ~ method then wants an argument of type Parser[U], and so we implicitly convert "swim" into Parser[String] (i.e. U == String). Now we have something that will match an input "fish", and whatever is left in the input should match "swim"; if both are the case, then seaFacts will succeed in its match.
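The desugaring described above can be sketched with a tiny hand-rolled model. This is not the real RegexParsers: the Parser class and the literal conversion below are simplified, hypothetical stand-ins, kept only detailed enough to show where the implicit conversion kicks in.

```scala
import scala.language.implicitConversions

object SeaFactsDemo {
  // Local stand-in for the library's result type.
  case class ~[+A, +B](_1: A, _2: B)

  // Toy parser (hypothetical): a function from input to an optional (value, rest).
  final case class Parser[+T](run: String => Option[(T, String)]) {
    def ~[U](q: => Parser[U]): Parser[T ~ U] = Parser { in =>
      run(in).flatMap { case (a, r1) =>
        q.run(r1).map { case (b, r2) => (new ~(a, b), r2) }
      }
    }
  }

  // Plays the role of the implicit String-to-parser conversion.
  implicit def literal(s: String): Parser[String] = Parser { in =>
    val t = in.dropWhile(_.isWhitespace)
    if (t.startsWith(s)) Some((s, t.drop(s.length))) else None
  }

  // What you write: String has no ~ method, so the compiler looks for a conversion.
  val seaFacts: Parser[String ~ String] = "fish" ~ "swim"

  // Roughly what the compiler turns it into.
  val seaFactsExplicit: Parser[String ~ String] = literal("fish").~(literal("swim"))

  // True when the parser succeeds and consumes the whole input.
  def matches[T](p: Parser[T], in: String): Boolean =
    p.run(in).exists { case (_, rest) => rest.isEmpty }
}
```

In this model, seaFacts and seaFactsExplicit behave identically, which is the whole point: the implicit conversion only saves you from writing the wrapping by hand.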