Understanding the tilde in Scala's parser combinators

Posted on 2024-11-25 19:52:13


I'm fairly new to Scala, and while reading about parser combinators (The Magic Behind Parser Combinators, Domain-Specific Languages in Scala) I came across method definitions like this:

def classPrefix = "class" ~ ID ~ "(" ~ formals ~ ")"

I've been reading through the API docs of scala.util.parsing.combinator.Parsers, which defines a method named ~ (tilde), but I still don't really understand its usage in the example above.
In that example, ~ is a method called on a java.lang.String, which doesn't have that method, so the compiler fails.
I know that ~ is defined as

case class ~ [+a, +b] (_1: a, _2: b)

but how does this help in the example above?

I'd be happy if someone could give me a hint to understand what's going on here.
Thank you very much in advance!

Jan


明媚如初 2024-12-02 19:52:14

The ~ method on Parser combines two parsers into one that applies the two original parsers in succession and returns both results. It could simply be declared (in Parser[T]) as

def ~[U](q: => Parser[U]): Parser[(T, U)]

If you never combined more than two parsers, that would be fine. However, if you chain three of them, p1, p2, p3, with result types T1, T2, T3, then p1 ~ p2 ~ p3, which means p1.~(p2).~(p3), is of type Parser[((T1, T2), T3)]. If you combine five of them, as in your example, that would be Parser[((((T1, T2), T3), T4), T5)]. Then, when you pattern match on the result, you would need all those parentheses too:

case ((((_, id), _), formals), _) => ...

This is quite uncomfortable.

Then comes a clever syntactic trick. When a case class has two parameters, it can appear in infix rather than prefix position in a pattern. That is, if you have case class X(a: A, b: B), you can pattern match with case X(a, b), but also with case a X b. (That is what happens with the pattern x :: xs for matching a non-empty List: :: is a case class.)
When you write case a ~ b ~ c, it means case ~(~(a, b), c), but it is much more pleasant to read, and more pleasant than case ((a, b), c) too, which is tricky to get right.

So the ~ method in Parser returns a Parser[~[T, U]] instead of a Parser[(T, U)], so that you can easily pattern match on the result of multiple ~. Besides that, ~[T, U] and (T, U) are pretty much the same thing, as isomorphic as you can get.
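
To make the infix-pattern trick concrete, here is a minimal sketch in plain Scala that needs no parser library. The ~ class in it is a local stand-in with the same shape as the library's result type, and the names TildePatternSketch, describeTuple and describeTilde are made up for illustration.

```scala
object TildePatternSketch {
  // Local stand-in for the library's result type (assumption: same shape).
  final case class ~[+A, +B](_1: A, _2: B)

  // Three results nested the way a tuple-returning ~ would nest them.
  val nestedTuple: ((String, String), List[String]) =
    (("class", "Point"), List("x", "y"))

  // The same three results nested with ~.
  // The infix type A ~ B ~ C is left-associative: ~[~[A, B], C].
  val nestedTilde: String ~ String ~ List[String] =
    new ~(new ~("class", "Point"), List("x", "y"))

  // Tuple version: the pattern has to mirror the nesting by hand.
  def describeTuple(r: ((String, String), List[String])): String = r match {
    case ((_, id), formals) => s"$id(${formals.mkString(", ")})"
  }

  // ~ version: the infix pattern reads like the grammar rule itself.
  def describeTilde(r: String ~ String ~ List[String]): String = r match {
    case _ ~ id ~ formals => s"$id(${formals.mkString(", ")})"
  }
}
```

Both functions extract the same data; only the shape of the pattern differs. The values are built with new ~(...) because a bare ~(...) in expression position would be read as the unary ~ operator.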

The same name is chosen for the combining method in parser and for the result type, because the resulting code is natural to read. One sees immediately how each part in the result processing relates to the items of the grammar rule.

parser1 ~ parser2 ~ parser3 ^^ {case part1 ~ part2 ~ part3 => ...}

Tilde was chosen because its precedence (it binds tightly) plays nicely with the other operators on parsers.

One last point: there are auxiliary operators ~> and <~, which discard the result of one of the operands, typically the constant parts of the rule that carry no useful data. Note that <~ binds less tightly than ~ and ~>, so parentheses are needed when mixing them. So one would rather write

"class" ~> ID ~ ("(" ~> formals <~ ")")

and get only the values of ID and formals in the result.
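
As a hedged illustration of the rule from the question, the sketch below uses RegexParsers from the scala-parser-combinators module (in recent Scala versions it is a separate dependency, so this assumes it is on the classpath). The definitions of ID and formals are guesses, since the question does not show them, and the names ClassPrefixParser and classPrefixAll are hypothetical. Because <~ has lower precedence than ~ and ~>, the second rule parenthesizes the argument list.

```scala
import scala.util.parsing.combinator.RegexParsers

object ClassPrefixParser extends RegexParsers {
  // Assumed token and sub-rule definitions (not shown in the question).
  val ID: Parser[String] = """[a-zA-Z_][a-zA-Z0-9_]*""".r
  def formals: Parser[List[String]] = repsep(ID, ",")

  // Keep every piece, then pick the interesting ones in the ~ pattern.
  def classPrefixAll: Parser[(String, List[String])] =
    "class" ~ ID ~ "(" ~ formals ~ ")" ^^ {
      case _ ~ id ~ _ ~ fs ~ _ => (id, fs)
    }

  // Drop the constant tokens up front with ~> and <~.
  // Only the results of ID and formals remain in the result.
  def classPrefix: Parser[String ~ List[String]] =
    "class" ~> ID ~ ("(" ~> formals <~ ")")
}
```

A call such as ClassPrefixParser.parseAll(ClassPrefixParser.classPrefix, "class Point(x, y)") should succeed, and its result can be matched with case id ~ fs.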


人心善变 2024-12-02 19:52:14

You should check out Parsers.Parser. Scala sometimes defines a method and a case class with the same name to aid pattern matching and the like, which is a little confusing when you're reading the Scaladoc.

In particular, "class" ~ ID is the same as "class".~(ID). ~ is a method that combines one parser with another sequentially.

There's an implicit conversion defined in RegexParsers that automatically creates a parser from a String value. So, "class" automatically becomes an instance of Parser[String].

val ID = """[a-zA-Z]([a-zA-Z0-9]|_[a-zA-Z0-9])*""".r

RegexParsers also defines another implicit conversion that automatically creates a parser from a Regex value. So, ID automatically becomes an instance of Parser[String] too.

By combining the two parsers, "class" ~ ID returns a Parser[String ~ String] that matches the literal "class" followed by the regular expression ID. There are other combinator methods, such as | and |||. For more info, read Programming in Scala.


被翻牌 2024-12-02 19:52:13

The structure here is a little bit tricky. First, notice that you always define these things inside a subclass of some parser, e.g. class MyParser extends RegexParsers. Now, you may note two implicit definitions inside RegexParsers:

implicit def literal (s: String): Parser[String]
implicit def regex (r: Regex): Parser[String]

What these will do is take any string or regex and convert them into a parser that matches that string or that regex as a token. They're implicit, so they'll be applied any time they're needed (e.g. if you call a method on Parser[String] that String (or Regex) does not have).

But what is this Parser thing? It's an inner class defined inside Parsers, the supertrait of RegexParsers:

abstract class Parser[+T] extends (Input => ParseResult[T])

Looks like it's a function that takes input and maps it to a result. Well, that makes sense! You can find its documentation in the Scaladoc for Parsers.Parser.

Now we can just look up the ~ method:

def ~[U](q: => Parser[U]): Parser[~[T, U]]
  A parser combinator for sequential composition.
  `p ~ q` succeeds if `p` succeeds and `q` succeeds on the input left over by `p`.

So, if we see something like

def seaFacts = "fish" ~ "swim"

what happens is, first, "fish" does not have the ~ method, so it's implicitly converted to Parser[String] which does. The ~ method then wants an argument of type Parser[U], and so we implicitly convert "swim" into Parser[String] (i.e. U == String). Now we have something that will match an input "fish", and whatever is left in the input should match "swim", and if both are the case, then seaFacts will succeed in its match.
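
The mechanism described above can be sketched without the library. The following is a deliberately simplified, hand-rolled model, not the actual implementation: the names MiniParsers and seaFacts, and the representation of a parser as a function from String to Option, are assumptions made for illustration. It shows how an implicit conversion in scope lets "fish" ~ "swim" compile even though String has no ~ method.

```scala
import scala.language.implicitConversions

object MiniParsers {
  // Result type for sequencing, same shape as the library's ~.
  final case class ~[+A, +B](_1: A, _2: B)

  // Simplified model: a parser maps the remaining input to an optional
  // (result, rest-of-input) pair. The real library uses Input and ParseResult.
  abstract class Parser[+T] extends (String => Option[(T, String)]) { self =>
    // Sequential composition: run this parser, then q on what is left over.
    def ~[U](q: => Parser[U]): Parser[T ~ U] = new Parser[T ~ U] {
      def apply(in: String): Option[(T ~ U, String)] =
        self(in).flatMap { case (t, rest) =>
          q(rest).map { case (u, rest2) => (new ~(t, u), rest2) }
        }
    }
  }

  // Implicit conversion: a bare String becomes a parser for that literal,
  // skipping leading whitespace first.
  implicit def literal(s: String): Parser[String] = new Parser[String] {
    def apply(in: String): Option[(String, String)] = {
      val trimmed = in.dropWhile(_.isWhitespace)
      if (trimmed.startsWith(s)) Some((s, trimmed.drop(s.length))) else None
    }
  }

  // "fish" has no ~ method, so the compiler applies literal("fish");
  // "swim" is converted because ~ expects a Parser argument.
  def seaFacts: Parser[String ~ String] = "fish" ~ "swim"
}
```

With this sketch, MiniParsers.seaFacts("fish swim") yields a result with no input left over, while MiniParsers.seaFacts("fish walk") yields None because the second literal does not match.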