Regular expressions and pattern matching in Scala, part II

Posted 2024-12-12 10:18:22

As a follow-up to this question

Here is some code that compiles and runs correctly, using captures.

val myString = "ACATCGTAGCTGCTAGCTG"

val nucCap = "([ACTG]+)".r

myString match {
   case nucCap(myNuc) => println("dna:"+myNuc)
   case _ => println("not dna")
}

>scala scalaTest.scala 
dna:ACATCGTAGCTGCTAGCTG

Here is simpler code, without capture, that does not compile.

val myString = "ACATCGTAGCTGCTAGCTG"

val nuc = "[ACGT]+".r

myString match {
     case nuc => println("dna")
     case _ => println("not dna")
}

>scala scalaTest.scala
scalaTest.scala:7: error: unreachable code

It seems like the match should return a Boolean regardless of whether a capture is used.
What is going on here?


谜兔 2024-12-19 10:18:22

match 块中,nuc 是一个模式变量,并不引用封闭范围内的 nuc。这使得默认情况无法访问,因为简单模式 nuc 将匹配任何内容。

nuc 上的一对空括号将使语法糖起作用并调用正则表达式上的 unapplySeq 方法:

myString match {
  case nuc() => println("dna")
  case _ => println("not dna")
}

避免此陷阱的一种方法是重命名 nuc< /code> 到 Nuc。以大写字母开头使其成为稳定的标识符,因此它引用封闭范围内的 Nuc,而不是被编译器视为模式变量。

val Nuc = "[ACGT]+".r
myString match {
  case Nuc => println("dna")
  case _ => println("not dna")
}

上面将打印 "not dna",因为这里我们只是将 NucmyString 进行比较,并且它们不相等。这是一个错误,但也许是一个不那么令人困惑的错误!

在这种情况下,添加括号也会达到预期的效果:

myString match {
  case Nuc() => println("dna")
  case _ => println("not dna")
}
// prints "dna"

顺便说一下,返回的不是布尔值,而是 Option[List[String]]:

scala> nuc.unapplySeq(myString)
res17: Option[List[String]] = Some(List())
scala> nucCap.unapplySeq(myString)
res18: Option[List[String]] = Some(List(ACATCGTAGCTGCTAGCTG))

In your match block, nuc is a pattern variable: it does not refer to the nuc in the enclosing scope, but binds a new name to whatever value is being matched. Because this simple pattern matches anything, the default case is unreachable.
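To see the shadowing directly, here is a minimal sketch (the input "hello" and the name `result` are only illustrative): inside the case clause, `nuc` is the String being matched, not the Regex.

```scala
val nuc = "[ACGT]+".r

// Inside the case clause, `nuc` is a fresh pattern variable that shadows
// the Regex above and is bound to the scrutinee, so this case always matches.
val result: String = "hello" match {
  case nuc => nuc
}

println(result) // prints hello: the bound String, not the Regex
```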

An empty pair of parentheses after nuc turns it into an extractor pattern, so the syntactic sugar kicks in and calls the unapplySeq method on the Regex:

myString match {
  case nuc() => println("dna")
  case _ => println("not dna")
}
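Wrapped in a helper, the same `nuc()` pattern answers the original yes/no question. This is a sketch; `isDna` is a hypothetical name. Note that the Regex extractor has to match the whole input, not just a substring, so a string containing a single non-ACGT character falls through to the default case.

```scala
val nuc = "[ACGT]+".r

// Hypothetical helper: `nuc()` calls nuc.unapplySeq and succeeds only when
// the whole string matches [ACGT]+ (Regex extractors are anchored by default).
def isDna(s: String): Boolean = s match {
  case nuc() => true
  case _     => false
}

println(isDna("ACATCGTAGCTGCTAGCTG")) // true
println(isDna("ACGTX"))               // false: 'X' is outside [ACGT]
println(isDna("xxACGTxx"))            // false: a substring match is not enough
```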

One way to avoid this pitfall is to rename nuc to Nuc. Starting with an uppercase letter makes it a stable identifier, so that it refers to the Nuc in the enclosing scope, rather than being treated by the compiler as a pattern variable.

val Nuc = "[ACGT]+".r
myString match {
  case Nuc => println("dna")
  case _ => println("not dna")
}

The above will print "not dna", because here we are simply comparing Nuc to myString, and they are not equal. It's a bug, but maybe a less confusing one!
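Besides capitalizing, Scala also lets you keep a lowercase name and wrap it in backticks, which makes it a stable identifier pattern compared with `==`. A sketch using a plain String constant (the names `target` and `describe` are illustrative) shows both behaviors side by side:

```scala
val target = "ACGT"

def describe(s: String): String = s match {
  // Backticks: `target` refers to the val above and is compared with ==.
  case `target` => "equals target"
  // Lowercase without backticks: a fresh variable that matches anything.
  case other => s"something else: $other"
}

println(describe("ACGT")) // equals target
println(describe("TTTT")) // something else: TTTT
```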

Adding the parentheses will have the desired effect in this case too:

myString match {
  case Nuc() => println("dna")
  case _ => println("not dna")
}
// prints "dna"

By the way, it is not a boolean that is being returned, but an Option[List[String]]:

scala> nuc.unapplySeq(myString)
res17: Option[List[String]] = Some(List())
scala> nucCap.unapplySeq(myString)
res18: Option[List[String]] = Some(List(ACATCGTAGCTGCTAGCTG))
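If a Boolean is what you actually want, the Option returned by `unapplySeq` can be collapsed with `isDefined`, or you can ask the underlying `java.util.regex.Pattern` (exposed as `Regex.pattern`) for a full match directly. A minimal sketch reusing the question's values:

```scala
val myString = "ACATCGTAGCTGCTAGCTG"
val nuc = "[ACGT]+".r
val nucCap = "([ACTG]+)".r

// unapplySeq returns Some(groups) on a full match and None otherwise.
val viaExtractor: Boolean = nuc.unapplySeq(myString).isDefined

// Regex.pattern is the compiled java.util.regex.Pattern;
// Matcher.matches() also requires the whole input to match.
val viaPattern: Boolean = nuc.pattern.matcher(myString).matches()

// With a capture group, the Some carries the captured text.
val captured: Option[List[String]] = nucCap.unapplySeq(myString)

println(viaExtractor) // true
println(viaPattern)   // true
println(captured)     // Some(List(ACATCGTAGCTGCTAGCTG))
```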