How do I pattern match using regular expressions in Scala?

Posted on 2024-10-10 17:24:36

I would like to be able to find a match between the first letter of a word, and one of the letters in a group such as "ABC". In pseudocode, this might look something like:

case Process(word) =>
   word.firstLetter match {
      case([a-c][A-C]) =>
      case _ =>
   }
}

But how do I grab the first letter in Scala instead of Java? How do I express the regular expression properly? Is it possible to do this within a case class?

Comments (7)

极致的悲 2024-10-17 17:24:37

To expand a little on Andrew's answer: because regular expressions define extractors, Scala's pattern matching can decompose the substrings matched by the regex very nicely, e.g.:

val Process = """([a-cA-C])([^\s]+)""".r // group 1: first letter, group 2: rest (non-space)
for (p <- Process findAllIn "aha bah Cah dah") p match {
  case Process("a", _)  => println("first: 'a', some rest")
  case Process(_, rest) => println("some first, rest: " + rest)
  // etc.
}
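One thing to watch in the loop above: findAllIn scans for matches anywhere in the text. The word "dah" therefore contributes the substring "ah", because its 'a' starts a new match. The sketch below assumes you only want whole words and adds a \b word boundary. The helper name splitWords is hypothetical.

```scala
import scala.util.matching.Regex

val Process      = """([a-cA-C])([^\s]+)""".r
val WholeProcess = """\b([a-cA-C])([^\s]+)""".r // \b: a match must start at a word boundary

// Hypothetical helper: findAllIn yields matched substrings, each of which fully
// matches the regex, so the extractor pattern can bind both groups.
def splitWords(re: Regex, text: String): List[(String, String)] =
  re.findAllIn(text).toList.collect { case re(first, rest) => (first, rest) }

println(splitWords(Process, "aha bah Cah dah"))      // List((a,ha), (b,ah), (C,ah), (a,h))
println(splitWords(WholeProcess, "aha bah Cah dah")) // List((a,ha), (b,ah), (C,ah))
```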
献世佛 2024-10-17 17:24:37

Note that the approach from @AndrewMyers's answer matches the regular expression against the entire string. This has the same effect as anchoring it at both ends with ^ and $. Example:

scala> val MY_RE = "(foo|bar).*".r
MY_RE: scala.util.matching.Regex = (foo|bar).*

scala> val result = "foo123" match { case MY_RE(m) => m; case _ => "No match" }
result: String = foo

scala> val result = "baz123" match { case MY_RE(m) => m; case _ => "No match" }
result: String = No match

scala> val result = "abcfoo123" match { case MY_RE(m) => m; case _ => "No match" }
result: String = No match

And with no .* at the end:

scala> val MY_RE2 = "(foo|bar)".r
MY_RE2: scala.util.matching.Regex = (foo|bar)

scala> val result = "foo123" match { case MY_RE2(m) => m; case _ => "No match" }
result: String = No match
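Sometimes you want the "search anywhere" behaviour inside a match. A minimal sketch does this with the standard scala.util.matching.Regex methods unanchored and findFirstIn, both available since Scala 2.10. The value and function names are illustrative.

```scala
val MyRe  = "(foo|bar)".r
val MyReU = MyRe.unanchored // in a pattern match, searches anywhere in the input

def anchored(s: String): String   = s match { case MyRe(m)  => m; case _ => "No match" }
def unanchored(s: String): String = s match { case MyReU(m) => m; case _ => "No match" }

println(anchored("foo123"))            // No match: "foo123" is not entirely (foo|bar)
println(unanchored("foo123"))          // foo
println(unanchored("abcfoo123"))       // foo
println(MyRe.findFirstIn("abcfoo123")) // Some(foo)
```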
哭泣的笑容 2024-10-17 17:24:37

String.matches is the way to do pattern matching in the regex sense.

But as a handy aside, word.firstLetter in real Scala code looks like:

word(0)

Scala treats Strings as a sequence of Chars, so if for some reason you want to explicitly get the first character of the String and match it, you could use something like this:

"Cat"(0).toString.matches("[a-cA-C]")
res10: Boolean = true

I'm not proposing this as the general way to do regex pattern matching, but it's in line with your proposed approach to first find the first character of a String and then match it against a regex.
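Note that word(0) throws a StringIndexOutOfBoundsException on an empty string. A slightly safer sketch uses take(1), which returns "" for an empty string. It assumes an empty word should simply count as "no match". The helper name firstLetterInGroup is hypothetical.

```scala
// Hypothetical helper: is the first character one of a-c or A-C?
def firstLetterInGroup(word: String): Boolean =
  word.take(1).matches("[a-cA-C]") // an empty word gives "", which does not match

println(firstLetterInGroup("Cat")) // true
println(firstLetterInGroup("dog")) // false
println(firstLetterInGroup(""))    // false
```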

EDIT:
To be clear, the way I would do this is, as others have said:

"Cat".matches("^[a-cA-C].*")
res14: Boolean = true

Just wanted to show an example as close as possible to your initial pseudocode. Cheers!
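The question also asks about doing this inside a case. For a single character you may not need a regex at all. The sketch below is a plain-Scala alternative, not a regex technique. It matches on the first Char with a guard, and classify is a hypothetical name.

```scala
// Classify a word by its first character using a case guard instead of a regex.
def classify(word: String): String = word.headOption match {
  case Some(c) if "abcABC".contains(c) => s"starts with $c"
  case Some(c)                         => s"starts with something else ($c)"
  case None                            => "empty word"
}

println(classify("Cat")) // starts with C
println(classify("dog")) // starts with something else (d)
println(classify(""))    // empty word
```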

御守 2024-10-17 17:24:37

First, note that a regular expression can be used on its own. Here is an example:

import scala.util.matching.Regex
val pattern = "Scala".r // <=> val pattern = new Regex("Scala")
val str = "Scala is very cool"
val result = pattern findFirstIn str
result match {
  case Some(v) => println(v)
  case _ =>
} // output: Scala
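Besides findFirstIn, Regex has other standalone methods. The short sketch below uses findAllIn and replaceAllIn, which are standard scala.util.matching.Regex methods. The pattern and text are illustrative.

```scala
import scala.util.matching.Regex

val wordRe: Regex = "[A-Za-z]+".r
val text = "Scala is very cool"

println(wordRe.findFirstIn(text))      // Some(Scala)
println(wordRe.findAllIn(text).toList) // List(Scala, is, very, cool)
// replaceAllIn with a function: each Match is replaced by the function's result
println(wordRe.replaceAllIn(text, m => m.matched.toUpperCase)) // SCALA IS VERY COOL
```

The replacer's result is still interpreted as a replacement string, so $ and \ in it are special. If the replacement text can contain them, wrap it in Regex.quoteReplacement.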

Second, combining regular expressions with pattern matching is very powerful. Here is a simple example:

val date = """(\d\d\d\d)-(\d\d)-(\d\d)""".r
"2014-11-20" match {
  case date(year, month, day) => "hello"
} // evaluates to "hello"
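The example above ignores the bound groups and has no fallback case, so a non-matching string throws a scala.MatchError. A slightly fuller sketch converts the groups to Ints and returns None otherwise. The parseDate helper is hypothetical, and it does not check that month and day are in range.

```scala
val date = """(\d\d\d\d)-(\d\d)-(\d\d)""".r

// Hypothetical helper: (year, month, day) as Ints, or None if the whole string is not a date.
def parseDate(s: String): Option[(Int, Int, Int)] = s match {
  case date(year, month, day) => Some((year.toInt, month.toInt, day.toInt))
  case _                      => None // without this case a non-date throws scala.MatchError
}

println(parseDate("2014-11-20")) // Some((2014,11,20))
println(parseDate("20/11/2014")) // None
```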

Regular expressions are already powerful on their own. Scala's pattern matching makes them even more convenient to use. More examples are in the Scala API documentation: http://www.scala-lang.org/files/archive/api/current/index.html#scala.util.matching.Regex

温暖的光 2024-10-17 17:24:36

You can do this because regular expressions define extractors, but you need to define the regex pattern first. I don't have access to a Scala REPL to test this, but something like this should work:

val Pattern = "([a-cA-C])".r
word.take(1) match { // first letter as a String ("" if word is empty)
   case Pattern(c) => println(s"first letter: $c") // c is bound to the capture group
   case _ =>
}

No capturing groups

To check only whether the Regex matches, ignoring any groups, use a sequence wildcard:

val date = "[0-9]{4}-[0-9]{2}-[0-9]{2}".r
"2004-01-20" match {
    case date(_*) => "It's a date!"
}
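The sequence wildcard also works when the regex does have groups. You can bind some leading groups and ignore the rest. A minimal sketch, with hypothetical helper names:

```scala
val Date = "([0-9]{4})-([0-9]{2})-([0-9]{2})".r

// `_*` ignores however many groups there are.
def isDate(s: String): Boolean = s match {
  case Date(_*) => true
  case _        => false
}

// `year, _*` binds only the first group.
def yearOf(s: String): Option[String] = s match {
  case Date(year, _*) => Some(year)
  case _              => None
}

println(isDate("2004-01-20")) // true
println(yearOf("2004-01-20")) // Some(2004)
```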

Sub-string matching

In a pattern match, Regex normally matches the entire input. However, an unanchored Regex finds the pattern anywhere in the input:

val date = "([0-9]{4}-[0-9]{2}-[0-9]{2})".r.unanchored
"The date is 2004-01-20 today" match {
    case date(d) => s"Found a date $d!"
}
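An unanchored pattern match binds only the first occurrence. To get every occurrence, findAllIn is the usual tool. A short sketch with illustrative input:

```scala
val DateIn  = "([0-9]{4}-[0-9]{2}-[0-9]{2})".r.unanchored
val DateAll = "[0-9]{4}-[0-9]{2}-[0-9]{2}".r

val text = "From 2004-01-20 to 2004-02-01"

// The unanchored match binds only the first date found...
val first = text match {
  case DateIn(d) => Some(d)
  case _         => None
}
// ...while findAllIn returns all of them.
val all = DateAll.findAllIn(text).toList

println(first) // Some(2004-01-20)
println(all)   // List(2004-01-20, 2004-02-01)
```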
七色彩虹 2024-10-17 17:24:36

Since version 2.10, one can use Scala's string interpolation feature:

implicit class RegexOps(sc: StringContext) {
  def r = new util.matching.Regex(sc.parts.mkString, sc.parts.tail.map(_ => "x"): _*)
}

scala> "123" match { case r"\d+" => true case _ => false }
res34: Boolean = true
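Roughly speaking, the compiler rewrites a pattern such as r"..." into a call to StringContext(parts).r.unapplySeq(scrutinee). The implicit class therefore only has to return a Regex. The sketch below calls the same thing by hand. The object name RegexInterp is hypothetical. Wrapping the implicit class in an object keeps it valid in Scala 2, where implicit classes cannot be top-level.

```scala
object RegexInterp {
  implicit class RegexOps(sc: StringContext) {
    // Joins the literal parts into one pattern; the "x" names are placeholder group names.
    def r = new scala.util.matching.Regex(sc.parts.mkString, sc.parts.tail.map(_ => "x"): _*)
  }

  def viaPattern(s: String): Option[String] = s match {
    case r"(\d+)$d" => Some(d)
    case _          => None
  }

  // Roughly what the pattern above is rewritten to.
  def byHand(s: String): Option[List[String]] =
    StringContext("(\\d+)", "").r.unapplySeq(s)
}

println(RegexInterp.viaPattern("123")) // Some(123)
println(RegexInterp.byHand("123"))     // Some(List(123))
```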

Even better, one can bind regular expression groups:

scala> "123" match { case r"(\d+)$d" => d.toInt case _ => 0 }
res36: Int = 123

scala> "10+15" match { case r"(\d\d)${first}\+(\d\d)${second}" => first.toInt+second.toInt case _ => 0 }
res38: Int = 25

It is also possible to use more elaborate binding mechanisms:

scala> object Doubler { def unapply(s: String) = Some(s.toInt*2) }
defined module Doubler

scala> "10" match { case r"(\d\d)${Doubler(d)}" => d case _ => 0 }
res40: Int = 20

scala> object isPositive { def unapply(s: String) = s.toInt >= 0 }
defined module isPositive

scala> "10" match { case r"(\d\d)${d @ isPositive()}" => d.toInt case _ => 0 }
res56: Int = 10
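The same nesting works without the interpolator: any extractor can be placed inside a group pattern of an ordinary Regex. In this minimal sketch, AsInt and evalSum are hypothetical names. AsInt uses Try, so an out-of-range number makes the case fail instead of throwing. By contrast, Doubler above calls s.toInt directly and would throw.

```scala
import scala.util.Try

// Hypothetical extractor: succeeds only if the string parses as an Int.
object AsInt {
  def unapply(s: String): Option[Int] = Try(s.toInt).toOption
}

val Sum = """(\d+)\+(\d+)""".r

def evalSum(s: String): Option[Int] = s match {
  case Sum(AsInt(a), AsInt(b)) => Some(a + b) // groups are converted while matching
  case _                       => None
}

println(evalSum("10+15"))         // Some(25)
println(evalSum("99999999999+1")) // None: 99999999999 overflows Int, so AsInt fails
```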

An impressive example of what's possible with Dynamic is shown in the blog post Introduction to Type Dynamic:

object T {

  class RegexpExtractor(params: List[String]) {
    def unapplySeq(str: String) =
      params.headOption flatMap (_.r unapplySeq str)
  }

  class StartsWithExtractor(params: List[String]) {
    def unapply(str: String) =
      params.headOption filter (str startsWith _) map (_ => str)
  }

  class MapExtractor(keys: List[String]) {
    def unapplySeq[T](map: Map[String, T]) =
      Some(keys.map(map get _))
  }

  import scala.language.dynamics

  class ExtractorParams(params: List[String]) extends Dynamic {
    val Map = new MapExtractor(params)
    val StartsWith = new StartsWithExtractor(params)
    val Regexp = new RegexpExtractor(params)

    def selectDynamic(name: String) =
      new ExtractorParams(params :+ name)
  }

  object p extends ExtractorParams(Nil)

  Map("firstName" -> "John", "lastName" -> "Doe") match {
    case p.firstName.lastName.Map(
          Some(p.Jo.StartsWith(fn)),
          Some(p.`.*(\\w)$`.Regexp(lastChar))) =>
      println(s"Match! $fn ...$lastChar")
    case _ => println("nope")
  }
}
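The key trick in that example is that every unknown member selection, such as p.firstName, calls selectDynamic. Each call returns a new ExtractorParams with the name appended. The sketch below strips this down to just that accumulation step. Path and root are hypothetical names.

```scala
import scala.language.dynamics

// Each unknown selection `x.name` is rewritten to `x.selectDynamic("name")`.
class Path(val parts: List[String]) extends Dynamic {
  def selectDynamic(name: String): Path = new Path(parts :+ name)
}
object root extends Path(Nil)

println(root.firstName.lastName.parts) // List(firstName, lastName)
```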
旧故 2024-10-17 17:24:36

As delnan pointed out, the match keyword in Scala has nothing to do with regexes. To find out whether a string matches a regex, you can use the String.matches method. To find out whether a string starts with an a, b or c in lower or upper case, the regex would look like this:

word.matches("[a-cA-C].*")

You can read this regex as "one of the characters a, b, c, A, B or C followed by anything" (. means "any character" and * means "zero or more times", so ".*" is any string).
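One caveat with ".*": by default, . does not match line terminators. Because matches must cover the whole string, it fails on multi-line input. If you want to accept such input, one option is to enable DOTALL mode with the inline (?s) flag. Both helper names in this sketch are hypothetical.

```scala
def startsWithAtoC(word: String): Boolean       = word.matches("[a-cA-C].*")
def startsWithAtoCDotAll(word: String): Boolean = word.matches("(?s)[a-cA-C].*") // (?s): . also matches \n

println(startsWithAtoC("apple"))      // true
println(startsWithAtoC("a\nb"))       // false: . stops at the newline
println(startsWithAtoCDotAll("a\nb")) // true
```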
