Accessing Scala Parser regular expression match data

Posted on 2024-08-12 08:28:36

I'm wondering whether it's possible to get the MatchData generated by the regular expression match in the grammar below.

object DateParser extends JavaTokenParsers {

    ....

    val dateLiteral = """(\d{4}[-/])?(\d\d[-/])?(\d\d)""".r ^^ {
        ... get MatchData
    }
}

One option, of course, is to perform the match again inside the block, but since the RegexParser has already performed the match, I'm hoping that it either passes the MatchData to the block or stores it somewhere.

眼趣 2024-08-19 08:28:36

Here is the implicit definition that converts your Regex into a Parser:

  /** A parser that matches a regex string */
  implicit def regex(r: Regex): Parser[String] = new Parser[String] {
    def apply(in: Input) = {
      val source = in.source
      val offset = in.offset
      val start = handleWhiteSpace(source, offset)
      (r findPrefixMatchOf (source.subSequence(start, source.length))) match {
        case Some(matched) =>
          Success(source.subSequence(start, start + matched.end).toString, 
                  in.drop(start + matched.end - offset))
        case None =>
          Failure("string matching regex `"+r+"' expected but `"+in.first+"' found", in.drop(start - offset))
      }
    }
  }

Just adapt it:

import scala.util.matching.Regex
import scala.util.parsing.combinator.RegexParsers

object X extends RegexParsers {
  /** A parser that matches a regex string and returns the Match */
  def regexMatch(r: Regex): Parser[Regex.Match] = new Parser[Regex.Match] {
    def apply(in: Input) = {
      val source = in.source
      val offset = in.offset
      val start = handleWhiteSpace(source, offset)
      (r findPrefixMatchOf (source.subSequence(start, source.length))) match {
        case Some(matched) =>
          Success(matched,
                  in.drop(start + matched.end - offset))
        case None =>
          Failure("string matching regex `"+r+"' expected but `"+in.first+"' found", in.drop(start - offset))
      }
    }
  }
  val t = regexMatch("""(\d\d)/(\d\d)/(\d\d\d\d)""".r) ^^ { case m => (m.group(1), m.group(2), m.group(3)) }
}

Example:

scala> X.parseAll(X.t, "23/03/1971")
res8: X.ParseResult[(String, String, String)] = [1.11] parsed: (23,03,1971)
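
One thing to watch for when applying this to the regex from the question: its first two groups are optional, and `Regex.Match.group` returns `null` for a group that did not take part in the match. The following is a minimal sketch using only the standard library; `OptionalGroupsDemo` and `groups` are hypothetical names introduced for illustration. It wraps each group in an `Option` so that the block passed to `^^` does not have to deal with `null`:

```scala
import scala.util.matching.Regex

object OptionalGroupsDemo {
  // The date pattern from the question: optional year, optional month, mandatory day.
  val datePattern: Regex = """(\d{4}[-/])?(\d\d[-/])?(\d\d)""".r

  // Hypothetical helper: an unmatched optional group comes back as null,
  // so wrap every group in Option.
  def groups(m: Regex.Match): List[Option[String]] =
    (1 to m.groupCount).map(i => Option(m.group(i))).toList

  def main(args: Array[String]): Unit = {
    // findPrefixMatchOf is the same call the regexMatch parser uses internally.
    println(datePattern.findPrefixMatchOf("2009-11-23").map(groups))
    println(datePattern.findPrefixMatchOf("23").map(groups))
  }
}
```

Note that the captured year and month groups still contain the trailing separator, because the separator sits inside the capturing parentheses in the original pattern.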

亣腦蒛氧 2024-08-19 08:28:36

No, you can't do this. If you look at the definition of the Parser used when you convert a regex to a Parser, it throws away all context and just returns the full matched string:

http://lampsvn.epfl.ch/trac/scala/browser/scala/tags/R_2_7_7_final/src/library/scala/util/parsing/combinator/RegexParsers.scala?view=markup#L55

You have a couple of other options, though:

break up your parser into several smaller parsers (for the tokens you actually want to extract)
define a custom parser that extracts the values you want and returns a domain object instead of a string

The first would look like

val separator = "-" | "/"
val year = """\d{4}""".r <~ separator
val month = """\d\d""".r <~ separator
val day = """\d\d""".r

// A missing year or month falls back to a default value
val date = ((year.?) ~ (month.?) ~ day) map {
  case year ~ month ~ day =>
    (year.getOrElse("2009"), month.getOrElse("11"), day)
}

The <~ means "require these two tokens together, but only give me the result of the first one."

The ~ means "require these two tokens together and tie them together in a pattern-matchable ~ object."

The ? means that the parser is optional and will return an Option.

The .getOrElse bit provides a default value for when the parser didn't define a value.
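
The second option (returning a domain object instead of a string) might look something like the sketch below. It assumes the scala-parser-combinators library is on the classpath; `DateLiteral`, `DateLiteralParser` and `parseDate` are hypothetical names made up for this example, not part of any library:

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical domain object; the name and fields are illustrative only.
case class DateLiteral(year: Option[Int], month: Option[Int], day: Int)

object DateLiteralParser extends RegexParsers {
  private val separator: Parser[String] = "-" | "/"

  // Each small parser converts its own token, so no regex groups are needed.
  private val year: Parser[Int]  = """\d{4}""".r <~ separator ^^ (_.toInt)
  private val month: Parser[Int] = """\d\d""".r <~ separator ^^ (_.toInt)
  private val day: Parser[Int]   = """\d\d""".r ^^ (_.toInt)

  // opt(p) yields Some(value) when p succeeds and None otherwise.
  val date: Parser[DateLiteral] = opt(year) ~ opt(month) ~ day ^^ {
    case y ~ m ~ d => DateLiteral(y, m, d)
  }

  // Convenience wrapper: None when the whole input is not a date literal.
  def parseDate(s: String): Option[DateLiteral] = parseAll(date, s) match {
    case Success(result, _) => Some(result)
    case _                  => None
  }
}
```

Keeping the optional parts as `Option` rather than substituting defaults leaves it to the caller to decide what a missing year or month should mean.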

人事已非 2024-08-19 08:28:36

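When a Regex is used in a RegexParsers instance, the implicit def regex(Regex): Parser[String] in RegexParsers is used to apply that Regex to the input. The Match instance produced when the RE is successfully applied at the current input is used to construct a Success in the regex() method, but only its "end" value is used, so any captured sub-matches are discarded by the time that method returns.

As it stands (in the 2.7 source I looked at), I believe you're out of luck.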
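
If the groups are needed anyway, the fallback mentioned in the question (matching again inside the block) can be sketched as below. This is only an illustration, assuming the scala-parser-combinators library is on the classpath; `RematchDemo` and `parseDate` are hypothetical names:

```scala
import scala.util.parsing.combinator.RegexParsers

object RematchDemo extends RegexParsers {
  // The pattern from the question, kept in a val so it can be reused as an extractor.
  private val DatePattern = """(\d{4}[-/])?(\d\d[-/])?(\d\d)""".r

  // The implicit regex parser only hands the matched String to the block,
  // so the block runs the pattern a second time to recover the groups.
  // Unmatched optional groups arrive as null, hence the Option wrappers.
  val dateLiteral: Parser[(Option[String], Option[String], String)] =
    DatePattern ^^ {
      case DatePattern(y, m, d) => (Option(y), Option(m), d)
    }

  def parseDate(s: String): Option[(Option[String], Option[String], String)] =
    parseAll(dateLiteral, s) match {
      case Success(result, _) => Some(result)
      case _                  => None
    }
}
```

The cost is that every successful match is performed twice, which is exactly the duplication the question hoped to avoid.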

時窥 2024-08-19 08:28:36

I ran into a similar issue using Scala 2.8.1 while trying to parse input of the form "name:value" with the RegexParsers class:

package scalucene.query

import scala.util.matching.Regex
import scala.util.parsing.combinator._

object QueryParser extends RegexParsers {
  override def skipWhitespace = false

  private def quoted = regex(new Regex("\"[^\"]+"))
  private def colon = regex(new Regex(":"))
  private def word = regex(new Regex("\\w+"))
  private def fielded = (regex(new Regex("[^:]+")) <~ colon) ~ word
  private def term = (fielded | word | quoted)

  def parseItem(str: String) = parse(term, str)
}

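It seems that you can grab the two matched parts after parsing by pattern matching on the ~ result, like this:

QueryParser.parseItem("nameExample:valueExample") match {
  // fielded yields a ~ value holding the name and the value
  case QueryParser.Success(QueryParser.~(name, value), _) =>
    println("Name: " + name + " value: " + value)
  // anything else: a bare word, a quoted string, or a failed parse
  case other =>
    println("Not a name:value pair: " + other)
}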
