Scala RegexParsers 中的非贪婪匹配

发布于 2024-12-10 12:12:49 字数 846 浏览 6 评论 0原文

假设我正在用 Scala 编写一个基本的 SQL 解析器。我有以下内容：

class Arith extends RegexParsers {
    def selectstatement: Parser[Any] = selectclause ~ fromclause
    def selectclause: Parser[Any] = "(?i)SELECT".r ~ tokens
    def fromclause: Parser[Any] = "(?i)FROM".r ~ tokens
    def tokens: Parser[Any] = rep(token) //how to make this non-greedy?
    def token: Parser[Any] = "(\\s*)\\w+(\\s*)".r
}

当尝试将 selectstatement 与 SELECT foo FROM bar 匹配时，如何防止 selectclause 由于 rep(token) 中的 rep(token) 而吞噬整个短语~ 令牌？

换句话说，如何在 Scala 中指定非贪婪匹配？

为了澄清，我完全意识到我可以在字符串模式本身中使用标准非贪婪语法 (*?) 或 (+?)，但我想知道是否有一种方法可以在 def 令牌内以更高的级别指定它。例如，如果我像这样定义了令牌：

def token: Parser[Any] = stringliteral | numericliteral | columnname

那么如何为 def 令牌内的rep(token)指定非贪婪匹配？

原文

Suppose I'm writing a rudimentary SQL parser in Scala. I have the following:

class Arith extends RegexParsers {
    def selectstatement: Parser[Any] = selectclause ~ fromclause
    def selectclause: Parser[Any] = "(?i)SELECT".r ~ tokens
    def fromclause: Parser[Any] = "(?i)FROM".r ~ tokens
    def tokens: Parser[Any] = rep(token) //how to make this non-greedy?
    def token: Parser[Any] = "(\\s*)\\w+(\\s*)".r
}

When trying to match selectstatement against SELECT foo FROM bar, how do I prevent the selectclause from gobbling up the entire phrase due to the rep(token) in ~ tokens?

In other words, how do I specify non-greedy matching in Scala?

To clarify, I'm fully aware that I can use standard non-greedy syntax (*?) or (+?) within the String pattern itself, but I wondered if there's a way to specify it at a higher level inside def tokens. For example, if I had defined token like this:

def token: Parser[Any] = stringliteral | numericliteral | columnname

Then how can I specify non-greedy matching for the rep(token) inside def tokens?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

终止放荡 2024-12-17 12:12:49

不容易，因为成功的匹配不会重试。例如，考虑一下：

object X extends RegexParsers {
  def p = ("a" | "aa" | "aaa" | "aaaa") ~ "ab"
}

scala> X.parseAll(X.p, "aaaab")
res1: X.ParseResult[X.~[String,String]] = 
[1.2] failure: `ab' expected but `a' found

aaaab
 ^

第一个匹配成功，在解析器中的括号内，因此它继续进行下一个匹配。那个失败了，所以 p 失败了。如果 p 是替代匹配的一部分，则将尝试替代匹配，因此诀窍是生成可以处理此类事情的东西。

假设我们有这个：

def nonGreedy[T](rep: => Parser[T], terminal: => Parser[T]) = Parser { in =>
  def recurse(in: Input, elems: List[T]): ParseResult[List[T] ~ T] =
    terminal(in) match {
      case Success(x, rest) => Success(new ~(elems.reverse, x), rest)
      case _ => 
        rep(in) match {
          case Success(x, rest) => recurse(rest, x :: elems)
          case ns: NoSuccess    => ns
        }
    }

  recurse(in, Nil)
}

然后您可以像这样使用它：

def p = nonGreedy("a", "ab")

顺便说一句，我总是发现查看其他事物的定义方式有助于尝试提出像上面的非贪婪的东西。特别是，看看如何定义 rep1 ，以及如何更改它以避免重新评估其重复参数 - 同样的事情可能对 nonGreedy 有用。

这是一个完整的解决方案，进行了一些更改以避免消耗“终端”。

trait NonGreedy extends Parsers {
    def nonGreedy[T, U](rep: => Parser[T], terminal: => Parser[U]) = Parser { in =>
      def recurse(in: Input, elems: List[T]): ParseResult[List[T]] =
        terminal(in) match {
          case _: Success[_] => Success(elems.reverse, in)
          case _ => 
            rep(in) match {
              case Success(x, rest) => recurse(rest, x :: elems)
              case ns: NoSuccess    => ns
            }
        }

      recurse(in, Nil)
    }  
}

class Arith extends RegexParsers with NonGreedy {
    // Just to avoid recompiling the pattern each time
    val select: Parser[String] = "(?i)SELECT".r
    val from: Parser[String] = "(?i)FROM".r
    val token: Parser[String] = "(\\s*)\\w+(\\s*)".r
    val eof: Parser[String] = """\z""".r

    def selectstatement: Parser[Any] = selectclause(from) ~ fromclause(eof)
    def selectclause(terminal: Parser[Any]): Parser[Any] = 
      select ~ tokens(terminal)
    def fromclause(terminal: Parser[Any]): Parser[Any] = 
      from ~ tokens(terminal)
    def tokens(terminal: Parser[Any]): Parser[Any] = 
      nonGreedy(token, terminal)
}

Not easily, because a successful match is not retried. Consider, for example:

object X extends RegexParsers {
  def p = ("a" | "aa" | "aaa" | "aaaa") ~ "ab"
}

scala> X.parseAll(X.p, "aaaab")
res1: X.ParseResult[X.~[String,String]] = 
[1.2] failure: `ab' expected but `a' found

aaaab
 ^

The first match was successful, in parser inside parenthesis, so it proceeded to the next one. That one failed, so p failed. If p was part of alternative matches, the alternative would be tried, so the trick is to produce something that can handle that sort of thing.

Let's say we have this:

def nonGreedy[T](rep: => Parser[T], terminal: => Parser[T]) = Parser { in =>
  def recurse(in: Input, elems: List[T]): ParseResult[List[T] ~ T] =
    terminal(in) match {
      case Success(x, rest) => Success(new ~(elems.reverse, x), rest)
      case _ => 
        rep(in) match {
          case Success(x, rest) => recurse(rest, x :: elems)
          case ns: NoSuccess    => ns
        }
    }

  recurse(in, Nil)
}

You can then use it like this:

def p = nonGreedy("a", "ab")

By the way,I always found that looking at how other things are defined is helpful in trying to come up with stuff like nonGreedy above. In particular, look at how rep1 is defined, and how it was changed to avoid re-evaluating its repetition parameter -- the same thing would probably be useful on nonGreedy.

Here's a full solution, with a little change to avoid consuming the "terminal".

trait NonGreedy extends Parsers {
    def nonGreedy[T, U](rep: => Parser[T], terminal: => Parser[U]) = Parser { in =>
      def recurse(in: Input, elems: List[T]): ParseResult[List[T]] =
        terminal(in) match {
          case _: Success[_] => Success(elems.reverse, in)
          case _ => 
            rep(in) match {
              case Success(x, rest) => recurse(rest, x :: elems)
              case ns: NoSuccess    => ns
            }
        }

      recurse(in, Nil)
    }  
}

class Arith extends RegexParsers with NonGreedy {
    // Just to avoid recompiling the pattern each time
    val select: Parser[String] = "(?i)SELECT".r
    val from: Parser[String] = "(?i)FROM".r
    val token: Parser[String] = "(\\s*)\\w+(\\s*)".r
    val eof: Parser[String] = """\z""".r

    def selectstatement: Parser[Any] = selectclause(from) ~ fromclause(eof)
    def selectclause(terminal: Parser[Any]): Parser[Any] = 
      select ~ tokens(terminal)
    def fromclause(terminal: Parser[Any]): Parser[Any] = 
      from ~ tokens(terminal)
    def tokens(terminal: Parser[Any]): Parser[Any] = 
      nonGreedy(token, terminal)
}

回复收藏 0 原文

~没有更多了~