将组合器解析器的列表/序列转换为单个解析器

发布于 2024-12-08 20:23:28 字数 669 浏览 1 评论 0原文

我有一个值列表，可以从中构造一个解析器列表，这些解析器通过映射依赖于这些值（请参见示例）。然后我想做的就是通过串联将解析器列表变成单个解析器。

一种可能是使用 foldLeft 和 ~：

parsers.foldLeft(success(Nil)){case (ps,p) => rs ~ p ^^ {case xs ~ x => x ::xs}} ^^ (_.reverse)

这有效吗？

我不知道组合器解析器是如何工作的；是否会有一个深度为列表长度的调用堆栈？因此，对于很长的连接，我可能会遇到 SO 错误吗？

更好的方法

有没有一种更具可读性的不同方法？

示例

假设您有一个包含两行的文件。第一行包含n个整数x_1到x_n。第二行包含 x_1 + x_2 + ... x_n 属于根据第一行的组的整数。我想从第一行获取整数序列并创建 n 个解析器 p_1 到 p_n，其中 p_i 解析 x_i 整数。

假设我有第一行的整数列表 l = List(1,2,3) 。对于每个整数n，我创建一个解析器来解析n整数：parsers = l.map(repN(_,integer))。

原文

I have a list of values from which I can construct a list of parsers, that depend on these values by mapping (see example). Then what I want to do is turn the list of parsers into a single parser by concatenation.

One possibility is using foldLeft and ~:

parsers.foldLeft(success(Nil)){case (ps,p) => rs ~ p ^^ {case xs ~ x => x ::xs}} ^^ (_.reverse)

Is this efficient?

I don't know how combinator parsers work; will there be a call stack with depth of length of the list? Thus may I run into SO errors for very long concatenations?

Better way

Is there a different way that is more readable?

Example

Suppose you have a file with two lines. The first line contains n integers x_1 to x_n. The second line contains contains x_1 + x_2 + ... x_n integers that belong to groups according to the first line. I want to take the sequence of integers from the first line and create n parsers p_1 to p_n where p_i parses x_i integers.

Suppose I have the list of integers l = List(1,2,3) from the first line. For each integer n I create a parser that parses n integers: parsers = l.map(repN(_,integer)).

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

冷默言语 2024-12-15 20:23:28

您所描述的内容（以及您在实现中使用 foldLeft 和 ~ 或多或少重新发明的内容）本质上是 Haskell 的 sequence 用于 monad（实际上你只需要一个应用函子，但这与这里无关）。 sequence 接受一元值列表并返回一元值列表。 Parser 是一个 monad，因此 Parser 的 sequence 会将 List[Parser[A]] 更改为 <代码>解析器[列表[A]]。

Scalaz 为您提供序列，但我不知道我不知道是否有一种好方法来获取 Parser 所需的 Applicative 实例。幸运的是，你可以很容易地推出自己的（我直接翻译 Haskell 定义)：

import scala.util.parsing.combinator._

object parser extends RegexParsers {
  val integer = """\d+""".r

  val counts = List(1, 2, 3)
  val parsers = counts.map(repN(_, integer))

  val line = parsers.foldRight(success(Nil: List[List[String]])) {
    (m, n) => for { x <- m ; xs <- n } yield (x :: xs)
  }

  def apply(s: String) = parseAll(line, s)
}

这为我们提供了 List(List(1), List(2, 3), List(4, 5, 6)) parser("1 2 3 4 5 6")，根据需要。

（请注意，我在这里使用 RegexParsers 作为一个方便的完整示例，但该方法的工作更普遍。）

如果我们对 for 进行脱糖处理，那么发生的事情可能会更清楚一些理解：

val line = parsers.foldRight(success(Nil: List[List[String]])) {
  (current, acc) => current.flatMap(x => acc.map(x :: _))
}

我们可以将 flatMap 写为 into 并将 map 写为 ^^：

val line = parsers.foldRight(success(Nil: List[List[String]])) {
  (current, acc) => current into (x => acc ^^ (x :: _))
}

这与您的想法并不太远公式，除了我们使用的是向右折叠而不是反转，并且不会构建和分解 ~。

关于效率：我们的两种实现都会导致令人不愉快的调用堆栈。根据我的经验，这只是 Scala 解析器组合器的一个事实。引用另一个Stack Overflow答案，例子：

Scala 的解析器组合器效率不高。他们不是
设计为。它们适合完成相对较小的任务
小投入。

我的 sequence-y 方法解决了您的问题的“更具可读性”部分，并且几乎肯定是使用 Scala 解析器组合器解决问题的最简洁方法。它比您的实施效率稍高，并且对于几千个左右的组来说应该没问题。如果您需要处理更多内容，则必须在 scala.util.parsing.combinator 之外寻找。我建议如下：

def parse(counts: Seq[Int], input: String): Option[Seq[Seq[Int]]] = {
  val parsed = try {
    Some(input.split(" ").map(_.toInt))
  } catch {
    case _ : java.lang.NumberFormatException => None
  }

  parsed.flatMap { ints =>
    if (ints.length != counts.sum) None
    else Some(counts.foldLeft((Seq.empty[Seq[Int]], ints)) {
      case ((collected, remaining), count) => {
        val (m, n) = remaining.splitAt(count)
        (m.toSeq +: collected, n)
      }
    }._1.reverse)
  }
}

没有保证，但在我的系统上，它不会在具有 100k 整数组的行上溢出。

What you're describing (and what you've more or less reinvented in your implementation with foldLeft and ~) is essentially Haskell's sequence for monads (really you only need an applicative functor, but that's irrelevant here). sequence takes a list of monadic values and returns a monadic list of values. Parser is a monad, so sequence for Parser would change a List[Parser[A]] into a Parser[List[A]].

Scalaz gives you sequence, but off the top of my head I don't know if there's a nice way to get the necessary Applicative instance for Parser. Fortunately you can roll your own pretty easily (I'm directly translating the Haskell definition):

import scala.util.parsing.combinator._

object parser extends RegexParsers {
  val integer = """\d+""".r

  val counts = List(1, 2, 3)
  val parsers = counts.map(repN(_, integer))

  val line = parsers.foldRight(success(Nil: List[List[String]])) {
    (m, n) => for { x <- m ; xs <- n } yield (x :: xs)
  }

  def apply(s: String) = parseAll(line, s)
}

This gives us List(List(1), List(2, 3), List(4, 5, 6)) for parser("1 2 3 4 5 6"), as desired.

(Note that I'm using RegexParsers here as a convenient complete example, but the approach works more generally.)

What's going on might be a little clearer if we desugar the for comprehension:

val line = parsers.foldRight(success(Nil: List[List[String]])) {
  (current, acc) => current.flatMap(x => acc.map(x :: _))
}

We can write flatMap as into and map as ^^:

val line = parsers.foldRight(success(Nil: List[List[String]])) {
  (current, acc) => current into (x => acc ^^ (x :: _))
}

This isn't too far from your formulation, except that we're using a right fold instead of reversing and aren't building up and breaking down the ~s.

About efficiency: Both of our implementations are going to result in unpleasant call stacks. In my experience this is just a fact of life with Scala's parser combinators. To quote another Stack Overflow answer, for example:

Scala's parser combinators aren't very efficient. They weren't
designed to be. They're good for doing small tasks with relatively
small inputs.

My sequence-y approach addresses the "more readable" part of your question, and is almost certainly the cleanest way to solve the problem with Scala's parser combinators. It's marginally more efficient than your implementation, and should be fine for a few thousand groups or so. If you need to handle more than that, you'll have to look outside of scala.util.parsing.combinator. I'd recommend something like the following:

def parse(counts: Seq[Int], input: String): Option[Seq[Seq[Int]]] = {
  val parsed = try {
    Some(input.split(" ").map(_.toInt))
  } catch {
    case _ : java.lang.NumberFormatException => None
  }

  parsed.flatMap { ints =>
    if (ints.length != counts.sum) None
    else Some(counts.foldLeft((Seq.empty[Seq[Int]], ints)) {
      case ((collected, remaining), count) => {
        val (m, n) = remaining.splitAt(count)
        (m.toSeq +: collected, n)
      }
    }._1.reverse)
  }
}

No guarantees, but on my system it doesn't overflow on a line with 100k integer groups.

回复收藏 0 原文

雨巷深深 2024-12-15 20:23:28

您是否考虑过使用 RegexParsers（在 scala.util.parsing.combinator 中）？然后你可以使用正则表达式作为解析器，它的计算速度非常快并且易于编写。

例如，如果您使用解析器组合器来解析 AST 进行简单算术运算，则可以使用正则表达式来解释引用对象的标记，以便可以解析诸如 appleList.size + 4 之类的表达式。

这是一个相当简单的示例，但它展示了如何通过解析器组合器组合正则表达式。

object MyParser extends RegexParsers {
  val regex1 = """[abc]*""".r
  val regex2 = """[def]*""".r
  val parse = regex1 ~ regex2

  def apply(s: String) = parseAll(parse, s)
}

Have you considered using a RegexParsers (in scala.util.parsing.combinator)? Then you can use regular expressions as parsers, which will compute very fast and be easy to write.

For example, if you are using parser combinators to parse an AST for simple arithmatic, you might use regular expressions to interpret tokens that refer to objects so you can parse expressions like appleList.size + 4.

Here is a rather trivial example, but it shows how regular expressions can be combined by parser combinators.

object MyParser extends RegexParsers {
  val regex1 = """[abc]*""".r
  val regex2 = """[def]*""".r
  val parse = regex1 ~ regex2

  def apply(s: String) = parseAll(parse, s)
}

回复收藏 0 原文

~没有更多了~