将组合器解析器的列表/序列转换为单个解析器
我有一个值列表,可以从中构造一个解析器列表,这些解析器通过映射依赖于这些值(请参见示例)。然后我想做的就是通过串联将解析器列表变成单个解析器。
一种可能是使用 foldLeft
和 ~
:
parsers.foldLeft(success(Nil)){case (ps,p) => rs ~ p ^^ {case xs ~ x => x ::xs}} ^^ (_.reverse)
这有效吗?
我不知道组合器解析器是如何工作的;是否会有一个深度为列表长度的调用堆栈?因此,对于很长的连接,我可能会遇到 SO 错误吗?
更好的方法
有没有一种更具可读性的不同方法?
示例
假设您有一个包含两行的文件。第一行包含n个整数x_1到x_n。第二行包含 x_1 + x_2 + ... x_n 属于根据第一行的组的整数。我想从第一行获取整数序列并创建 n 个解析器 p_1 到 p_n,其中 p_i 解析 x_i 整数。
假设我有第一行的整数列表 l = List(1,2,3)
。对于每个整数n
,我创建一个解析器来解析n
整数:parsers = l.map(repN(_,integer))
。
I have a list of values from which I can construct a list of parsers, that depend on these values by mapping (see example). Then what I want to do is turn the list of parsers into a single parser by concatenation.
One possibility is using foldLeft
and ~
:
parsers.foldLeft(success(Nil)){case (ps,p) => rs ~ p ^^ {case xs ~ x => x ::xs}} ^^ (_.reverse)
Is this efficient?
I don't know how combinator parsers work; will there be a call stack with depth of length of the list? Thus may I run into SO errors for very long concatenations?
Better way
Is there a different way that is more readable?
Example
Suppose you have a file with two lines. The first line contains n integers x_1 to x_n. The second line contains contains x_1 + x_2 + ... x_n integers that belong to groups according to the first line. I want to take the sequence of integers from the first line and create n parsers p_1 to p_n where p_i parses x_i integers.
Suppose I have the list of integers l = List(1,2,3)
from the first line. For each integer n
I create a parser that parses n
integers: parsers = l.map(repN(_,integer))
.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
您所描述的内容(以及您在实现中使用
foldLeft
和~
或多或少重新发明的内容)本质上是 Haskell 的sequence
用于 monad(实际上你只需要一个应用函子,但这与这里无关)。sequence
接受一元值列表并返回一元值列表。Parser
是一个 monad,因此Parser
的sequence
会将List[Parser[A]]
更改为 <代码>解析器[列表[A]]。Scalaz 为您提供
序列
,但我不知道我不知道是否有一种好方法来获取Parser
所需的Applicative
实例。幸运的是,你可以很容易地推出自己的(我直接翻译 Haskell 定义):这为我们提供了
List(List(1), List(2, 3), List(4, 5, 6))
parser("1 2 3 4 5 6")
,根据需要。(请注意,我在这里使用
RegexParsers
作为一个方便的完整示例,但该方法的工作更普遍。)如果我们对
for
进行脱糖处理,那么发生的事情可能会更清楚一些理解:我们可以将
flatMap
写为into
并将map
写为^^
:这与您的想法并不太远公式,除了我们使用的是向右折叠而不是反转,并且不会构建和分解
~
。关于效率:我们的两种实现都会导致令人不愉快的调用堆栈。根据我的经验,这只是 Scala 解析器组合器的一个事实。引用另一个Stack Overflow答案,例子:
我的
sequence
-y 方法解决了您的问题的“更具可读性”部分,并且几乎肯定是使用 Scala 解析器组合器解决问题的最简洁方法。它比您的实施效率稍高,并且对于几千个左右的组来说应该没问题。如果您需要处理更多内容,则必须在 scala.util.parsing.combinator 之外寻找。我建议如下:没有保证,但在我的系统上,它不会在具有 100k 整数组的行上溢出。
What you're describing (and what you've more or less reinvented in your implementation with
foldLeft
and~
) is essentially Haskell'ssequence
for monads (really you only need an applicative functor, but that's irrelevant here).sequence
takes a list of monadic values and returns a monadic list of values.Parser
is a monad, sosequence
forParser
would change aList[Parser[A]]
into aParser[List[A]]
.Scalaz gives you
sequence
, but off the top of my head I don't know if there's a nice way to get the necessaryApplicative
instance forParser
. Fortunately you can roll your own pretty easily (I'm directly translating the Haskell definition):This gives us
List(List(1), List(2, 3), List(4, 5, 6))
forparser("1 2 3 4 5 6")
, as desired.(Note that I'm using
RegexParsers
here as a convenient complete example, but the approach works more generally.)What's going on might be a little clearer if we desugar the
for
comprehension:We can write
flatMap
asinto
andmap
as^^
:This isn't too far from your formulation, except that we're using a right fold instead of reversing and aren't building up and breaking down the
~
s.About efficiency: Both of our implementations are going to result in unpleasant call stacks. In my experience this is just a fact of life with Scala's parser combinators. To quote another Stack Overflow answer, for example:
My
sequence
-y approach addresses the "more readable" part of your question, and is almost certainly the cleanest way to solve the problem with Scala's parser combinators. It's marginally more efficient than your implementation, and should be fine for a few thousand groups or so. If you need to handle more than that, you'll have to look outside ofscala.util.parsing.combinator
. I'd recommend something like the following:No guarantees, but on my system it doesn't overflow on a line with 100k integer groups.
您是否考虑过使用 RegexParsers(在 scala.util.parsing.combinator 中)?然后你可以使用正则表达式作为解析器,它的计算速度非常快并且易于编写。
例如,如果您使用解析器组合器来解析 AST 进行简单算术运算,则可以使用正则表达式来解释引用对象的标记,以便可以解析诸如
appleList.size + 4
之类的表达式。这是一个相当简单的示例,但它展示了如何通过解析器组合器组合正则表达式。
Have you considered using a
RegexParsers
(inscala.util.parsing.combinator
)? Then you can use regular expressions as parsers, which will compute very fast and be easy to write.For example, if you are using parser combinators to parse an AST for simple arithmatic, you might use regular expressions to interpret tokens that refer to objects so you can parse expressions like
appleList.size + 4
.Here is a rather trivial example, but it shows how regular expressions can be combined by parser combinators.