Haskell 的 Parsec 库可以用来实现带备份的递归下降解析器吗?
我一直在考虑使用 Haskell 的 Parsec 解析库来解析 Java 的一个子集,作为递归下降解析器,作为更传统的解析器生成器解决方案(如 Happy)的替代方案。 Parsec 似乎很容易使用,解析速度对我来说绝对不是一个因素。不过,我想知道是否可以使用秒差距来实现“备份”,这是一种通过依次尝试每个产品来找到正确产品的技术。举一个简单的例子,考虑 JLS Java 语法的一开始:
Literal:
IntegerLiteral
FloatingPointLiteral
我想要一种方法,不必弄清楚应该如何排序这两个规则来使解析成功。就目前情况而言,像这样的幼稚实现:
literal = do {
x <- try (do { v <- integer; return (IntLiteral v)}) <|>
(do { v <- float; return (FPLiteral v)});
return(Literal x)
}
将不起作用...像“15.2”这样的输入将导致整数解析器首先成功,然后整个事情将因“.”而窒息。象征。当然,在这种情况下,显然可以通过重新排序两个产品来解决问题。不过,在一般情况下,找到这样的事情将是一场噩梦,而且我很可能会错过一些案例。理想情况下,我希望有一种方法可以让 Parsec 为我解决这样的问题。这是可能的,还是我只是想对图书馆做太多事情? Parsec 文档声称它可以“解析上下文相关的、无限前瞻语法”,所以看起来我应该能够在这里做一些事情。
I've been considering using Haskell's Parsec parsing library to parse a subset of Java as a recursive descent parser as an alternative to more traditional parser-generator solutions like Happy. Parsec seems very easy to use, and parse speed is definitely not a factor for me. I'm wondering, though, if it's possible to implement "backup" with Parsec, a technique which finds the correct production to use by trying each one in turn. For a simple example, consider the very start of the JLS Java grammar:
Literal:
IntegerLiteral
FloatingPointLiteral
I'd like a way to not have to figure out how I should order these two rules to get the parse to succeed. As it stands, a naive implementation like this:
literal = do {
x <- try (do { v <- integer; return (IntLiteral v)}) <|>
(do { v <- float; return (FPLiteral v)});
return(Literal x)
}
Will not work... inputs like "15.2" will cause the integer parser to succeed first, and then the whole thing will choke on the "." symbol. In this case, of course, it's obvious that you can solve the problem by re-ordering the two productions. In the general case, though, finding things like this is going to be a nightmare, and it's very likely that I'll miss some cases. Ideally, I'd like a way to have Parsec figure out stuff like this for me. Is this possible, or am I simply trying to do too much with the library? The Parsec documentation claims that it can "parse context-sensitive, infinite look-ahead grammars", so it seems like something like I should be able to do something here.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
实现此目的的一种方法是使用
try
组合器,它允许解析器使用输入并失败,而不会导致整个解析失败。另一种是使用
Text.ParserCombinators.ReadP
,它实现了一个对称选择运算符,其中证明了a +++ b = b +++ a
,所以哪个顺序并不重要。我相当偏爱ReadP
,因为它虽然很小,但提供了构建真正强大的解析器所需的内容。One way you can do this is to use the
try
combinator, which allows a parser to consume input and fail without failing the whole parse.Another is to use
Text.ParserCombinators.ReadP
, which implements a symmetric choice operator, in which it is proven thata +++ b = b +++ a
, so it really doesn't matter which order. I'm rather partial toReadP
, since it is minimal but provides what you need to build up a really powerful parser.要么使用 Parsec 的
notFollowedBy
来确保integer
消耗了直到某个标记分隔符的所有内容(这种方法大多数时候可以扩展到任意场景),或者看看解析器组合器探索所有可能的解析替代方案。首先想到的是 UU_Parsing 库。Either use Parsec's
notFollowedBy
to ensure thatinteger
consumed everything up to some token separator (this approach will scale to arbitrary scenario most of the time), or take a look at parser combinators that explore all possible parsing alternatives. First to come to mind is UU_Parsing library.