Haskell 的 Parsec 库可以用来实现带备份的递归下降解析器吗？

发布于 2024-08-26 02:52:32 字数 769 浏览 9 评论 0原文

我一直在考虑使用 Haskell 的 Parsec 解析库来解析 Java 的一个子集，作为递归下降解析器，作为更传统的解析器生成器解决方案（如 Happy）的替代方案。 Parsec 似乎很容易使用，解析速度对我来说绝对不是一个因素。不过，我想知道是否可以使用秒差距来实现“备份”，这是一种通过依次尝试每个产品来找到正确产品的技术。举一个简单的例子，考虑 JLS Java 语法的一开始：

Literal:
    IntegerLiteral  
    FloatingPointLiteral

我想要一种方法，不必弄清楚应该如何排序这两个规则来使解析成功。就目前情况而言，像这样的幼稚实现：

literal = do {
    x <- try (do { v <- integer; return (IntLiteral v)}) <|>
         (do { v <- float; return (FPLiteral v)});
    return(Literal x)
}

将不起作用...像“15.2”这样的输入将导致整数解析器首先成功，然后整个事情将因“.”而窒息。象征。当然，在这种情况下，显然可以通过重新排序两个产品来解决问题。不过，在一般情况下，找到这样的事情将是一场噩梦，而且我很可能会错过一些案例。理想情况下，我希望有一种方法可以让 Parsec 为我解决这样的问题。这是可能的，还是我只是想对图书馆做太多事情？ Parsec 文档声称它可以“解析上下文相关的、无限前瞻语法”，所以看起来我应该能够在这里做一些事情。

原文

I've been considering using Haskell's Parsec parsing library to parse a subset of Java as a recursive descent parser as an alternative to more traditional parser-generator solutions like Happy. Parsec seems very easy to use, and parse speed is definitely not a factor for me. I'm wondering, though, if it's possible to implement "backup" with Parsec, a technique which finds the correct production to use by trying each one in turn. For a simple example, consider the very start of the JLS Java grammar:

Literal:
    IntegerLiteral  
    FloatingPointLiteral

I'd like a way to not have to figure out how I should order these two rules to get the parse to succeed. As it stands, a naive implementation like this:

literal = do {
    x <- try (do { v <- integer; return (IntLiteral v)}) <|>
         (do { v <- float; return (FPLiteral v)});
    return(Literal x)
}

Will not work... inputs like "15.2" will cause the integer parser to succeed first, and then the whole thing will choke on the "." symbol. In this case, of course, it's obvious that you can solve the problem by re-ordering the two productions. In the general case, though, finding things like this is going to be a nightmare, and it's very likely that I'll miss some cases. Ideally, I'd like a way to have Parsec figure out stuff like this for me. Is this possible, or am I simply trying to do too much with the library? The Parsec documentation claims that it can "parse context-sensitive, infinite look-ahead grammars", so it seems like something like I should be able to do something here.

分享到QQ

分享到微博