Parsec vs Yacc/Bison/Antlr: Why and when to use Parsec?
I'm new to Haskell and Parsec. After reading Chapter 16, "Using Parsec", of Real World Haskell, a question came to mind: why and when is Parsec better than parser generators like Yacc/Bison/Antlr?
My understanding is that Parsec provides a nice DSL for writing parsers, and Haskell makes it very easy and expressive. But parsing is such a standard and popular technology that it arguably deserves its own language, one that can output to multiple target languages. So when should we use Parsec instead of, say, generating Haskell code from Bison/Antlr?
This question might go a little beyond technology and into the realm of industry practice. When writing a parser from scratch, what's the benefit of picking Haskell/Parsec over Bison/Antlr or something similar?
BTW: my question is quite similar to this one, but it wasn't answered satisfactorily there.
Comments (3)
You might want to see this question as well as the linked one in your question.
Which Haskell parsing technology is most pleasant to use, and why?
In Haskell the competition is between Parsec (and other parser combinators) and the parser generator Happy. I'd pick Happy if I already had an LR grammar to work from: parser combinators take grammars in LL form, translating from LR to LL takes some effort, and a combinator parser will usually be significantly slower. If I don't have a grammar I'll use Parsec; it is more flexible (powerful) than Happy, and it's more fun to work "in Haskell" than to generate code with Happy and Alex. If you use Happy for parsing you almost always need to use Alex for lexing.
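To make the LR-to-LL point concrete: a left-recursive rule such as `expr : expr '-' term | term` is natural for an LR tool like Happy, but written directly as a combinator it would call itself without consuming input. Below is a minimal sketch of the usual rewrite, assuming the parsec package is installed; the names `lexeme`, `term`, `expr` and `parseExpr` are made up for this illustration.

```haskell
import Text.Parsec (ParseError, chainl1, char, digit, eof, many1, parse, spaces, (<|>))
import Text.Parsec.String (Parser)

-- Run a parser, then skip any whitespace that follows it.
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

-- term ::= digit+
term :: Parser Integer
term = lexeme (read <$> many1 digit)

-- LR-style rule:    expr : expr '+' term | expr '-' term | term
-- LL-friendly form: one term followed by zero or more (operator, term)
-- pairs, which chainl1 folds to the left.
expr :: Parser Integer
expr = term `chainl1` addOp
  where
    addOp =  (+) <$ lexeme (char '+')
         <|> (-) <$ lexeme (char '-')

-- Parse a whole input string as an expression and evaluate it.
parseExpr :: String -> Either ParseError Integer
parseExpr = parse (spaces *> expr <* eof) "<input>"
```

Because `chainl1` folds to the left, `parseExpr "10 - 3 - 2"` evaluates as `(10 - 3) - 2`, which is the associativity the left-recursive rule expressed.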
For industry practice, it would be odd to decide to use Haskell just to get Parsec. For parsing, most of the current crop of languages will have at least a parser generator and probably something more flexible like a port of Parsec or a PEG system.
Ira Baxter's answer to the linked question was spot-on: a parser gets you merely to the foothills of the Himalayas when writing a translator. But being part of a translator is only one of the uses for a parser, so there are still many domains where fairly minimalist systems like ANTLR, Happy and Parsec are satisfactory.
Following on from Stephen's answer, I think that one of the most common alternatives to Parsec, if you want to stick with parser combinators, is attoparsec. The main difference is that attoparsec was written with more of a bias towards speed, and makes trade-offs accordingly. For example, Parsec does some book-keeping to try to return helpful error messages if a parse fails, which attoparsec doesn't do to the same extent. Also, I think that attoparsec is specialised to one input stream/token type, whereas Parsec abstracts over the input type so that it can parse streams of type String, ByteString, Text, etc. without problems.
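As a rough illustration of the book-keeping on the Parsec side, here is a small sketch, assuming the parsec package is installed, that labels a parser with `<?>` and reads the failure position back out of the `ParseError`. The names `number`, `numbers`, `parseNumbers` and `errorColumn` are invented for the example.

```haskell
import Text.Parsec (ParseError, char, digit, eof, errorPos, many1, parse, sepBy1, sourceColumn, (<?>))
import Text.Parsec.String (Parser)

-- One or more digits; the label is what Parsec reports as "expecting ...".
number :: Parser Int
number = read <$> many1 digit <?> "number"

-- A comma-separated list of numbers such as "1,2,3", up to end of input.
numbers :: Parser [Int]
numbers = number `sepBy1` char ',' <* eof

parseNumbers :: String -> Either ParseError [Int]
parseNumbers = parse numbers "<input>"

-- The column at which parsing failed, or Nothing if it succeeded.
errorColumn :: String -> Maybe Int
errorColumn s =
  either (Just . sourceColumn . errorPos) (const Nothing) (parseNumbers s)
```

Calling `show` on the `Left` value gives a human-readable message that includes the source position and the "number" label; tracking that information is part of the cost that a speed-oriented library may choose not to pay.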
One of the main differences between the tools you listed is that ANTLR, Bison and their friends are parser generators, whereas Parsec is a parser combinator library.
A parser generator reads in a description of a grammar and spits out a parser. It is generally not possible to combine existing grammars into a new grammar, and it is certainly not possible to combine two existing generated parsers into a new parser.
A parser combinator OTOH does nothing but combine existing parsers into new parsers. Usually, a parser combinator library ships with a couple of trivial built-in parsers that can parse the empty string or a single character, and it ships with a set of combinators that take one or more parsers and return a new one that, for example, parses the sequence of the original parsers (e.g. you can combine a "d" parser and an "o" parser to form a "do" parser), the alternation of the original parsers (e.g. a "0" parser and a "1" parser to a "0|1" parser) or the original parser multiple times (repetition). What this means is that you could, for example, take an existing parser for Java and an existing parser for HTML and combine them into a parser for JSP.
Most parser generators don't support this, or only support it in a limited way. Parser combinators OTOH only support this and nothing else.
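The three kinds of combinator described above can be sketched from scratch in a few lines of Haskell, using only the Prelude. This is a toy model to show the idea, not how Parsec is implemented, and every name in it (`Parser`, `char`, `andThen`, `orElse`, `many0`) is made up for the illustration.

```haskell
-- A parser consumes a prefix of the input and returns a result together
-- with the remaining input, or Nothing if it does not match.
newtype Parser a = Parser { runParser :: String -> Maybe (a, String) }

-- Built-in primitive: match exactly one given character.
char :: Char -> Parser Char
char c = Parser $ \s -> case s of
  (x : rest) | x == c -> Just (c, rest)
  _                   -> Nothing

-- Sequence: run p, then run q on whatever input p left over.
andThen :: Parser a -> Parser b -> Parser (a, b)
andThen p q = Parser $ \s -> case runParser p s of
  Nothing        -> Nothing
  Just (a, rest) -> case runParser q rest of
    Nothing         -> Nothing
    Just (b, rest') -> Just ((a, b), rest')

-- Alternation: try p; if it fails, try q on the same input.
orElse :: Parser a -> Parser a -> Parser a
orElse p q = Parser $ \s -> case runParser p s of
  Nothing -> runParser q s
  result  -> result

-- Repetition: apply p zero or more times.
-- (p must consume input when it succeeds, or this loops forever.)
many0 :: Parser a -> Parser [a]
many0 p = Parser $ \s -> case runParser p s of
  Nothing        -> Just ([], s)
  Just (a, rest) -> case runParser (many0 p) rest of
    Just (xs, rest') -> Just (a : xs, rest')
    Nothing          -> Just ([a], rest)

-- The "d" and "o" parsers combined into a "do" parser.
doParser :: Parser (Char, Char)
doParser = char 'd' `andThen` char 'o'

-- The "0" and "1" parsers combined into a "0|1" parser, then repeated.
bit :: Parser Char
bit = char '0' `orElse` char '1'

bits :: Parser String
bits = many0 bit
```

Since `doParser`, `bit` and `bits` are ordinary values, they can be passed to further combinators in exactly the same way, which is the composability a generated parser typically lacks.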