是否有适用于 Haskell 或 Scala 等函数式语言的 LL 解析器生成器?
我注意到明显缺乏用函数式语言创建解析器的 LL 解析器。我一直在寻找但没有成功的理想发现是为 ANTLR 风格的 LL(*) 语法生成 Haskell 解析器(语法的模小数重新格式化),并且令我惊讶的是,每个最后一个解析器生成器都具有函数我发现的语言目标是某种 LR 解析器。
我想将我正在开发的这种语言的解析器从 ANTLR 的功能特性转换为该语言本身的自托管,如果我可以将另一种功能语言中几乎肯定正确的东西移植到我的语言中,那将会有很大帮助(最好是我熟悉的 Haskell 和 Scala),而不必完全从头开始重写,尽管最终我可能会这样做,因为核心语言很小。
在这一点上,我很好奇为什么没有这样的 LL(*) 甚至 LL(k) 解析器生成器,而是有很多 LR 生成器,因为 LL看起来本质上更容易。
I've noticed a distinct lack of LL parsers that create parsers in functional languages. The ideal find for what I've been looking for without success is something to generate a Haskell parser for an ANTLR-style LL(*) grammar (modulo minor reformatting of the grammar), and was surprised that every last parser generator with a functional language target I found was some kind of LR parser.
I want to transition the parser of this language I'm working on which has functional features from ANTLR to self-host in the language itself, and it would help a lot if I could port to my language something almost surely correct in another functional language (preferably ones I'm familiar with, Haskell and Scala), instead of having to rewrite it entirely from scratch, though in the end I might do this, since the core language is small.
At this point more than even a solution to this is I'm very curious as to why there are no such LL(*) or even LL(k) parser generators, but many LR generators, since LL seems inherently easier.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(6)
主要原因是大多数用函数式语言编写的 LL(k) 解析器只是使用解析器组合器实现,因为生成解析器组合器库的最简单路径是 递归下降。
Haskell 的 parsec、attoparsec、polyparse 和 Scala 的常用解析器组合器都生成有效的 LL(*) 解析器。
parsec 和 attoparsec 都要求您使用显式的 try 组合器来进行回溯,但这只是为了提高效率和 scala 解析器组合器还可以处理Packrat 解析。
请考虑 Brent Yorgey 最近的 公告中的以下片段: /hackage.haskell.org/package/unbound">unbound 包:
很容易看到原始语法。
LR 解析器需要更复杂的预处理来生成要有效执行的表,因为使用诸如递归上升 非常糟糕。
通过将解析器组合器实现为 EDSL 而不是外部工具,您可以更好地利用编程语言的高级功能。您可以使部分语法更高阶,将 词法分析器 hack 直接构建到解析器中,等等。典型的 LR 解析器生成器可以不做这些事情,或者只能在有限的上下文中以临时方式提供它们,因为最终需要能够发出表格。
The major reason for this is that most LL(k) parsers that are written in functional languages are just implemented using parser combinators, because the easiest path to generate a parser combinator library is recursive descent.
Haskell's parsec, attoparsec, and polyparse and Scala's stock parser combinators all produce what are effectively LL(*) parsers.
Both parsec and attoparsec require you to use an explicit try combinator to get backtracking, but this is only done for efficiency and the scala parser combinators can also deal with packrat parsing.
Consider the following fragment from the announcement of Brent Yorgey's recent unbound package:
it is pretty easy to see the original grammar.
LR parsers require much more complicated preprocessing to generate the tables to execute efficiently, since the direct hand encoding of one using something like recursive ascent is pretty awful.
By implementing your parser combinators as an EDSL rather than an external tool you enable greater use of advanced features of your programming language. You can make portions of the grammar higher order, build the lexer hack directly into the parser, etc. Typical LR parser generators can't do these things, or can only offer them in ad hoc ways in limited contexts because of the need to be able to emit the tables in the end.
你激励我将一个旧的爱好项目发布到 https://github.com/nrnrnr/ebnf 。它支持标准 ML 的 LL(1) 解析器生成。适应 Haskell 并不难,只要你能找到懂 Icon 的人。
爱德华关于人们倾向于使用组合器进行解析的评论是非常正确的,但是能够找到 FIRST 和 FOLLOW 集并且会抱怨歧义的工具有其优点。
EBNF 的主要优点是它的表示法:序列、
Maybe
类型和保留字都原生支持,没有额外的麻烦。You've inspired me to post an old hobby project to https://github.com/nrnrnr/ebnf. It supports LL(1) parser generation for Standard ML. Adapting to Haskell would not be hard, provided you can find someone who understands Icon.
Edward's comment that people tend to prefer combinators for parsing is quite correct, but there are advantages to a tool that will find FIRST and FOLLOW sets and will complain about ambiguity.
The primary advantage of EBNF is its notation: sequences,
Maybe
types, and reserved words are all supported natively, with no extra fuss.SML 拥有 ml-antlr 已有几年了:
http://www.classes.cs.uchicago.edu/archive/2007/winter/22610-1/docs/lpt-manual.pdf
还有用于Scheme的sllgen。
至于为什么 LR 解析器生成器比 LL 多得多 - 手工编写 LR 解析器很困难,所以你确实需要一个生成器。使用 LL 解析器,手工编码的实现仍然与语法匹配,因此对生成器的需求要少得多。
SML has had ml-antlr for a couple of years now:
http://www.classes.cs.uchicago.edu/archive/2007/winter/22610-1/docs/lpt-manual.pdf
There is also sllgen for Scheme.
As to why there are many more LR parser generators than LL ones - it's difficult to write LR parsers by hand, so you really need a generator. With LL parsers, a hand-coded implementation still matches the grammar, so there is much less need for a generator.
通过 Scala,您可以使用所有现有的 Java 工具,而无需太多开销。 JavaCC 是一个 LL(k) 解析器生成器。您可以使用它自动创建具体的语法树并在 Scala 中执行其他所有操作。我实际上是在一个小项目中这样做的,只是因为 JavaCC 语法已经存在。
With Scala you could use all the existing Java tools without much overhead. JavaCC is an LL(k) parser generator. You can use it to create a concrete syntax tree automatically and do everything else in Scala. I actually did that for a small project, simply because the JavaCC grammar already existed.
在描述正在开发中的 LL(k) 解决方案之前,我将解释为什么我没有使用我所知道的其他可用选项。
解析器组合器库,即递归下降解析器,不会:
缺乏上下文自由表现为歧义在语法中,例如,源代码行开头的运算符(在未使用分号发送的行之后)是否是其右侧表达式上的一元前缀,或者是其右侧表达式上的中缀二元运算符前面的源代码行末尾的表达式。
JavaCC 有以下缺点:
我正在尝试创建一个 LL(k) 解析器生成器,它将输出到 Scala,然后希望引导程序输出我正在开发的语言(Copute,它将编译为 Scala)。
我当前的尝试是使用子集 SLK语法是语法,因此SLK工具可以用来验证语法是否上下文无关。
Before I describe a LL(k) solution under development, I will explain why I didn't use the other available options that I am aware of.
Parser combinator libraries, i.e. recursive descent parsers, do not:
A lack of context-freedom is manifested as ambiguities in the syntax, e.g. whether an operator at the start of a source code line (following a line that did not send with a semicolon) is a prefix unary on the expression to its right side, or an infix binary operator on the expression at the end of the prior source code line.
JavaCC has the following disadvantages:
I am attempting to create a LL(k) parser generator that would output to Scala, and then hopefully bootstrap to output the language I am developing (Copute, which will compile to Scala).
My current attempt is using a subset of the SLK grammar syntax, so the SLK tool can be used to verify the grammar is context-free.
您可以使用 ANTLR 来生成 Java 中的 LL* 解析器(以及其他),因此
.class resp
.jar
文件。You can use ANTLR which generates LL* parsers in Java (amongst others), hence
.class
resp.jar
files.