Lexer/parser to generate Scala code from a BNF grammar
I'm currently looking for a lexer/parser generator that produces Scala code from a BNF grammar (an ocamlyacc file with precedence and associativity). I'm quite confused, since I have found almost nothing on how to do it.
For parsing, I found scala-bison (which I have had a lot of trouble working with). All the other tools are just Java parsers imported into Scala (like ANTLR).
For lexing, I found nothing.
I also found Scala's famous parser combinators, but (correct me if I'm wrong), even though they are quite appealing, they consume a lot of time and memory, mainly due to backtracking.
So I have two main questions:
- Why do people seem to concentrate only on parser combinators?
- What is your best lexer/parser generator suggestion for use with Scala?
Comments (3)
As one of the authors of the ScalaBison paper, I have run into this issue a few times. :-) What I would usually do for scanning in Scala is use JFlex. It works surprisingly well with ScalaBison, and all of our benchmarking was done using that combination. The unfortunate downside is that it does generate Java sources, and so compilation takes a bit of gymnastics. I believe that John Boyland (the main author of the paper) has developed a Scala output mode for JFlex, but I don't think it has been publicly released.
For my own development, I've been working a lot with scannerless parsing techniques. Scala 2.8's packrat parser combinators are quite good, though still not generalized. I've built an experimental library which implements generalized parsing within the parser combinator framework. Its asymptotic bounds are much better than those of traditional parser combinators, but in practice the constant-factor overhead is higher (I'm still working on it).
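To make the backtracking cost and the packrat idea concrete, here is a minimal, library-free sketch. It is not the `scala.util.parsing` API and not the experimental library mentioned above. All names (`PackratSketch`, `Recognizer`, `run`) and the toy grammar are assumptions chosen for illustration. The recognizer uses ordered choice with backtracking and can optionally memoize the result of one rule per input position, which is the core trick behind packrat parsing.

```scala
import scala.collection.mutable

// Hypothetical illustration only; names and grammar are made up for this sketch.
object PackratSketch {
  // Toy grammar with ordered choice:
  //   E ::= T '+' E | T
  //   T ::= '(' E ')' | 'a'
  // A result is Some(position after the match), or None on failure.
  final class Recognizer(input: String, memoize: Boolean) {
    var termCalls = 0 // how many times the body of rule T actually runs
    private val memo = mutable.Map.empty[Int, Option[Int]]

    private def char(c: Char, pos: Int): Option[Int] =
      if (pos < input.length && input.charAt(pos) == c) Some(pos + 1) else None

    def expr(pos: Int): Option[Int] = {
      // First alternative: T '+' E
      val first = for {
        p1 <- term(pos)
        p2 <- char('+', p1)
        p3 <- expr(p2)
      } yield p3
      // If it fails, backtrack to `pos` and try the second alternative: T
      first.orElse(term(pos))
    }

    def term(pos: Int): Option[Int] =
      if (!memoize) termBody(pos)
      else memo.get(pos) match {
        case Some(cached) => cached // reuse the earlier result at this position
        case None =>
          val result = termBody(pos)
          memo(pos) = result
          result
      }

    private def termBody(pos: Int): Option[Int] = {
      termCalls += 1
      val parenthesized = for {
        p1 <- char('(', pos)
        p2 <- expr(p1)
        p3 <- char(')', p2)
      } yield p3
      parenthesized.orElse(char('a', pos))
    }
  }

  /** Returns (whole input matched?, number of times T's body ran). */
  def run(input: String, memoize: Boolean): (Boolean, Int) = {
    val r = new Recognizer(input, memoize)
    val ok = r.expr(0).contains(input.length)
    (ok, r.termCalls)
  }
}
```

On the input `((((a))))`, `PackratSketch.run(input, memoize = false)` executes the body of `T` 62 times, because every nesting level parses `T` once for each alternative of `E`. With `memoize = true` the body runs 5 times, once per position where `T` is tried. The price is the memo table, which is where the extra memory use of packrat parsing comes from.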
Scala 2.8 has a packrat parser. I quote from the API docs here:
I know that this question is old, but for those still in search of a lexer generator that outputs Scala code, I've written a fork of JFlex that emits Scala rather than Java, including corresponding Maven and sbt plugins. All are now available on Maven Central.
We're currently using it (including the Maven/sbt plugins) to tokenize English text as part of the natural language processing pipeline in FACTORIE -- example .flex file containing Scala here.