Lemon power or not?

Posted on 2024-10-08 22:07:17

For grammar parsing, I used to "play" with Bison, which has its pros and cons.

Last week, I noticed on the SQLite site that the engine is built with another parser generator: Lemon.

It sounds great after reading the thin documentation.
Do you have any feedback about this parser?

I cannot really find pertinent information on Google or Wikipedia (just a few examples and the same tutorials). It doesn't seem very popular. (There is no tag on Stack Overflow [ed: there is now :P])

Comments (2)

硪扪都還晓 2024-10-15 22:07:17

The reasons we are using Lemon in our firmware project are:

  • Small generated code size and memory footprint. It produces the smallest parser I found (I compared parsers of similar complexity generated by flex, bison, ANTLR, and Lemon).
  • Excellent support for embedded systems: Lemon doesn't depend on the standard library, you can specify external memory management functions, and debug logging is removable.
  • Public domain license. There is a separate fork of Lemon licensed under GPLv2, which is not suitable for our needs because of its viral license. So we take the latest SQLite sources and compile Lemon from them (it consists of only two files).
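  • Caller-driven parsing: the tokenizer feeds tokens to the parser rather than the parser calling the lexer. This makes the code more straightforward to understand and maintain than Flex/Bison parsing code. Thread safety is an additional bonus I appreciate.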
  • Simple integration with tokenizers. The nature of our project requires tokenizing a binary stream with variable token sizes. It was quite easy to implement the tokenizer and integrate it with the parser API, which has only 3 functions and one feedback context variable. We investigated ways of integrating Lemon with re2c and Ragel and found them quite easy to implement as well.
  • Very simple syntax that is fast to learn.
  • Lemon explicitly separates development of the tokenizer from that of the syntax analyzer (parser). My development flow starts with designing the parser grammar. At this first stage I can check complex rules by feeding token sequences directly through several Parser(...) calls. The tokenizer is implemented afterwards.
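
To make the calling pattern from the list above concrete, here is a minimal sketch in C. It does not use Lemon itself: `SumAlloc`, `SumFeed`, and `SumFree` are hypothetical hand-written stand-ins that only mimic the general shape of a three-function parser API (allocate, feed one token, free), and the token codes and `SumContext` struct are made up for illustration. The toy grammar accepted is `NUM (PLUS NUM)* END`. The point is that token sequences can be fed by hand before any tokenizer exists, and that all state lives in caller-owned objects rather than globals.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical token codes (a real generator would emit these in a header). */
enum { TK_END = 0, TK_NUM = 1, TK_PLUS = 2 };

/* Feedback context: owned by the caller, written to by the parser. */
typedef struct {
    long result;
    int  error;
} SumContext;

/* All parser state lives in this object, so several parsers can coexist. */
typedef struct {
    int  expect_num;   /* 1 when the next token must be TK_NUM */
    long total;
} SumParser;

/* The allocator is supplied by the caller (malloc here, a pool in firmware). */
SumParser *SumAlloc(void *(*alloc_fn)(size_t)) {
    SumParser *p = alloc_fn(sizeof *p);
    if (p) { p->expect_num = 1; p->total = 0; }
    return p;
}

/* Feed exactly one token; the caller decides when and where tokens come from. */
void SumFeed(SumParser *p, int token, long value, SumContext *ctx) {
    assert(p != NULL && ctx != NULL);
    if (ctx->error) return;                 /* stay in the error state */
    if (p->expect_num) {
        if (token == TK_NUM) { p->total += value; p->expect_num = 0; }
        else ctx->error = 1;
    } else if (token == TK_PLUS) {
        p->expect_num = 1;
    } else if (token == TK_END) {
        ctx->result = p->total;
    } else {
        ctx->error = 1;
    }
}

void SumFree(SumParser *p, void (*free_fn)(void *)) { free_fn(p); }

/* Drive the parser with a hand-written token sequence; returns -1 on error. */
static long run_tokens(const int *tokens, const long *values, size_t n) {
    SumContext ctx = {0, 0};
    SumParser *p = SumAlloc(malloc);
    if (!p) return -1;
    for (size_t i = 0; i < n; i++) SumFeed(p, tokens[i], values[i], &ctx);
    SumFree(p, free);
    return ctx.error ? -1 : ctx.result;
}

/* "1 + 2 + 3" fed by hand, with no tokenizer involved. */
long demo_hand_fed(void) {
    const int  t[] = { TK_NUM, TK_PLUS, TK_NUM, TK_PLUS, TK_NUM, TK_END };
    const long v[] = { 1, 0, 2, 0, 3, 0 };
    return run_tokens(t, v, sizeof t / sizeof t[0]);
}

/* "1 2" is not in the grammar, so the context reports an error. */
long demo_bad_sequence(void) {
    const int  t[] = { TK_NUM, TK_NUM, TK_END };
    const long v[] = { 1, 2, 0 };
    return run_tokens(t, v, sizeof t / sizeof t[0]);
}
```

Here `demo_hand_fed()` returns 6 and `demo_bad_sequence()` returns -1. Swapping `malloc`/`free` for any pair of functions with the same signatures is all it takes to change the memory source in this sketch.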

Of course, Lemon is not a silver bullet; its area of application is limited. Among the disadvantages:

  • Lemon requires writing more rules than Bison because of its simplified syntax: no repetitions or optionals, one action per rule, etc.
  • The complete set of LALR(1) parser limitations.
  • Only the C language.

Weigh the pros and cons before making your choice. I've done mine ;-)

╰沐子 2024-10-15 22:07:17

Interesting find! I haven't actually used it, so the commentary is based on reading the documentation.

Redesigning things so that lexical analysis is done separately from parsing immediately seems to have merit. In particular, it has the potential to simplify operations such as handling multiple or nested source files. The Lex-based yywrap() mechanism is less than ideal. That it avoids all global variables and has careful memory allocation and deallocation control should count in its favour (that it allows the choice of allocator and deallocator greatly helps too, at least for the environments where I work, where memory allocation is always an issue).
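
As a rough illustration of why a tokenizer-driven design could simplify nested source files, here is a small C sketch. Nothing in it comes from Lemon: `DemoParser`, `demo_feed_number`, the `@name` include syntax, and the in-memory `SOURCES` table are all hypothetical. The tokenizer simply recurses when it meets an include and keeps feeding the same parser object, so no yywrap()-style hook or global buffer stack is needed.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for parser state; it just sums the numbers it is fed. */
typedef struct { long sum; int count; } DemoParser;

static void demo_feed_number(DemoParser *p, long value) {
    assert(p != NULL);
    p->sum += value;
    p->count++;
}

/* In-memory "files" so the sketch needs no file system. */
typedef struct { const char *name; const char *text; } Source;

static const Source SOURCES[] = {
    { "main",  "1 2 @inner 5" },
    { "inner", "3 4" },
};

static const char *find_source(const char *name, size_t len) {
    for (size_t i = 0; i < sizeof SOURCES / sizeof SOURCES[0]; i++) {
        if (strlen(SOURCES[i].name) == len &&
            strncmp(SOURCES[i].name, name, len) == 0)
            return SOURCES[i].text;
    }
    return NULL;
}

/* Tokenizer: numbers go to the parser, "@name" tokenizes the named source
   in place by recursion. The depth limit guards against include cycles.
   Returns 0 on success, -1 on error. */
static int tokenize(const char *text, DemoParser *p, int depth) {
    if (depth > 8) return -1;
    const char *s = text;
    while (*s) {
        if (isspace((unsigned char)*s)) { s++; continue; }
        if (*s == '@') {
            const char *start = ++s;
            while (isalnum((unsigned char)*s)) s++;
            const char *inner = find_source(start, (size_t)(s - start));
            if (!inner || tokenize(inner, p, depth + 1) != 0) return -1;
        } else if (isdigit((unsigned char)*s)) {
            char *end;
            long v = strtol(s, &end, 10);
            demo_feed_number(p, v);   /* the same parser sees one flat stream */
            s = end;
        } else {
            return -1;
        }
    }
    return 0;
}

/* Tokenize "main", which pulls in "inner" halfway through. */
long demo_nested_sum(void) {
    DemoParser p = {0, 0};
    const char *text = find_source("main", 4);
    if (!text || tokenize(text, &p, 0) != 0) return -1;
    return p.sum;
}
```

With the sources above, `demo_nested_sum()` returns 15: the parser sees 1 2 3 4 5 as a single stream and never learns that two sources were involved.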

The rethinking of how the rules are organized and how the terminals are identified is a good idea.

All in all, it looks like a well-thought-out redesign of Bison.

It is in the public domain according to the referenced web pages.
