Appropriate uses of yacc/byacc/bison and lex/flex
Most of the posts that I read pertaining to these utilities usually suggest using some other method to obtain the same effect. For example, questions mentioning these tools usually have at least one answer containing some of the following:
- Use the Boost library (insert appropriate Boost library here)
- Don't create a DSL, use (insert favorite scripting language here)
- ANTLR is better

Assuming the developer...
- ... is comfortable with the C language
- ... knows at least one scripting language (e.g., Python, Perl, etc.)
- ... must write some parsing code in almost every project worked on

So my questions are:
- What situations are well suited to these utilities?
- Are there any (reasonable) situations where there is no better alternative to a problem than yacc and lex (or derivatives)?
- How often, in actual parsing problems, can one expect to run into shortcomings in yacc and lex that are better addressed by more recent solutions?
- For a developer who is not already familiar with these tools, is it worth investing the time to learn their syntax/idioms? How do these compare with other solutions?
Comments (5)
Whether it's worth learning these tools will depend largely (almost entirely) on how much parsing code you write, or how interested you are in writing more in general. I've used them quite a few times and found them very useful.

The tool you use doesn't make nearly as much difference as many people seem to think. For about 95% of the inputs I've had to deal with, there's so little difference between one and another that the best choice is simply the one I'm most familiar and comfortable with.

Of course, lex and yacc generate (and require that you write your actions in) C (or C++). If you're not comfortable with those, a tool that uses and generates a language you prefer (e.g., Python or Java) will undoubtedly be a better choice. For my part, I would not advise trying to use a tool like this with a language you're unfamiliar or uncomfortable with. In particular, if you write code in an action that produces a compiler error, you'll likely get considerably less help than usual from the compiler in tracking down the problem, so you really need to be fluent enough in the language to recognize the problem with only a minimal hint about where the compiler noticed something wrong.
In a previous project, I needed a way to generate queries over arbitrary data in a way that would be easy for a relatively non-technical person to use. The data was CRM-type stuff (e.g., first name, last name, email address, etc.), but it was intended to work against a number of different databases, all with different schemas.

So I developed a small DSL for specifying the queries (e.g., [FirstName]='Joe' AND [LastName]='Bloggs' would select everyone named "Joe Bloggs"). It had some more complicated options, for example the optedout(medium) syntax, which would select everyone who had opted out of receiving messages via a particular medium (email, SMS, etc.). There was ingroup(xyz), which would select everyone in a particular group, and so on.

Basically, it allowed us to specify queries like ingroup('GroupA') and not ingroup('GroupB'), which would be translated into an SQL query like this:
SELECT
    *
FROM
    Users
WHERE
    Users.UserID IN (SELECT UserID FROM GroupMemberships WHERE GroupID=2) AND
    Users.UserID NOT IN (SELECT UserID FROM GroupMemberships WHERE GroupID=3)
(As you can see, the queries aren't as efficient as they could be, but I guess that's what you get with machine generation.)

I didn't use flex/bison, but I did use a parser generator (whose name I've forgotten at the moment...)
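For illustration, the core of a DSL like the one described could be expressed as a yacc/bison grammar along the following lines. This is only a sketch under assumed names: the token names, the node()/leaf() AST helpers, and translate_to_sql() are all hypothetical, not the actual grammar from the project:

```yacc
/* Hypothetical sketch of a query-DSL grammar in yacc/bison style.
 * node(), leaf(), leaf2() and translate_to_sql() are assumed helper
 * functions that build an AST and walk it to emit SQL. */
%token IDENT STRING INGROUP OPTEDOUT
%left  OR
%left  AND
%right NOT

%%
query
    : expr                      { translate_to_sql($1); }
    ;

expr
    : expr AND expr             { $$ = node(N_AND, $1, $3); }   /* a AND b */
    | expr OR expr              { $$ = node(N_OR,  $1, $3); }   /* a OR b */
    | NOT expr                  { $$ = node(N_NOT, $2, NULL); } /* not a */
    | INGROUP '(' STRING ')'    { $$ = leaf(N_INGROUP,  $3); }  /* ingroup('x') */
    | OPTEDOUT '(' STRING ')'   { $$ = leaf(N_OPTEDOUT, $3); }  /* optedout(x) */
    | '[' IDENT ']' '=' STRING  { $$ = leaf2(N_EQ, $2, $5); }   /* [Field]='v' */
    ;
%%
```

The %left/%right declarations resolve the ambiguity in the expr rules so that AND binds tighter than OR and NOT binds tightest, which is the conventional precedence for boolean query languages. Each ingroup/optedout leaf would then be translated into the corresponding IN (SELECT ...) subquery during the AST walk.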
The reasons why lex/yacc and derivatives seem so ubiquitous today are that they have been around for much longer than other tools, that they have far more coverage in the literature and that they traditionally came with Unix operating systems. It has very little to do with how they compare to other lexer and parser generator tools.
No matter which tool you pick, there is always going to be a significant learning curve. So once you have used a given tool a few times and become relatively comfortable in its use, you are unlikely to want to incur the extra effort of learning another tool. That's only natural.
Also, in the late 1960s and early 1970s when lex/yacc were created, hardware limitations posed a serious challenge to parsing. The table driven LR parsing method used by Yacc was the most suitable at the time because it could be implemented with a small memory footprint by using a relatively small general program logic and by keeping state in files on tape or disk. Code driven parsing methods such as LL had a larger minimum memory footprint because the parser program's code itself represents the grammar and therefore it needs to fit entirely into RAM to execute and it keeps state on the stack in RAM.
When memory became more plentiful a lot more research went into different parsing methods such as LL and PEG and how to build tools using those methods. This means that many of the alternative tools that have been created after the lex/yacc family use different types of grammars. However, switching grammar types also incurs a significant learning curve. Once you are familiar with one type of grammar, for example LR or LALR grammars, you are less likely to want to switch to a tool that uses a different type of grammar, for example LL grammars.
Overall, the lex/yacc family of tools is generally more rudimentary than more recent arrivals which often have sophisticated user interfaces to graphically visualise grammars and grammar conflicts or even resolve conflicts through automatic refactoring.
So, if you have no prior experience with any parser tools, if you have to learn a new tool anyway, then you should probably look at other factors such as graphical visualisation of grammars and conflicts, auto-refactoring, availability of good documentation, languages in which the generated lexers/parsers can be output etc etc. Don't pick any tool simply because "this is what everybody else seems to be using".
Here are some reasons I could think of for using lex/yacc or flex/bison: