Appropriate uses of yacc/byacc/bison and lex/flex
Most of the posts that I read pertaining to these utilities usually suggest using some other method to obtain the same effect. For example, questions mentioning these tools usually have at least one answer containing some of the following:
- Use the Boost library (insert appropriate Boost library here)
- Don't create a DSL, use (insert favorite scripting language here)
- ANTLR is better

Assuming the developer...
- ... is comfortable with the C language
- ... knows at least one scripting language (e.g., Python, Perl, etc.)
- ... must write some parsing code in almost every project worked on

So my questions are:
- What situations are well suited to these utilities?
- Are there any (reasonable) situations where there is no better alternative to a problem than yacc and lex (or derivatives)?
- How often, in actual parsing problems, can one expect to run into shortcomings in yacc and lex that are better addressed by more recent solutions?
- For a developer who is not already familiar with these tools, is it worth investing the time to learn their syntax/idioms? How do these compare with other solutions?
Comments (5)
Whether it's worth learning these tools will depend largely (almost entirely) on how much parsing code you write, or how interested you are in writing more in general. I've used them quite a few times and found them very useful.

The tool you use doesn't make nearly as much difference as many people seem to think. For about 95% of the inputs I've had to deal with, there's so little difference between one and another that the best choice is simply the one I'm most familiar and comfortable with.

Of course, lex and yacc generate (and require that you write your actions in) C (or C++). If you're not comfortable with those, a tool that uses and generates a language you prefer (e.g., Python or Java) will undoubtedly be a better choice. For my part, I would not advise trying to use a tool like this with a language you're unfamiliar or uncomfortable with. In particular, if you write code in an action that produces a compiler error, you'll likely get considerably less help than usual from the compiler in tracking down the problem, so you really need to be fluent enough in the language to recognize the problem with only a minimal hint about where the compiler noticed something wrong.
In a previous project, I needed a way to generate queries over arbitrary data in a way that would be easy for a relatively non-technical person to use. The data was CRM-type stuff (e.g., first name, last name, email address, etc.), but it was intended to work against a number of different databases, all with different schemas.

So I developed a small DSL for specifying the queries (e.g., [FirstName]='Joe' AND [LastName]='Bloggs' would select everyone named "Joe Bloggs"). It had some more complicated options, for example the optedout(medium) syntax, which would select everyone who had opted out of receiving messages via a particular medium (email, SMS, etc.). There was ingroup(xyz), which would select everyone in a particular group, and so on.

Basically, it allowed us to specify queries like ingroup('GroupA') and not ingroup('GroupB'), which would be translated into an SQL query like this:
SELECT
    *
FROM
    Users
WHERE
    Users.UserID IN (SELECT UserID FROM GroupMemberships WHERE GroupID=2) AND
    Users.UserID NOT IN (SELECT UserID FROM GroupMemberships WHERE GroupID=3)
(As you can see, the queries aren't as efficient as they could be, but I guess that's what you get with machine generation.)

I didn't use flex/bison, but I did use a parser generator (whose name I've forgotten at the moment...)
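For illustration, the core of a DSL like the one described could be expressed as a yacc/bison grammar along the following lines. This is only a sketch under assumed names: the token names, the node()/leaf() AST helpers, and translate_to_sql() are all hypothetical, not the actual grammar from the project:

```yacc
/* Hypothetical sketch of a query-DSL grammar in yacc/bison style.
 * node(), leaf(), leaf2() and translate_to_sql() are assumed helper
 * functions that build an AST and walk it to emit SQL. */
%token IDENT STRING INGROUP OPTEDOUT
%left  OR
%left  AND
%right NOT

%%
query
    : expr                      { translate_to_sql($1); }
    ;

expr
    : expr AND expr             { $$ = node(N_AND, $1, $3); }   /* a AND b */
    | expr OR expr              { $$ = node(N_OR,  $1, $3); }   /* a OR b */
    | NOT expr                  { $$ = node(N_NOT, $2, NULL); } /* not a */
    | INGROUP '(' STRING ')'    { $$ = leaf(N_INGROUP,  $3); }  /* ingroup('x') */
    | OPTEDOUT '(' STRING ')'   { $$ = leaf(N_OPTEDOUT, $3); }  /* optedout(x) */
    | '[' IDENT ']' '=' STRING  { $$ = leaf2(N_EQ, $2, $5); }   /* [Field]='v' */
    ;
%%
```

The %left/%right declarations resolve the ambiguity in the expr rules so that AND binds tighter than OR and NOT binds tightest, which is the conventional precedence for boolean query languages. Each ingroup/optedout leaf would then be translated into the corresponding IN (SELECT ...) subquery during the AST walk.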
The reasons why lex/yacc and derivatives seem so ubiquitous today are that they have been around for much longer than other tools, that they have far more coverage in the literature and that they traditionally came with Unix operating systems. It has very little to do with how they compare to other lexer and parser generator tools.
No matter which tool you pick, there is always going to be a significant learning curve. So once you have used a given tool a few times and become relatively comfortable in its use, you are unlikely to want to incur the extra effort of learning another tool. That's only natural.
Also, in the late 1960s and early 1970s when lex/yacc were created, hardware limitations posed a serious challenge to parsing. The table driven LR parsing method used by Yacc was the most suitable at the time because it could be implemented with a small memory footprint by using a relatively small general program logic and by keeping state in files on tape or disk. Code driven parsing methods such as LL had a larger minimum memory footprint because the parser program's code itself represents the grammar and therefore it needs to fit entirely into RAM to execute and it keeps state on the stack in RAM.
When memory became more plentiful a lot more research went into different parsing methods such as LL and PEG and how to build tools using those methods. This means that many of the alternative tools that have been created after the lex/yacc family use different types of grammars. However, switching grammar types also incurs a significant learning curve. Once you are familiar with one type of grammar, for example LR or LALR grammars, you are less likely to want to switch to a tool that uses a different type of grammar, for example LL grammars.
Overall, the lex/yacc family of tools is generally more rudimentary than more recent arrivals which often have sophisticated user interfaces to graphically visualise grammars and grammar conflicts or even resolve conflicts through automatic refactoring.
So, if you have no prior experience with any parser tools, if you have to learn a new tool anyway, then you should probably look at other factors such as graphical visualisation of grammars and conflicts, auto-refactoring, availability of good documentation, languages in which the generated lexers/parsers can be output etc etc. Don't pick any tool simply because "this is what everybody else seems to be using".
Here are some reasons I could think of for using lex/yacc or flex/bison: