Appropriate uses of yacc/byacc/bison and lex/flex

Most of the posts that I read pertaining to these utilities usually suggest using some other method to obtain the same effect. For example, questions mentioning these tools usually have at least one answer containing some of the following:

  • Use the boost library (insert appropriate boost library here)
  • Don't create a DSL; use (insert favorite scripting language here)
  • Antlr is better

Assuming the developer ...

  • ... is comfortable with the C language
  • ... does know at least one scripting
    language (e.g., Python, Perl, etc.)
  • ... must write some parsing code in almost
    every project worked on

So my questions are:

  • What situations are well suited to these utilities?
  • Are there any (reasonable) situations where there is no better
    alternative for a problem than yacc and lex (or their derivatives)?
  • In actual parsing problems, how often can one expect to run into
    shortcomings in yacc and lex that are better addressed by more
    recent solutions?
  • For a developer who is not already familiar with these tools, is it
    worth investing the time to learn their syntax/idioms? How do they
    compare with other solutions?

Comments (5)

情泪▽动烟 2024-08-31 03:37:59

The reasons why lex/yacc and derivatives seem so ubiquitous today are that they have been around for much longer than other tools, that they have far more coverage in the literature and that they traditionally came with Unix operating systems. It has very little to do with how they compare to other lexer and parser generator tools.

No matter which tool you pick, there is always going to be a significant learning curve. So once you have used a given tool a few times and become relatively comfortable in its use, you are unlikely to want to incur the extra effort of learning another tool. That's only natural.

Also, in the late 1960s and early 1970s when lex/yacc were created, hardware limitations posed a serious challenge to parsing. The table-driven LR parsing method used by Yacc was the most suitable at the time because it could be implemented with a small memory footprint, using a relatively small piece of general driver logic and keeping state in files on tape or disk. Code-driven parsing methods such as LL had a larger minimum memory footprint because the parser program's code itself represents the grammar, so it needs to fit entirely into RAM to execute, and it keeps its state on the stack in RAM.

When memory became more plentiful, a lot more research went into different parsing methods such as LL and PEG, and into how to build tools using those methods. This means that many of the alternative tools created after the lex/yacc family use different types of grammars. However, switching grammar types also incurs a significant learning curve. Once you are familiar with one type of grammar, for example LR or LALR grammars, you are less likely to want to switch to a tool that uses a different type of grammar, for example LL grammars.
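
To make that grammar-style gap concrete, here is a minimal, hypothetical bison fragment (not taken from this answer; the file, rule, and token names are invented for illustration). The left-recursive rule is the natural LR/LALR idiom, while an LL(1) tool would reject it and require the rewrite shown in the comment.

    /* lists.y -- hypothetical sketch; build with: bison lists.y && cc lists.tab.c */
    %{
    #include <stdio.h>
    int yylex(void);
    void yyerror(const char *s) { fprintf(stderr, "%s\n", s); }
    %}
    %token NUMBER
    %%
    /* Natural LALR style: left recursion, and the parse stack stays small
     * no matter how long the list gets. */
    list : NUMBER
         | list ',' NUMBER
         ;
    /* The same language for an LL(1) tool must be written without left
     * recursion, e.g.:
     *     list : NUMBER rest ;
     *     rest : ',' NUMBER rest | (empty) ;  */
    %%
    int yylex(void) { return 0; }            /* stub lexer so the file compiles */
    int main(void) { return yyparse(); }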

Overall, the lex/yacc family of tools is generally more rudimentary than more recent arrivals, which often have sophisticated user interfaces to graphically visualise grammars and grammar conflicts, or even resolve conflicts through automatic refactoring.

So, if you have no prior experience with any parser tools and have to learn a new tool anyway, then you should probably look at other factors such as graphical visualisation of grammars and conflicts, auto-refactoring, availability of good documentation, the languages in which the generated lexers/parsers can be output, and so on. Don't pick a tool simply because "this is what everybody else seems to be using".

Here are some reasons I could think of for using lex/yacc or flex/bison:

  • the developer is already familiar with lex/yacc or flex/bison
  • the developer is most familiar and comfortable with LR/LALR grammars
  • the developer has plenty of books covering lex/yacc but no books covering others
  • the developer has a prospective job offer coming up and has been told that lex/yacc skills would increase his chances of getting hired
  • the developer could not get buy-in from project members/stakeholders for the use of other tools
  • the environment has lex/yacc installed and for some reason it is not feasible to install other tools

疑心病 2024-08-31 03:37:59

Whether it's worth learning these tools or not will depend heavily (almost entirely) on how much parsing code you write, or how interested you are in writing more code on that general order. I've used them quite a bit, and find them extremely useful.

The tool you use doesn't really make as much difference as many would have you believe. For about 95% of the inputs I've had to deal with, there's little enough difference between one and another that the best choice is simply the one with which I'm most familiar and comfortable.

Of course, lex and yacc produce (and demand that you write your actions in) C (or C++). If you're not comfortable with them, a tool that uses and produces a language you prefer (e.g. Python or Java) will undoubtedly be a much better choice. I, for one, would not advise trying to use a tool like this with a language with which you're unfamiliar or uncomfortable. In particular, if you write code in an action that produces a compiler error, you'll probably get considerably less help from the compiler than usual in tracking down the problem, so you really need to be familiar enough with the language to recognize the problem with only a minimal hint about where the compiler noticed something being wrong.
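
To show what that C-in-actions arrangement looks like, here is a minimal, hypothetical bison example of a line-by-line integer calculator (the file name, token name, and build commands are illustrative, not from this answer). Any C mistake inside one of the { ... } action blocks is reported by the C compiler against the generated .tab.c file, which is exactly the debugging situation described above.

    /* calc.y -- hypothetical sketch; build with: bison calc.y && cc calc.tab.c */
    %{
    #include <stdio.h>
    #include <ctype.h>
    int yylex(void);
    void yyerror(const char *s) { fprintf(stderr, "error: %s\n", s); }
    %}

    %token NUMBER
    %left '+' '-'
    %left '*' '/'

    %%
    input : /* empty */
          | input expr '\n'   { printf("= %d\n", $2); }   /* hand-written C action */
          ;

    expr  : NUMBER            { $$ = $1; }
          | expr '+' expr     { $$ = $1 + $3; }
          | expr '-' expr     { $$ = $1 - $3; }
          | expr '*' expr     { $$ = $1 * $3; }
          | expr '/' expr     { $$ = $1 / $3; }   /* no divide-by-zero check: it's a sketch */
          | '(' expr ')'      { $$ = $2; }
          ;
    %%

    /* Hand-rolled lexer (no flex needed): digits become NUMBER tokens,
     * everything else is passed through as a single-character token. */
    int yylex(void) {
        int c = getchar();
        while (c == ' ' || c == '\t') c = getchar();
        if (c == EOF) return 0;
        if (isdigit(c)) {
            int v = 0;
            while (isdigit(c)) { v = v * 10 + (c - '0'); c = getchar(); }
            ungetc(c, stdin);
            yylval = v;       /* default YYSTYPE is int */
            return NUMBER;
        }
        return c;
    }

    int main(void) { return yyparse(); }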

终难遇 2024-08-31 03:37:59

In a previous project, I needed a way to be able to generate queries on arbitrary data in a way that was easy for a relatively non-technical person to be able to use. The data was CRM-type stuff (so First Name, Last Name, Email Address, etc) but it was meant to work against a number of different databases, all with different schemas.

So I developed a little DSL for specifying the queries (e.g. [FirstName]='Joe' AND [LastName]='Bloggs' would select everybody called "Joe Bloggs"). It had some more complicated options, for example there was the "optedout(medium)" syntax which would select all people who had opted-out of receiving messages on a particular medium (email, sms, etc). There was "ingroup(xyz)" which would select everybody in a particular group, etc.

Basically, it allowed us to specify queries like "ingroup('GroupA') and not ingroup('GroupB')" which would be translated to an SQL query like this:

SELECT
    *
FROM
    Users
WHERE
    Users.UserID IN (SELECT UserID FROM GroupMemberships WHERE GroupID=2) AND
    Users.UserID NOT IN (SELECT UserID FROM GroupMemberships WHERE GroupID=3)

(As you can see, the queries aren't as efficient as possible, but that's what you get with machine generation, I guess.)

I didn't use flex/bison for it, but I did use a parser generator (the name of which has escaped me at the moment...)
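
The answer doesn't say which generator was used, but for illustration here is a minimal, hypothetical flex-free bison sketch of such a DSL front end (the file name, token names, and the lookup of group names via a Groups table are all assumptions, not the original implementation). It turns ingroup('GroupA') and not ingroup('GroupB') into a SQL WHERE clause printed to stdout.

    /* dsl.y -- hypothetical sketch; build with: bison dsl.y && cc dsl.tab.c */
    %{
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <ctype.h>
    int yylex(void);
    void yyerror(const char *s) { fprintf(stderr, "error: %s\n", s); }
    static char *fmt(const char *f, const char *a, const char *b);
    %}

    %union { char *s; }
    %token <s> NAME
    %token INGROUP AND OR NOT
    %left OR
    %left AND
    %right NOT
    %type <s> cond

    %%
    query : cond   { printf("SELECT * FROM Users WHERE %s\n", $1); }
          ;
    cond  : INGROUP '(' NAME ')'
                   { $$ = fmt("Users.UserID IN (SELECT UserID FROM GroupMemberships"
                              " WHERE GroupID=(SELECT GroupID FROM Groups WHERE Name='%s'))",
                              $3, NULL); }
          | cond AND cond   { $$ = fmt("(%s AND %s)", $1, $3); }
          | cond OR cond    { $$ = fmt("(%s OR %s)", $1, $3); }
          | NOT cond        { $$ = fmt("NOT %s", $2, NULL); }
          | '(' cond ')'    { $$ = $2; }
          ;
    %%

    /* printf-style string building for the actions above (no freeing: sketch only). */
    static char *fmt(const char *f, const char *a, const char *b) {
        char *out = malloc(1024);
        if (b) snprintf(out, 1024, f, a, b); else snprintf(out, 1024, f, a);
        return out;
    }

    /* Hand-rolled lexer: quoted names, the and/or/not/ingroup keywords, punctuation. */
    int yylex(void) {
        int c = getchar();
        while (isspace(c)) c = getchar();
        if (c == EOF) return 0;
        if (c == '\'') {                       /* 'GroupA' -> NAME */
            char buf[256]; int i = 0;
            while ((c = getchar()) != '\'' && c != EOF && i < 255) buf[i++] = c;
            buf[i] = '\0';
            yylval.s = strdup(buf);
            return NAME;
        }
        if (isalpha(c)) {
            char buf[32]; int i = 0;
            while (isalpha(c) && i < 31) { buf[i++] = c; c = getchar(); }
            buf[i] = '\0'; ungetc(c, stdin);
            if (!strcmp(buf, "and")) return AND;
            if (!strcmp(buf, "or"))  return OR;
            if (!strcmp(buf, "not")) return NOT;
            if (!strcmp(buf, "ingroup")) return INGROUP;
            yyerror("unrecognized word"); exit(1);
        }
        return c;                              /* '(' and ')' */
    }

    int main(void) { return yyparse(); }

To try the sketch, pipe a query into the compiled program, e.g. echo "ingroup('GroupA') and not ingroup('GroupB')" | ./a.out (output name from the hypothetical build above).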

墟烟 2024-08-31 03:37:59

I think it's pretty good advice to eschew the creation of new languages just to support a domain-specific language. It's going to be a better use of your time to take an existing language and extend it with domain functionality.

If you are trying to create a new language for some other reason, perhaps for research into language design, then these tools are a bit outdated. Newer generators such as antlr, or even newer implementation languages like ML, make language design a much easier affair.

If there's a good reason to use these tools, it's probably because of their legacy. You might already have a skeleton of a language you need to enhance, which is already implemented in one of these tools. You might also benefit from the huge volumes of tutorial information written about these old tools, for which there is not so great a corpus written for newer and slicker ways of implementing languages.

尬尬 2024-08-31 03:37:59

We have a whole programming language implemented in my office. We use them for that. I think it's meant to be a quick and easy way to write interpreters for things. You could conceivably write almost any sort of text parser using them, but a lot of times it's either A) easier to write it yourself quickly, or B) you need more flexibility than they provide.
