What advantages do lex and bison have over a hand-written tokenizer/parser in C++?
I would like to do some parsing and tokenizing in C++ for learning purposes. I often come across bison/yacc and lex when reading about this subject online.
Would there be any major benefit to using those over, for instance, a tokenizer/parser written using the STL or boost::regex, or maybe even just C?
Comments (4)
I recently undertook writing a simple lexer and parser.
It turned out that the lexer was simpler to code by hand. But the parser was a little more difficult. My Bison-generated parser worked almost right off the bat, and it gave me a lot of helpful messages about where I had forgotten about states. I later wrote the same parser by hand but it took a lot more debugging before I had it working perfectly.
The appeal of generating tools for lexers and parsers is that you can write the specification in a clean, easy-to-read language that comes close to being a shortest-possible rendition of your spec. A hand-written parser is usually at least twice as big. Also, the automated parser (/lexer) comes with a lot of diagnostic code and logic to help you get the thing debugged.
A parser/lexer spec in BNF-like language is also a lot easier to change, should your language or requirements change. If you're dealing with a hand-written parser/lexer, you may need to dig deeply into your code and make significant changes.
Finally, because the generated code is usually table-driven and runs without backtracking (a lex scanner is a finite state machine, and a Bison parser is by default a table-driven LALR(1) automaton; there are gazillions of options on Bison, so this is not always a given), it's quite possible that your auto-generated code will be more efficient than your hand-coded product.
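As a rough illustration of the kind of hand-coded lexer mentioned in this answer, here is a minimal C++ sketch. The `TokenKind` enum, the `Token` struct and the `tokenize` function are names made up for this example, not the answerer's actual code. It assumes a tiny language of integers, identifiers and single-character operators, and scans the input once from left to right without backtracking.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token kinds for a tiny arithmetic-style language.
enum class TokenKind { Number, Identifier, Operator, End };

struct Token {
    TokenKind kind;
    std::string text;
    std::size_t offset;  // position in the input, handy for diagnostics
};

// Scans the input in a single left-to-right pass, never backtracking.
std::vector<Token> tokenize(const std::string& input) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < input.size()) {
        const unsigned char c = static_cast<unsigned char>(input[i]);
        if (std::isspace(c)) {  // whitespace only separates tokens
            ++i;
            continue;
        }
        const std::size_t start = i;
        TokenKind kind;
        if (std::isdigit(c)) {
            kind = TokenKind::Number;
            while (i < input.size() &&
                   std::isdigit(static_cast<unsigned char>(input[i]))) {
                ++i;
            }
        } else if (std::isalpha(c) || c == '_') {
            kind = TokenKind::Identifier;
            while (i < input.size() &&
                   (std::isalnum(static_cast<unsigned char>(input[i])) ||
                    input[i] == '_')) {
                ++i;
            }
        } else {
            kind = TokenKind::Operator;  // any other single character
            ++i;
        }
        assert(i > start);  // every branch consumes at least one character
        tokens.push_back({kind, input.substr(start, i - start), start});
    }
    tokens.push_back({TokenKind::End, "", input.size()});
    return tokens;
}
```

Calling `tokenize("x1 = 42+y")` yields an identifier, an operator, a number, an operator, an identifier and a final `End` marker. A lex specification for the same language would be a handful of pattern lines, which is the size difference the answer is describing.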
Somebody else has already written and DEBUGGED them for you?
It's easier, and they are more general. Bison/Lex can tokenize and parse a wide class of grammars and present the result in what may be an easier format. They might be faster as well, depending on how well you write your regex.
I wouldn't want to write my own parser in C since the language doesn't have great built-in support for strings. If you write your own, I would recommend Perl for ease of regex (or possibly Python).
It is probably faster to use existing tools, but it may or may not be as much fun. If you have the time, and since it is just for learning, go for it. C++ is a good language to start with.
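For comparison, a regex-based tokenizer of the kind the question asks about might look roughly like the sketch below. It uses the standard library's `std::regex` rather than `boost::regex`, and the function name `regexTokenize` and the pattern are assumptions chosen for illustration. The alternatives are tried in order at each position: a run of digits, then an identifier, then any single non-space character.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Splits the input into numbers, identifiers and single-character symbols.
// Whitespace is skipped because no alternative in the pattern matches it.
std::vector<std::string> regexTokenize(const std::string& input) {
    static const std::regex pattern(R"(\d+|[A-Za-z_]\w*|\S)");
    std::vector<std::string> tokens;
    const std::sregex_iterator end;
    for (std::sregex_iterator it(input.begin(), input.end(), pattern);
         it != end; ++it) {
        assert(it->length() > 0);  // each alternative consumes a character
        tokens.push_back(it->str());
    }
    return tokens;
}
```

This is short to write, but how fast it runs depends on the regex implementation and on the pattern itself, which is the caveat the answer raises about hand-written regexes.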
Different strokes for different folks. I personally like recursive descent parsers - I find them easy to understand and you can make them produce superior end-user error messages to those produced by tools like bison.
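To make that concrete, here is a minimal sketch of a recursive descent parser in C++ for arithmetic expressions. The grammar, the `Parser` class and the `evaluate` helper are all assumptions made up for this example. Each grammar rule becomes one member function, and because the code knows exactly which rule it is in when something goes wrong, it can report a specific message with a column number.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Assumed grammar (illustrative only):
//   expr   := term   (('+' | '-') term)*
//   term   := factor (('*' | '/') factor)*
//   factor := NUMBER | '(' expr ')'
class Parser {
public:
    explicit Parser(std::string source) : src_(std::move(source)) {}

    double parse() {
        const double value = expr();
        skipSpaces();
        if (pos_ != src_.size()) {
            fail("unexpected '" + std::string(1, src_[pos_]) + "'");
        }
        return value;
    }

private:
    double expr() {
        double value = term();
        for (;;) {
            if (accept('+')) value += term();
            else if (accept('-')) value -= term();
            else return value;
        }
    }

    double term() {
        double value = factor();
        for (;;) {
            if (accept('*')) value *= factor();
            else if (accept('/')) value /= factor();
            else return value;
        }
    }

    double factor() {
        if (accept('(')) {
            const double value = expr();
            if (!accept(')')) fail("expected ')' to close the parenthesis");
            return value;
        }
        skipSpaces();
        const std::size_t start = pos_;
        while (pos_ < src_.size() &&
               std::isdigit(static_cast<unsigned char>(src_[pos_]))) {
            ++pos_;
        }
        if (pos_ == start) fail("expected a number or '('");
        return std::stod(src_.substr(start, pos_ - start));
    }

    // Consumes c if it is the next non-space character.
    bool accept(char c) {
        skipSpaces();
        if (pos_ < src_.size() && src_[pos_] == c) {
            ++pos_;
            return true;
        }
        return false;
    }

    void skipSpaces() {
        assert(pos_ <= src_.size());  // the cursor never runs past the end
        while (pos_ < src_.size() &&
               std::isspace(static_cast<unsigned char>(src_[pos_]))) {
            ++pos_;
        }
    }

    // Reports the error with a 1-based column for the end user.
    [[noreturn]] void fail(const std::string& what) const {
        throw std::runtime_error("column " + std::to_string(pos_ + 1) +
                                 ": " + what);
    }

    std::string src_;
    std::size_t pos_ = 0;
};

double evaluate(const std::string& text) { return Parser(text).parse(); }
```

With this sketch, `evaluate("1 + 2 * 3")` returns 7, and `evaluate("2 * (3 + ")` throws a `std::runtime_error` whose message is `column 10: expected a number or '('`. A Bison parser can be made to give good messages too, but by default it reports a generic syntax error, so tailoring the wording per rule takes extra work there.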