yacc/lex or hand-coding?
I am working on a new programming language, and I have always been puzzled by the fact that everyone seems to use yacc/lex to parse code while I do not.
My compiler (which already works) is hand-coded in C++/STL, and I cannot say it is complex or took too much time. It has both a lexer and a parser of sorts, but they are not autogenerated.
Earlier, I wrote a C compiler (not the full spec) the same way. It compiled a program in one pass, including back-reference resolution and preprocessing, which is definitely impossible with yacc/lex.
I just cannot convince myself to scrap all this and dive into yacc/lex, which might take quite an effort to adopt and might introduce some grammar limitations.
Am I missing something by not using yacc/lex? Am I doing an evil thing?
Comments (5)
The main advantage of using any kind of lexer/parser generator is that it gives you a lot more flexibility if your language evolves. In a hand-coded lexer/parser (especially if you've mixed a lot of functionality into a single pass!), changes to the language get nasty fairly quickly, whereas with a parser generator you make the change, re-run the generator, and move on with your life. There are certainly no inherent technical limitations to writing everything by hand, but I think the evolvability and maintainability you gain by automating away the boring bits are worth it!
Yacc is inflexible in some ways:
Furthermore, I have noticed that lex/yacc object code is often bigger than a hand-written recursive descent parser (source code tends to be the other way round).
I have not used ANTLR, so I cannot say whether it is better on these points.
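For readers who have not seen one, here is a minimal sketch of what a hand-written recursive descent parser can look like in C++. The toy arithmetic grammar, the `Parser` class, and the `evaluate` helper are all made up for illustration and are not taken from anyone's compiler in this thread; a real language would normally separate lexing from parsing and build a tree rather than evaluate on the fly.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical toy grammar (an assumption for this sketch):
//   expr   := term   (('+' | '-') term)*
//   term   := factor (('*' | '/') factor)*
//   factor := NUMBER | '(' expr ')'
class Parser {
public:
    explicit Parser(std::string text) : src_(std::move(text)) {}

    // Parses the whole input and returns its value; throws on a syntax error.
    long parse() {
        long value = expr();
        skipSpaces();
        if (pos_ != src_.size()) throw std::runtime_error("unexpected trailing input");
        return value;
    }

private:
    std::string src_;
    std::size_t pos_ = 0;

    bool atDigit() const {
        return pos_ < src_.size() && std::isdigit(static_cast<unsigned char>(src_[pos_]));
    }

    void skipSpaces() {
        while (pos_ < src_.size() && std::isspace(static_cast<unsigned char>(src_[pos_]))) ++pos_;
    }

    // Consumes `c` if it is the next non-space character.
    bool accept(char c) {
        skipSpaces();
        if (pos_ < src_.size() && src_[pos_] == c) { ++pos_; return true; }
        return false;
    }

    // One function per grammar rule: this is the core of recursive descent.
    long expr() {
        long value = term();
        for (;;) {
            if (accept('+')) value += term();
            else if (accept('-')) value -= term();
            else return value;
        }
    }

    long term() {
        long value = factor();
        for (;;) {
            if (accept('*')) {
                value *= factor();
            } else if (accept('/')) {
                long divisor = factor();
                if (divisor == 0) throw std::runtime_error("division by zero");
                value /= divisor;
            } else {
                return value;
            }
        }
    }

    long factor() {
        if (accept('(')) {
            long value = expr();
            if (!accept(')')) throw std::runtime_error("expected ')'");
            return value;
        }
        if (!atDigit()) throw std::runtime_error("expected a number");
        long value = 0;
        while (atDigit()) value = value * 10 + (src_[pos_++] - '0');
        return value;
    }
};

// Convenience wrapper (hypothetical name).
long evaluate(const std::string& text) { return Parser(text).parse(); }
```

With this sketch, `evaluate("1 + 2 * 3")` returns 7 and `evaluate("(1 + 2) * 3")` returns 9, while malformed input such as `"1 +"` throws `std::runtime_error`. Note that operator precedence is encoded in which function calls which, so changing the grammar means editing these functions by hand rather than editing a grammar file and regenerating.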
The other huge advantage of using generators is that they are guaranteed to process exactly and only the language you specified in the grammar. You can't say that of any hand-written code. The LR/LALR variants are also guaranteed to run in O(N) time in the length of the input, which again you can't assert about hand-written code, at least not without a lot of effort in constructing the proof.
I've written both and lived with both and I would never hand-code again. I only did that one because I didn't have yacc on the platform at the time.
Maybe you are missing out on ANTLR, which is good for languages that can be defined with a recursive-descent parsing strategy.
There are potentially some advantages to using Yacc/Lex, but it is not mandatory to use them. There are some downsides to using Yacc/Lex too, but the advantages usually outweigh the disadvantages. In particular, it is often easier to maintain a Yacc-driven grammar than a hand-coded one, and you benefit from the automation that Yacc provides.
However, writing your own parser from scratch is not an evil thing to do. It may make it harder to maintain in future, but it may make it easier, too.
It certainly depends on the complexity of your language's grammar. A simple grammar means a simple implementation, and you can just write it yourself.
Take a look at what may be the worst possible example: C++ :) (Does anybody know another language, besides natural languages, that is more difficult to parse correctly?) Even with tools like ANTLR it is quite difficult to get right, though it is manageable. On the other hand, even though the language is much harder, it seems that some of the best C++ parsers, e.g. GCC and LLVM, are also mostly handwritten.
If you don't need too much flexibility and your language is not too trivial, you will certainly save some work and time by using ANTLR.