Best practices for writing a programming language parser

Posted on 2024-07-14 10:47:09

Are there any best practices that I should follow while writing a parser?

Comments (7)

酒浓于脸红 2024-07-21 10:47:09

The received wisdom is to use a parser generator plus a grammar, and it seems like good advice: you are using a rigorous tool and presumably reducing both the effort and the potential for bugs.

To use a parser generator the grammar has to be context free. If you are designing the language to be parsed then you can control this. If you are not sure whether it is context free, starting down the grammar route could cost you a lot of effort. Even if it is context free in practice, unless the grammar is enormous, it can be simpler to hand code a recursive descent parser.

Being context free not only makes a parser generator possible, it also makes hand-coded parsers a lot simpler. What you end up with is one (or two) functions per phrase. If you organise and name the code cleanly, that is not much harder to read than a grammar (if your IDE can show you call hierarchies then you can pretty much see what the grammar is).
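To make "one function per phrase" concrete, here is a minimal sketch of a hand-coded recursive descent parser in Python. The toy arithmetic grammar and every name in it (`tokenize`, `Parser`, `evaluate`) are my own assumptions for illustration, not something from a particular project. Each grammar rule in the comment maps to exactly one method.

```python
import re

# Hypothetical toy grammar, chosen only for illustration:
#   expr   -> term (('+' | '-') term)*
#   term   -> factor (('*' | '/') factor)*
#   factor -> NUMBER | '(' expr ')'

TOKEN_RE = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(text):
    """Split the input into number and operator tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character near position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        """Return the next token without consuming it (None at end of input)."""
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        """Consume and return the next token."""
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        # expr -> term (('+' | '-') term)*
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            right = self.term()
            value = value + right if op == "+" else value - right
        return value

    def term(self):
        # term -> factor (('*' | '/') factor)*
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            right = self.factor()
            value = value * right if op == "*" else value / right
        return value

    def factor(self):
        # factor -> NUMBER | '(' expr ')'
        token = self.advance()
        if token == "(":
            value = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        if token is not None and token.isdigit():
            return int(token)
        raise SyntaxError(f"unexpected token {token!r}")


def evaluate(text):
    """Parse and evaluate a whole expression, rejecting trailing tokens."""
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value
```

With this sketch, `evaluate("2 + 3 * (4 - 1)")` returns 11, and the call hierarchy `expr` → `term` → `factor` → `expr` is the grammar read straight off the code.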

The advantages of hand coding:

  • Simpler build
  • Better performance
  • Better control of output
  • Can cope with small deviations, e.g. work with a grammar that is not 100% context free

I am not saying grammars are always unsuitable, but the benefits are often minimal and outweighed by the costs and risks.

(I believe the arguments for them are speciously appealing and that there is a general bias for them as it is a way of signaling that one is more computer-science literate.)

南薇 2024-07-21 10:47:09

A few pieces of advice:

  • Know your grammar - write it down in a suitable form
  • Choose the right tool. Do it from within C++ with Spirit2x, or choose an external parser tool such as ANTLR, yacc, or whatever suits you
  • Do you need a parser at all? Maybe a regexp will suffice, or maybe you can hack up a Perl script to do the trick. Writing complex parsers takes time.
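
As an illustration of the "maybe a regexp will suffice" case, here is a small Python sketch for a flat, non-nested line format. The log-line layout and the names `LOG_LINE` and `parse_log_line` are assumptions made up for this example.

```python
import re

# Hypothetical line format: "<date> <time> <LEVEL> <message>"
LOG_LINE = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) "
    r"(?P<message>.*)"
)


def parse_log_line(line):
    """Return the named fields as a dict, or None if the line does not match."""
    match = LOG_LINE.fullmatch(line)
    return match.groupdict() if match else None
```

A format like this has fixed fields and no nesting, so one pattern covers it and a full parser would be overkill.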
段念尘 2024-07-21 10:47:09

Don't overuse regular expressions - while they have their place, they cannot match nested or recursive structure, so they simply don't have the power to handle any kind of real parsing. You can push them, but you're eventually going to hit a wall or end up with an unmaintainable mess. You're better off finding a parser generator that can handle a larger set of languages. If you really don't want to get into tools, you can look at recursive descent parsing - it's a really simple pattern for hand-writing a small parser. Recursive descent parsers aren't as flexible or as powerful as the big parser generators, but they have a much shorter learning curve.

Unless you have very tight performance requirements, try to keep your layers separate - the lexer reads in individual tokens, the parser arranges those into a tree, semantic analysis then checks over everything and links up references, and a final phase outputs whatever is being produced. Keeping the different parts of the logic separate will make things easier to maintain later.
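The layering described above can be sketched roughly as follows in Python. The tiny assignment language (`name = operand + operand ... ;`) and the function names `lex`, `parse`, `check`, `emit` and `run` are assumptions invented for this example. The point is only that each layer takes the previous layer's output and knows nothing else about it.

```python
import re

TOKEN_RE = re.compile(
    r"(?P<num>\d+)|(?P<name>[A-Za-z_]\w*)|(?P<op>[=+;])|(?P<ws>\s+)"
)


def lex(text):
    """Layer 1: turn characters into (kind, value) tokens, skipping whitespace."""
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r}")
        if match.lastgroup != "ws":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


def parse(tokens):
    """Layer 2: arrange tokens into a list of ('assign', name, expr) trees."""
    pos = 0

    def take(kind, value=None):
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        if tok[0] != kind or (value is not None and tok[1] != value):
            raise SyntaxError(f"unexpected token {tok[1]!r}")
        pos += 1
        return tok[1]

    def operand():
        kind = tokens[pos][0] if pos < len(tokens) else None
        if kind == "num":
            return ("num", int(take("num")))
        return ("var", take("name"))

    def expr():
        node = operand()
        while pos < len(tokens) and tokens[pos] == ("op", "+"):
            take("op", "+")
            node = ("add", node, operand())
        return node

    statements = []
    while pos < len(tokens):
        name = take("name")
        take("op", "=")
        statements.append(("assign", name, expr()))
        take("op", ";")
    return statements


def check(statements):
    """Layer 3: semantic analysis - every variable must be assigned before use."""
    defined = set()

    def visit(node):
        if node[0] == "var" and node[1] not in defined:
            raise NameError(f"undefined variable {node[1]!r}")
        if node[0] == "add":
            visit(node[1])
            visit(node[2])

    for _, name, expr in statements:
        visit(expr)
        defined.add(name)


def emit(statements):
    """Layer 4: produce output - here, the final value of each variable."""
    env = {}

    def value(node):
        if node[0] == "num":
            return node[1]
        if node[0] == "var":
            return env[node[1]]
        return value(node[1]) + value(node[2])

    for _, name, expr in statements:
        env[name] = value(expr)
    return env


def run(text):
    """Chain the four layers together."""
    tree = parse(lex(text))
    check(tree)
    return emit(tree)
```

Note that `y = x + 1;` is syntactically fine and gets through `parse`; it is `check` that rejects it. Keeping that distinction in separate functions is what makes each layer easy to replace or test on its own.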

岁月静好 2024-07-21 10:47:09

Read most of the Dragon Book first.

Parsers are not complicated if you know how to build them, but they are NOT the type of thing where, if you put in enough time, you'll eventually get there. It's way better to build on the existing knowledge base. (Otherwise expect to write it and throw it away a few dozen times.)

病毒体 2024-07-21 10:47:09

Yep. Try to generate it rather than write it by hand. Consider using yacc, ANTLR, Flex/Bison, Coco/R, the GOLD parser generator, etc. Resort to manually writing a parser only if none of the existing parser generators fits your needs.

时光磨忆 2024-07-21 10:47:09
  • Choose the right kind of parser: sometimes a recursive descent parser will be enough, sometimes you should use an LR parser (and there are many types of LR parsers).
  • If you have a complex grammar, build an Abstract Syntax Tree.
  • Try to identify very well what goes into the lexer, what is part of the syntax and what is a matter of semantics.
  • Try to make the parser as loosely coupled to the lexer implementation as possible.
  • Provide a good interface to the users so that they are agnostic of the parser implementation.
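
A rough Python sketch of the last three points, under my own assumptions: the nested-list input format and the names `Token`, `Num`, `Seq`, `regex_lexer`, `parse_tokens` and `parse` are all hypothetical. The parser accepts any iterable of tokens, so the lexer can be swapped out; it returns an AST; and users only ever call `parse`.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Iterator, Union


@dataclass(frozen=True)
class Token:
    kind: str   # "num", "[", "]" or ","
    text: str


@dataclass(frozen=True)
class Num:
    """AST leaf: an integer literal."""
    value: int


@dataclass(frozen=True)
class Seq:
    """AST node: a bracketed, comma-separated sequence of nodes."""
    items: tuple


Node = Union[Num, Seq]


def regex_lexer(source: str) -> Iterator[Token]:
    """One possible lexer; the parser below does not depend on it."""
    for match in re.finditer(r"\d+|[\[\],]|\S", source):
        text = match.group()
        if text.isdigit():
            yield Token("num", text)
        elif text in "[],":
            yield Token(text, text)
        else:
            raise SyntaxError(f"unexpected character {text!r}")


def parse_tokens(tokens: Iterable[Token]) -> Node:
    """Build the AST from any iterable of Token objects."""
    stream = iter(tokens)
    end = Token("end", "")
    current = next(stream, end)

    def advance() -> Token:
        nonlocal current
        previous, current = current, next(stream, end)
        return previous

    def value() -> Node:
        token = advance()
        if token.kind == "num":
            return Num(int(token.text))
        if token.kind == "[":
            items = []
            if current.kind != "]":
                items.append(value())
                while current.kind == ",":
                    advance()
                    items.append(value())
            if advance().kind != "]":
                raise SyntaxError("expected ']'")
            return Seq(tuple(items))
        raise SyntaxError(f"unexpected token {token.text!r}")

    tree = value()
    if current.kind != "end":
        raise SyntaxError(f"unexpected token {current.text!r}")
    return tree


def parse(source: str) -> Node:
    """Public entry point: callers never see tokens or the lexer."""
    return parse_tokens(regex_lexer(source))
```

Because `parse_tokens` only relies on the `Token` shape, a test can feed it a hand-built list of tokens, and the lexer could later be replaced by a hand-written scanner without touching the parser or its callers.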
我早已燃尽 2024-07-21 10:47:09

First, don't try to apply the same techniques to parsing everything. There are numerous possible use cases, from something like IP addresses (a bit of ad hoc code) to C++ programs (which need an industrial-strength parser with feedback from the symbol table), and from user input (which needs to be processed very fast) to compilers (which normally can afford to spend a little time parsing). You might want to specify what you're doing if you want useful answers.
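For the "bit of ad hoc code" end of that range, a dotted-quad IPv4 address needs nothing more than a split and a few checks. This is only a sketch: `parse_ipv4` is a name I made up, and rejecting leading zeros is a choice of this example, not a universal rule.

```python
def parse_ipv4(text):
    """Parse a dotted-quad IPv4 address into a tuple of four ints."""
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected 4 fields, got {len(parts)}")
    octets = []
    for part in parts:
        # Accept plain ASCII decimal digits only (this also rejects empty fields).
        if not (part.isascii() and part.isdigit()):
            raise ValueError(f"not a decimal number: {part!r}")
        # Assumption of this sketch: leading zeros such as "01" are rejected.
        if len(part) > 1 and part[0] == "0":
            raise ValueError(f"leading zero in {part!r}")
        value = int(part)
        if value > 255:
            raise ValueError(f"field out of range: {value}")
        octets.append(value)
    return tuple(octets)
```

Here `parse_ipv4("192.168.0.1")` returns `(192, 168, 0, 1)`. Nothing about this input justifies a lexer, a grammar or a generator, which is exactly the contrast with parsing C++.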

Second, have a grammar in mind to parse with. The more complicated it is, the more formal the specification needs to be. Try to err on the side of being too formal.

Third, well, that depends on what you're doing.
