Parser generators and Ragel... making my own D parser
I'm new to the world of compilers, and I recently heard about something called a parser generator. From what I (think) I've understood, parser generators take in a syntax file and output a source code file that can parse files with the given syntax.
A few questions:
Did I understand that correctly?
If so, is Ragel such a tool?
If it is, can Ragel output a D parser into D source code?
Thank you!
Comments (3)
That's basically it. Parser generators transform a grammar into a source file that can be used to recognize strings that are members of the language defined by the grammar. Often, but not always, a parser generator requires a lexical analyzer to break text down into tokens before it does its work. Lex and Yacc are classic examples of a paired lexical analyzer and parser generator.
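To make the lexer/parser split concrete, here is a minimal hand-written sketch in Python of what a generated pair does conceptually. The toy grammar, the token names, and the function names are all made up for illustration; this is not the output of Lex, Yacc, or any other tool.

```python
# Hypothetical toy grammar (not from the thread):
#   expr := term ('+' term)*
#   term := NUMBER | '(' expr ')'

def tokenize(text):
    """Lexical analysis: break raw text into (kind, value) tokens."""
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif "0" <= ch <= "9":
            j = i
            while j < len(text) and "0" <= text[j] <= "9":
                j += 1
            tokens.append(("NUMBER", text[i:j]))
            i = j
        elif ch in "+()":
            tokens.append((ch, ch))
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r} at offset {i}")
    return tokens

def recognizes(tokens):
    """Syntax analysis: True if the whole token list matches `expr`."""
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else "EOF"

    def term():
        nonlocal pos
        if peek() == "NUMBER":
            pos += 1
            return True
        if peek() == "(":
            pos += 1
            if not expr() or peek() != ")":
                return False
            pos += 1
            return True
        return False

    def expr():
        nonlocal pos
        if not term():
            return False
        while peek() == "+":
            pos += 1
            if not term():
                return False
        return True

    return expr() and peek() == "EOF"
```

Calling `recognizes(tokenize("1 + (2 + 3)"))` accepts the input, while an unclosed parenthesis such as `"(1 + 2"` is rejected. A parser generator writes the equivalent of `recognizes` for you from the grammar.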
Modern parser generators offer additional features. For instance, ANTLR can generate code for lexical analysis, for syntax analysis, and even for walking the resulting abstract syntax tree. Elkhound generates a parser that uses the GLR parsing algorithm, which lets it recognize a wider range of languages than non-generalized parsing algorithms can. PEG parsers don't require a separate lexical analyzer.
Ragel actually generates a lexical analyzer in the form of a finite state machine. It can recognize a regular language but not a context-free language. This means it can't recognize most programming languages, including D.
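As a rough picture of what "a lexical analyzer in the form of a finite state machine" means, here is a small table-driven sketch in Python. The states, character classes, and token names are invented for this example; it is not Ragel syntax or Ragel output.

```python
# Hypothetical DFA that classifies a word as an identifier or an
# unsigned integer. The only memory it has is the current state.

def char_class(ch):
    """Map a character to the input class used by the transition table."""
    if ch.isalpha() or ch == "_":
        return "letter"
    if "0" <= ch <= "9":
        return "digit"
    return "other"

TRANSITIONS = {
    ("start", "letter"): "ident",
    ("start", "digit"): "int",
    ("ident", "letter"): "ident",
    ("ident", "digit"): "ident",
    ("int", "digit"): "int",
}

# States in which the machine may stop, and the token kind they yield.
ACCEPTING = {"ident": "IDENT", "int": "INT"}

def classify(word):
    """Run the DFA over `word`; return the token kind or None."""
    state = "start"
    for ch in word:
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:
            return None  # no transition defined: reject
    return ACCEPTING.get(state)
```

Because the table has a fixed set of states and nothing else to remember, a machine like this cannot count how many brackets are currently open, which is the usual reason a plain FSM is not enough for a nested language.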
Ragel does generate D code if you need a fast lexical analyzer.
To fully understand what a parser generator does for you, you'll need some formal language and parsing theory. There are worse places to start than the Dragon Book. See also: Learning to write a compiler.
If you're feeling brave, be sure to check out the lexing and parsing code distributed with the DMD compiler - /dmd2/src/dmd/ - lexer.c and parse.c.
While Ragel is based on regular expressions, it's not just a regex FSM generator. It allows recursion using an additional call/return syntax, as well as other features which allow parsing non-regular languages. So while Ragel does generate FSMs, it allows generating multiple different FSMs and provides mechanisms for jumping between them at arbitrary points, or using a special machine transition syntax. It also allows executing arbitrary code at state transitions.
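The call/return idea can be illustrated with a small Python sketch: a state machine that pushes the state to resume onto an explicit stack when it enters a nested construct and pops it when the construct ends. This only mirrors the concept; the state names and the bracket language are assumptions of mine, and none of it is Ragel syntax.

```python
def balanced(text):
    """Recognize letters with arbitrarily nested [ ... ] groups."""
    stack = []     # return states for the call/return mechanism
    state = "top"  # "top" = outside any bracket, "group" = inside one
    for ch in text:
        if ch == "[":
            stack.append(state)  # "call": remember where to resume
            state = "group"
        elif ch == "]":
            if state != "group":
                return False     # closing bracket with nothing open
            state = stack.pop()  # "return": resume the caller's state
        elif not ch.isalpha():
            return False
    return state == "top"        # every opened group was closed
```

With the stack, `balanced("[a[b]c]")` is accepted and `balanced("[[a]")` is rejected, something a fixed set of states alone could not decide for unbounded nesting depth.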
Another thing that makes Ragel unique is that it's online. In other words, it lends itself to scanning data from an asynchronous source, such as a non-blocking socket. It also uses no dynamic resources, except that for call/return you can use static, automatic, or dynamic memory for the stack, whichever you prefer. There's no global state, either.
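Here is a minimal Python sketch of what "online" scanning looks like from the caller's side: input arrives in arbitrary chunks, and the scanner keeps its state between calls so a token may span a chunk boundary. The class and method names are hypothetical, and this is not how Ragel's generated code is structured.

```python
class NumberScanner:
    """Incremental scanner: all state lives on the instance, none is global."""

    def __init__(self):
        self.in_number = False  # current machine state
        self.value = 0          # digits accumulated so far

    def feed(self, chunk):
        """Consume one chunk; return the integers completed within it."""
        done = []
        for ch in chunk:
            if "0" <= ch <= "9":
                self.value = self.value * 10 + int(ch)
                self.in_number = True
            elif self.in_number:
                # Any non-digit ends the number in progress.
                done.append(self.value)
                self.value = 0
                self.in_number = False
        return done

    def finish(self):
        """Signal end of input and flush a number still in progress."""
        if self.in_number:
            value, self.value = self.value, 0
            self.in_number = False
            return [value]
        return []
```

For example, feeding `"12 3"` and then `"4,5"` yields `[12]` and then `[34]`, because the `3` at the end of the first chunk is carried over; `finish()` then returns `[5]`. Each connection would own its own `NumberScanner`, so many sockets can be scanned independently.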
Ragel is quite unique. Unlike most (all?) traditional generators, it was made for network programming.
Could be:
MySourceCode --> (Scanner) --> MyScannerDataFile
MyScannerDataFile --> (Parser) --> MyParserDataFile
MyParserDataFile --> (CodeGenerator) --> MyExecutableFile
or:
MySourceCode --> (ScannerAndParser) --> MyScannerAndParserDataFile
MyScannerAndParserDataFile --> (CodeGenerator) --> MyExecutableFile
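The first pipeline can be sketched end to end in a few lines of Python. Everything here is a toy under stated assumptions: the grammar is only sums of integers, the tree is a nested tuple, and the "executable" is a list of instructions for a hypothetical stack machine.

```python
def scan(source):
    """Scanner: source text -> tokens (ints and the string '+')."""
    tokens = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch == " ":
            i += 1
        elif "0" <= ch <= "9":
            j = i
            while j < len(source) and "0" <= source[j] <= "9":
                j += 1
            tokens.append(int(source[i:j]))
            i = j
        elif ch == "+":
            tokens.append("+")
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r}")
    return tokens

def parse(tokens):
    """Parser: tokens -> tree, for  expr := NUMBER ('+' NUMBER)*."""
    if not tokens or not isinstance(tokens[0], int):
        raise SyntaxError("expected a number")
    tree = ("num", tokens[0])
    pos = 1
    while pos < len(tokens):
        has_operand = pos + 1 < len(tokens) and isinstance(tokens[pos + 1], int)
        if tokens[pos] != "+" or not has_operand:
            raise SyntaxError("expected '+' followed by a number")
        tree = ("add", tree, ("num", tokens[pos + 1]))
        pos += 2
    return tree

def generate(tree):
    """Code generator: tree -> instructions for a made-up stack machine."""
    if tree[0] == "num":
        return [f"PUSH {tree[1]}"]
    _, left, right = tree
    return generate(left) + generate(right) + ["ADD"]
```

Chaining the stages as `generate(parse(scan("1 + 2 + 3")))` gives `["PUSH 1", "PUSH 2", "ADD", "PUSH 3", "ADD"]`. In the second pipeline, `scan` and `parse` would simply be fused into one stage.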