Looking for a clear definition of what a "tokenizer", "parser" and "lexer" are, and how they are related to each other and used?
I am looking for a clear definition of what a "tokenizer", "parser" and "lexer" are and how they are related to each other (e.g., does a parser use a tokenizer or vice versa)? I need to create a program that will go through c/h source files to extract data declarations and definitions.
I have been looking for examples and can find some info, but I'm really struggling to grasp the underlying concepts like grammar rules, parse trees and abstract syntax trees, and how they interrelate. Eventually these concepts need to be stored in an actual program, but 1) what do they look like, and 2) are there common implementations?
I have been looking at Wikipedia on these topics and at programs like Lex and Yacc, but, having never gone through a compiler class (EE major), I am finding it difficult to fully understand what is going on.
A tokenizer breaks a stream of text into tokens, usually by looking for whitespace (tabs, spaces, new lines).
A lexer is basically a tokenizer, but it usually attaches extra context to the tokens -- this token is a number, that token is a string literal, this other token is an equality operator.
A parser takes the stream of tokens from the lexer and turns it into an abstract syntax tree representing (usually) the program described by the original text.
Last I checked, the best book on the subject was "Compilers: Principles, Techniques, and Tools", usually just known as "The Dragon Book".
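To make the split concrete, here is a minimal lexer sketch in C (the token kinds, struct layout, and function names are invented for illustration, and there are no bounds checks). A bare tokenizer would stop at splitting out the strings; this version also classifies each one:

    #include <ctype.h>
    #include <stdio.h>
    #include <string.h>

    /* The "extra context" a lexer attaches: each token carries a kind,
       not just its text. These kinds are invented for the sketch. */
    typedef enum { TK_KEYWORD, TK_IDENT, TK_NUMBER, TK_EQUALS, TK_SEMI } TokenKind;

    typedef struct { TokenKind kind; char text[32]; } Token;

    /* Scan one token starting at *p; return 1 on success, 0 at end of input. */
    static int next_token(const char **p, Token *t) {
        while (isspace((unsigned char)**p)) (*p)++;  /* the tokenizing part */
        if (**p == '\0') return 0;
        size_t n = 0;
        if (isalpha((unsigned char)**p)) {           /* identifier or keyword */
            while (isalnum((unsigned char)**p) || **p == '_')
                t->text[n++] = *(*p)++;
            t->text[n] = '\0';
            t->kind = (strcmp(t->text, "int") == 0) ? TK_KEYWORD : TK_IDENT;
        } else if (isdigit((unsigned char)**p)) {    /* integer literal */
            while (isdigit((unsigned char)**p))
                t->text[n++] = *(*p)++;
            t->text[n] = '\0';
            t->kind = TK_NUMBER;
        } else {                                     /* only '=' and ';' here */
            t->text[0] = *(*p)++;
            t->text[1] = '\0';
            t->kind = (t->text[0] == '=') ? TK_EQUALS : TK_SEMI;
        }
        return 1;
    }

    int main(void) {
        static const char *names[] = { "KEYWORD", "IDENT", "NUMBER", "EQUALS", "SEMI" };
        const char *src = "int x = 1;";
        Token t;
        while (next_token(&src, &t))
            printf("%-8s '%s'\n", names[t.kind], t.text);
        return 0;
    }

Running it on "int x = 1;" prints one classified token per line. Real C lexers handle many more token kinds and are often generated from regular-expression rules by tools like Lex/Flex rather than written by hand.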
Example:

    int x = 1;

A lexer or tokeniser will split that up into the tokens 'int', 'x', '=', '1', ';'.
A parser will take those tokens and use them to understand the statement in some way: this is a declaration, it declares an integer variable named 'x', and 'x' is initialised to the value 1.
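As a hedged sketch of what that "understanding" can look like in code, the parser below consumes those five tokens and fills in a small AST-style record for the declaration (the VarDecl struct and function names are my own, purely illustrative):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* A possible AST node for "int x = 1;": the parser's output is
       structure (what the statement means), not a string of text. */
    typedef struct {
        char type_name[16];   /* "int" */
        char var_name[16];    /* "x"   */
        int  init_value;      /* 1     */
    } VarDecl;

    /* Recognise the token pattern: type identifier '=' number ';'.
       Tokens are plain strings here to keep the sketch short. */
    static int parse_decl(const char *tok[], size_t n, VarDecl *out) {
        if (n != 5 || strcmp(tok[2], "=") != 0 || strcmp(tok[4], ";") != 0)
            return 0;              /* not a simple initialised declaration */
        strcpy(out->type_name, tok[0]);
        strcpy(out->var_name,  tok[1]);
        out->init_value = atoi(tok[3]);
        return 1;
    }

    int main(void) {
        const char *tok[] = { "int", "x", "=", "1", ";" };  /* from the lexer */
        VarDecl d;
        if (parse_decl(tok, 5, &d))
            printf("declare %s '%s' = %d\n", d.type_name, d.var_name, d.init_value);
        return 0;
    }

A real parser for C would build a node like this for every declaration it recognises, which is more or less what extracting declarations from c/h files amounts to.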
I would say that a lexer and a tokenizer are basically the same thing, and that they smash the text up into its component parts (the 'tokens'). The parser then interprets the tokens using a grammar.
I wouldn't get too hung up on precise terminological usage though - people often use 'parsing' to describe any action of interpreting a lump of text.
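To give a feel for what "using a grammar" means, a rule covering the earlier int x = 1; example might be written like this (illustrative BNF-style notation, not tied to any particular tool):

    declaration ::= type identifier "=" number ";"
    type        ::= "int" | "char" | "float"

A parser accepts a token stream exactly when it can be derived from rules like these; the record of the derivation is the parse tree, and an abstract syntax tree is a condensed version of that tree with purely syntactic tokens such as ';' dropped.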
(adding to the given answers)