Writing a simple compiler
I am writing a compiler for a simple language.
I made a lexer/tokenizer that takes a file and prints the tokens to stdout.
Now I want to do the syntactic analysis, but I don't know how to modify my lexer in order to take the tokens as input.
- A linked list is extremely inefficient for large files (source files of around 80 MB take about 1.3 GB of RAM)
- I could modify my lexer to give the next token every time it is called (idea taken from the Dragon Book), but I don't know what I will do if somewhere in the process I have to go back and read a previous token.
What is the right way to do these things?
Comments (2)
Implementing a nextToken() method in the lexical analyser is the standard way. This method is called by the parser (or syntax analyser) until the entire input has been consumed.

Having to go back and re-read arbitrary earlier tokens is not usually the case. What the parser may need to do, however, is 'push back' a token (or a number of tokens, depending on the lookahead of the parser) which has already been seen. In this case the lexer provides a pushBack(Token) method, which ensures that the next call to nextToken() will return the supplied token, rather than the next token appearing in the input.
It sounds like your matches are too greedy.
You might look into backtracking.
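One way to implement backtracking is to parse over an indexed token buffer: the parser records its position before trying an alternative and rewinds on failure. The tiny grammar below (an assignment versus a bare expression) is invented for illustration:

```python
class Parser:
    """Recursive-descent parser that backtracks via a saved position."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def next(self):
        tok = self.tokens[self.pos] if self.pos < len(self.tokens) else None
        self.pos += 1
        return tok

    def parse_statement(self):
        mark = self.pos                # remember where we were
        if self.parse_assignment():
            return "assignment"
        self.pos = mark                # backtrack: rewind to the saved position
        if self.parse_expression():
            return "expression"
        self.pos = mark
        return None

    def parse_assignment(self):
        # assignment ::= identifier '=' identifier
        first = self.next()
        if first is None or not first.isidentifier():
            return False
        if self.next() != "=":
            return False
        second = self.next()
        return second is not None and second.isidentifier()

    def parse_expression(self):
        # expression ::= identifier (deliberately minimal)
        tok = self.next()
        return tok is not None and tok.isidentifier()
```

Rewinding an integer index is cheap, which is why buffering tokens in a flat array (rather than a linked list) pairs well with a backtracking parser.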