我需要执行哪些步骤才能完成此编程作业?
我很难理解我应该做什么。我唯一想到的是我需要在 cminus.y 文件上使用 yacc 。之后的一切我都完全困惑了。有人可以以不同的方式向我解释这一点,以便我能够理解我需要做什么吗?
简介:
我们将使用 lex/flex 和 yacc/Bison 生成 LALR 解析器。我会给你一个名为 cminus.y 的文件。这是一个 yacc 格式的语法文件,用于一种称为 C-minus 的简单类 C 语言,来自 Kenneth C. Louden 的《编译器构造》一书。我认为语法应该相当明显。 Yahoo 小组有一些关于如何使用 yacc 的描述的链接。现在您已经了解了 flex,学习 yacc 应该相当容易了。唯一的基本类型是 int。一个int是4个字节。布尔值被作为 int 处理,就像在 C 中一样。(实际上,语法允许您将变量声明为 void 类型,但我们不要这样做。)您可以拥有一维数组。 没有指针,但对数组元素的引用应被视为指针(如在 C 中)。 该语言提供赋值、IF-ELSE、WHILE 以及函数调用和返回。 我们希望我们的编译器输出 MIPS 汇编代码,然后我们就能够在 SPIM 上运行它。对于像这样没有优化的简单编译器,IR 应该是不必要的。我们可以直接一次性输出汇编代码。然而,我们的第一步是生成一个符号表。
符号表:
我喜欢 Barrett 博士的方法,它使用大量指针来处理不同类型的对象。本质上,符号表的元素是标识符、类型和指向属性对象的指针。属性对象的结构根据类型的不同而不同。我们只需要处理少量类型。我建议至少在开始时使用线性搜索来查找表中的符号。如果您想要更好的性能,可以稍后将其更改为散列。 (如果你想保留在C语言中,你可以使用malloc动态分配对象。) 首先,您需要列出所有不同类型的符号(数量不多)以及每种符号需要哪些属性。请务必允许添加新属性,因为我们 还没有涵盖所有问题。从语法上看,函数的参数列表问题是设计中需要思考的地方。我建议更多的符号表条目和指针。
测试:
语法是正确的,因此按原样采用现有语法并生成解析器,解析器将接受正确的 C-减号程序,但不会产生任何输出,因为没有与规则关联的代码片段。 我们想要添加代码片段来构建符号表并打印信息。 当声明标识符时,您应该打印输入到符号表中的信息。如果在同一范围内找到相同符号的先前声明,则应打印一条错误消息。 当引用标识符时,您应该在表中查找它以确保它存在。如果尚未在当前作用域中声明,则应打印错误消息。 关闭作用域时,应针对未引用的标识符生成警告。 您的测试输入应该是正确形成的 C-minus 程序,但此时大多数生产规则不会发生任何变化。
范围:
最基本的方法有一个全局范围和每个声明的函数的范围。 该语言允许在任何复合语句内进行声明,即范围嵌套。实现这一点需要某种范围编号或堆叠方案。 (堆叠最适合一次性 编译器,这就是我们正在构建的。)
I'm having a hard time understanding what I'm supposed to do. The only thing I've figured out is I need to use yacc on the cminus.y file. I'm totally confused about everything after that. Can someone explain this to me differently so that I can understand what I need to do?
INTRODUCTION:
We will use lex/flex and yacc/Bison to generate an LALR parser. I will give you a file called cminus.y. This is a yacc format grammar file for a simple C-like language called C-minus, from the book Compiler Construction by Kenneth C. Louden. I think the grammar should be fairly obvious.
The Yahoo group has links to several descriptions of how to use yacc. Now that you know flex it should be fairly easy to learn yacc. The only base type is int. An int is 4 bytes. Booleans are handled as ints, as in C. (Actually the grammar allows you to declare a variable as a type void, but let's not do that.) You can have one-dimensional arrays.
There are no pointers, but references to array elements should be treated as pointers (as in C).
The language provides for assignment, IF-ELSE, WHILE, and function calls and returns.
We want our compiler to output MIPS assembly code, and then we will be able to run it on SPIM. For a simple compiler like this with no optimization, an IR should not be necessary. We can output assembly code directly in one pass. However, our first step is to generate a symbol table.
SYMBOL TABLE:
I like Dr. Barrett’s approach here, which uses a lot of pointers to handle objects of different types. In essence the elements of the symbol table are identifier, type and pointer to an attribute object. The structure of the attribute object will differ according to the type. We only have a small number of types to deal with. I suggest using a linear search to find symbols in the table, at least to start. You can change it to hashing later if you want better performance. (If you want to keep in C, you can do dynamic allocation of objects using malloc.)
First you need to make a list of all the different types of symbols that there are—there are not many—and what attributes would be necessary for each. Be sure to allow for new attributes to be added, because we
have not covered all the issues yet. Looking at the grammar, the question of parameter lists for functions is a place where some thought needs to be put into the design. I suggest more symbol table entries and pointers.
TESTING:
The grammar is correct, so taking the existing grammar as it is and generating a parser, the parser will accept a correct C-minus program but it won’t produce any output, because there are no code snippets associated with the rules.
We want to add code snippets to build the symbol table and print information as it does so.
When an identifier is declared, you should print the information being entered into the symbol table. If a previous declaration of the same symbol in the same scope is found, an error message should be printed.
When an identifier is referenced, you should look it up in the table to make sure it is there. An error message should be printed if it has not been declared in the current scope.
When closing a scope, warnings should be generated for unreferenced identifiers.
Your test input should be a correctly formed C-minus program, but at this point nothing much will happen on most of the production rules.
SCOPING:
The most basic approach has a global scope and a scope for each function declared.
The language allows declarations within any compound statement, i.e. scope nesting. Implementing this will require some kind of scope numbering or stacking scheme. (Stacking works best for a one-pass
compiler, which is what we are building.)
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
(免责声明)我对编译器类没有太多经验(如学校关于编译器的课程),但我的理解如下:
1)您需要使用提到的工具来创建一个解析器,当给定输入时,它会告诉用户如果根据 cminus.y 中定义的语法,输入是正确的程序。我从未使用过 yacc/bison 所以我不知道它是如何完成的,但这似乎是完成的:
2)输出似乎还需要检查变量一致性(即,不能使用未声明为与任何编程语言相同的变量),这是通过符号表完成的。简而言之,每次声明某些内容时,您都会将其添加到符号表中。当您遇到标识符时,如果它不是语言标识符之一(例如
if
或while
或for
),您会查找它在符号表中确定它是否已被声明。如果存在,请继续。如果不是 - 打印某种错误注意:点(2)有一个符号表的简化版本;实际上,它们的内容比我刚刚写的要多,但这应该可以帮助您开始。
我将从 yacc 示例开始 - 看看 yacc 可以做什么以及它是如何做的。我想一定有一些带有符号表的完整示例,您可以阅读以进一步理解。
示例:
让我们采用输入 A:
和输入 B:
并假设我们使用 C 语法进行解析。你的解析器应该认为输入 A 是正确的,但对于输入 B 应该大喊“b 未声明”。
(disclaimer) I don't have much experience with compiler classes (as in school courses on compilers) but here's what I understand:
1) You need to use the tools mentioned to create a parser which, when given input will tell the user if the input is a correct program as to the grammar defined in cminus.y. I've never used yacc/bison so I don't know how it is done, but this is what seems to be done:
2) It also seems that the output needs to check for variable consistency (ie, you can't use a variable you haven't declared same as any programming language), which is done via a symbol table. In short, every time something is declared you add it to the symbol table. When you encounter an identifier, if it is not one of the language identifiers (like
if
orwhile
orfor
), you'll look it up in the symbol table to determine if it has been declared. If it is there, go on. If it's not - print some-sort-of-errorNote: point(2) there is a simplified take on a symbol table; in reality there's more to them than I just wrote but that should get you started.
I'd start with yacc examples - see what yacc can do and how it does it. I guess there must be some big example-complete-with-symbol-table out there which you can read to understand further.
Example:
Let's take input A:
And input B:
and assume we're using C syntax for parsing. Your parser should deem Input A all right, but should yell "b is undeclared" for Input B.