Compilers - Instruction selection for type declarations in the AST

Posted on 2024-12-16 22:58:28

I'm learning compilers and creating a code generator for a simple language that deals with two types: characters and integers.

After the user input has been scanned by the scanner and then parsed by the parser, I get an AST representation of the input. I have already written a code generator for an even simpler language, which only handles expressions with integers, operators and variables.

However, with this new language I sometimes get a subtree for a type declaration, like this:

(IS TYPE (x) (INT))

which says x is of type INT.

Should there be a case in my code generator which deals with these type declarations? Or is this simply for the semantic analyzer to type check, so I should just assume the types have been checked and ignore this part of the tree and simply assign the value for x?

动次打次papapa 2024-12-23 22:58:28

Both approaches are possible. You would need to describe your language in more detail to determine whether your code generator really needs to handle these declarations, or whether you can skip them as unnecessary and avoid extra work in the difficult but interesting business of designing a programming language.

Is your "code generator" a program that receives code in one programming language (maybe a small one) as input and outputs code in another programming language (maybe also a small one)?

This tool is usually called a "translator".

Or is your "code generator" a program that receives code in a programming language as input and outputs assembly or bytecode?

This tool is usually called a "compiler".

Note: "pile" is a synonym for "stack".

Usually an AST stores the type of each operation or function call. For example, in C:

...
int a = 3;
int b = 5;
float c = (float)(a * b);
...

The last line generates an AST similar to this (the ASTs for the other lines are omitted):

..................................................................
..................................................................
......................+--------------+............................
......................|    [root]    |............................
......................| (no type) =  |............................
......................+------+-------+............................
.............................|....................................
.................+-----------+------------+.......................
.................|........................|.......................
...........+-----+-----+....+-------------+-------------+.........
...........| (float) c |....| (float) (cast operation)  |.........
...........+-----------+....+-------------+-------------+.........
..........................................|.......................
....................................+-----+-----+.................
....................................| (int) ()  |.................
....................................+-----+-----+.................  
..........................................|.......................
....................................+-----+-----+.................
....................................| (int) *   |.................
....................................+-----+-----+.................
..........................................|.......................
..............................+-----------+-----------+...........
..............................|.......................|...........
........................+-----+-----+...........+-----+-----+.....
........................| (int)  a  |...........| (int)  b  |.....
........................+-----------+...........+-----------+.....
..................................................................
..................................................................

Note that the "(float)" cast acts like an operator or a function,
which is similar to the situation in your question.
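
One way to picture an AST that stores types is a node record that carries a result type next to the operator. The sketch below is only an illustration in Python; the implementation language, the `Node` class and the `show` helper are all assumptions and not something taken from the question. It builds the tree for `float c = (float)(a * b);` using the types declared in the C snippet, with the cast represented as an ordinary unary node.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical AST node: an operator or name plus its result type."""
    op: str
    type: Optional[str] = None          # None for statements such as "="
    children: List["Node"] = field(default_factory=list)


def show(node, depth=0):
    """Return the tree as a list of indented lines, one node per line."""
    lines = ["  " * depth + f"({node.type or 'no type'}) {node.op}"]
    for child in node.children:
        lines.extend(show(child, depth + 1))
    return lines


# float c = (float)(a * b);   with int a and int b
product = Node("*", "int", [Node("a", "int"), Node("b", "int")])
cast = Node("cast", "float", [product])
assign = Node("=", None, [Node("c", "float"), cast])

print("\n".join(show(assign)))
```

Because every node carries its own type, a later pass can decide what to emit for the cast just by comparing the type of the node with the type of its child.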

Good Luck.

忘东忘西忘不掉你 2024-12-23 22:58:28

If this is a declaration

(IS TYPE (x) (INT))

then x has to be laid out in memory. In C, for example, local automatic variables are allocated on the stack. To reserve the right amount of stack space you need to know the sizes of all local variables, and those sizes come from their types.
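
As a rough sketch of that idea, the code generator could collect every declaration in a function and turn it into a frame-pointer-relative offset. Everything here is an assumption made for illustration: the sizes (1 byte for CHAR, 4 for INT), the `layout_frame` name, the choice to align each variable to its own size, and the rounding of the frame to 16 bytes, which follows the usual x86-64 convention.

```python
# Assumed sizes in bytes for the two types of the language in the question.
TYPE_SIZES = {"CHAR": 1, "INT": 4}


def layout_frame(declarations):
    """Assign each declared variable a negative offset from the frame pointer.

    `declarations` is a list of (name, type) pairs, one for each
    (IS TYPE (name) (type)) subtree found in the function body.
    Returns (offsets, frame_size).
    """
    offsets = {}
    used = 0
    for name, type_name in declarations:
        size = TYPE_SIZES[type_name]
        # Reserve `size` bytes, rounding up so the variable is aligned
        # to a multiple of its own size.
        used = (used + size + size - 1) // size * size
        offsets[name] = -used
    # Round the whole frame up to a multiple of 16 bytes (assumed ABI rule).
    frame_size = (used + 15) // 16 * 16
    return offsets, frame_size


offsets, frame_size = layout_frame([("x", "INT"), ("c", "CHAR"), ("y", "INT")])
print(offsets, frame_size)
```

With a table like this, the declaration subtree itself emits no instructions; it only feeds the symbol table that later loads and stores consult.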

If the variable is kept in a register, you should select a register of the required size, if your target has such registers (think of x86, where AL, AX, EAX and RAX are the same register viewed at different sizes).
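
A small lookup table is one possible way to pick the register view from the size of the type. The table below lists real x86 register names, but the `REGISTER_VIEWS` layout, the family labels "A" and "B", the type sizes and the `register_for` helper are hypothetical names chosen for this sketch.

```python
# Assumed sizes in bytes for the two types of the language in the question.
TYPE_SIZES = {"CHAR": 1, "INT": 4}

# Each x86 register family, indexed by operand size in bytes.
REGISTER_VIEWS = {
    "A": {1: "al", 2: "ax", 4: "eax", 8: "rax"},
    "B": {1: "bl", 2: "bx", 4: "ebx", 8: "rbx"},
}


def register_for(family, type_name):
    """Return the view of `family` that matches the size of `type_name`."""
    return REGISTER_VIEWS[family][TYPE_SIZES[type_name]]


print(register_for("A", "INT"))   # 4-byte view of the A family
print(register_for("A", "CHAR"))  # 1-byte view of the A family
```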

The type is also needed when the AST contains an ambiguous operation that can work on different data sizes (e.g. char, short, int, or 8-bit, 16-bit, 32-bit, etc.). In some assembly languages the size of the data is encoded in the instruction itself, so the code generator has to remember the sizes of variables.
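
To illustrate the point about sizes being encoded in the instruction, here is a minimal sketch that emits a store in AT&T-style x86 syntax, where the mnemonic suffix (`b`, `w`, `l`, `q`) names the operand size. The type sizes, the `emit_store_immediate` helper and the use of `%rbp`-relative addressing are assumptions for the example.

```python
# Assumed sizes in bytes for the two types of the language in the question.
TYPE_SIZES = {"CHAR": 1, "INT": 4}

# AT&T mnemonic suffix for each operand size in bytes.
SUFFIX = {1: "b", 2: "w", 4: "l", 8: "q"}


def emit_store_immediate(value, offset, type_name):
    """Emit a store of a constant into a stack slot of the given type."""
    size = TYPE_SIZES[type_name]
    return f"mov{SUFFIX[size]} ${value}, {offset}(%rbp)"


print(emit_store_immediate(5, -4, "INT"))
print(emit_store_immediate(65, -5, "CHAR"))
```

The same assignment node yields a different instruction depending only on the declared type, which is why the declaration cannot simply be thrown away before code generation.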

Or, if the type of operation was not recorded in AST, the ADD: