二元运算的IR树表示

发布于 2024-08-27 08:04:05 字数 533 浏览 3 评论 0原文

我有一个简单的操作，如下所示：

k + 1

它的 IR 树表示是什么？
到目前为止，我想出了：

BINOP(+, MEM(NAME(k)), CONST(1))

但我不确定我是否需要 NAME 的 MEM。欢迎任何帮助。

我们使用的符号与此 pdf 中的符号完全相同： http://www.computing.dcu.ie/~ Hamilton/teaching/CA449/notes/translate.pdf

来自Java 中的现代编译器实现 书。

原文

I have a simple operation like so:

k + 1

What would be a IR tree representation of it?
So far I came up with:

BINOP(+, MEM(NAME(k)), CONST(1))

but I am not sure if I need the MEM for NAME or not. Any help welcome.

The notation we use is exactly the same as in this pdf:
http://www.computing.dcu.ie/~hamilton/teaching/CA449/notes/translate.pdf

It is from modern compiler implementation in java book.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

晚雾 2024-09-03 08:04:05

嗯，这实际上取决于您的 IR 的表示方式。另外，正如指出 ://stackoverflow.com/users/257090/mrjoltcola">mrjoltcola，你所建议的看起来更像是一个AST（抽象语法树）。 AST 级别更高，通常是直接从源生成的树结构。 IR 通常更简单、更底层（代码通常表示为简单指令列表，而不是表达式树）。 AST 通常对于编译器来说是高度特定的。 IR 通常与编译器和/或语言无关（例如，请参见 LLVM）。

如果你要用文本来表示它，我认为那

+(k,1)

会更简单，因为无论如何你都需要一个相当复杂的词法分析器/解析器，这对人类来说更具可读性（尽管可读性稍差一些）对于计算机）。不需要像 MEM 或 NAME 这样的东西，因为在这种情况下，标识符显然是一个变量访问（它确实取决于您的语言和结构）不过编译器）。不过，我绝对不会使用像 BINOP 这样的东西，因为它实际上只会使代码变得复杂（您仍然必须与其他二进制操作分开处理加法）。

如果你只是记住它，这取决于语言。在 Haskell 中，我会做这样的事情：

data Expr = Const Int | Variable String | Add Expr Expr | ...

然后你的例子是：

Add (Variable "k") (Const 1)

在 C++/C#/Java 中，你可能会使用类来模拟代数数据类型。

例如，在粗略的 C++ 伪代码中：

class Expr {};
class Const : Expr {int v; Const(int v) : v(v) {}};
class Variable : Expr {string n; Variable(string n) : n(n) {}};
class Add : Expr {Expr a, b; Add(Expr a, Expr b) : a(a), b(b) {}};

...

Add(Variable("k"), Const(1))

Well, it really depends on how your IR is represented. Also, as pointed out by mrjoltcola, what you have suggested looks more like an AST (abstract syntax tree). An AST is more high level, and is generally a tree structure generated directly from the source. An IR is generally much simpler and more low-level (and code is usually represented as a list of simple instructions, not a tree of expressions). ASTs are generally highly specific to the compiler. An IR is often compiler and/or language independent (see for example LLVM).

If you're are going to represent it in text, I think that

+(k,1)

would be simpler, because you're going to need a fairly complex lexer/parser anyways, and this would be a bit more readable for humans (although a bit less readable for computers). There is no need to have things like MEM or NAME, because an identifier in that case would obviously be a variable access (it does depend on the language and the structure of your compiler though). I definitely wouldn't use something like BINOP though, because it really only complicates the code (you still have to handle addition separately from the other binary operations).

If you are just keeping it in memory, it depends on the language. In Haskell, I would do something like this:

data Expr = Const Int | Variable String | Add Expr Expr | ...

and then your example would be:

Add (Variable "k") (Const 1)

In C++/C#/Java, you would probably use classes to emulate the algebraic data types.

For example, in rough C++-ish pseudocode:

class Expr {};
class Const : Expr {int v; Const(int v) : v(v) {}};
class Variable : Expr {string n; Variable(string n) : n(n) {}};
class Add : Expr {Expr a, b; Add(Expr a, Expr b) : a(a), b(b) {}};

...

Add(Variable("k"), Const(1))

回复收藏 0 原文

挽手叙旧 2024-09-03 08:04:05

我唯一能做的就是阅读作者 (Appel) 的 IR 树语言规则。

对于语法：

  <ADDEXPR> ::= <EXPR> + <EXPR>

  <EXPR> ::= IDENTIFIER
           | LITERAL

AST 树可能是：

  BINOPEXPR(+, EXPR(IDENTIFIER(k)), EXPR(LITERAL(1)))

或者可能包括 IdentifierExpression 和 LiteralExpression，所以它可能是：

  BINOPEXPR(+, IDENT_EXPR(k), LITERAL_EXPR(1))

但 AST 和 IR 树是不同的东西。所以根据 Appel 的书，在 C 版本的 7.1 节中：

NAME(n) = 对应于汇编语言标签的 Sumbolic 常量 n。

MEM(e) = 从地址 e 开始的内存的 wordSize 字节的内容。当 MEM() 用作 MOVE() 的左子级时，它意味着“存储”，但在其他任何地方它意味着“获取”。

LABEL(n) = 将名称 n 的常量值定义为当前机器代码地址。这就像汇编语言中的标签定义。值 NAME(n) 可能是跳转、调用等的目标。

根据他的规则，NAME() 不是您想要的变量 k，名称用于代码标签。

根据阅读他的内容，我的猜测是，如果 k 是一个变量，在本例中它是一个 r 值，那么您只需使用 MEM(e) 但 e 可以是局部变量（在局部堆栈帧中），也可以是全局变量，甚至是临时变量。因此“k + 1”的翻译将取决于“k”的分配位置。如果它在本地堆栈帧中，则 k 为：

MEM(BINOP(+, TEMP fp, CONST (address of k)))

因此 k + 1 为：

BINOP(+, MEM(BINOP(+, TEMP fp, CONST (address of k))), 1)

因此，您需要清楚如何为变量分配存储空间。这将出现在标识符的符号表中。

我至少有 20 本编译器书籍，老实说，我发现 Appel 的书令人困惑，而且在某些方面过于简短。他在概念上做出了学生无法遵循的飞跃，并且可以在某些地方使用更多的阐述。由于我从未真正实现过 IR 树，（我总是编写文本中间语言，并支持声明不同作用域的变量和符号临时变量的汇编程序）我无法确定他的意思，所以我建议与您的教授确认因为他以前可能用这本书教过并且可能更了解它。

希望有帮助。

The only thing I can do is read the author's (Appel) rules of his IR tree language.

For a grammar:

  <ADDEXPR> ::= <EXPR> + <EXPR>

  <EXPR> ::= IDENTIFIER
           | LITERAL

The AST tree might be:

  BINOPEXPR(+, EXPR(IDENTIFIER(k)), EXPR(LITERAL(1)))

Or might include IdentifierExpression and LiteralExpression, so it might be:

  BINOPEXPR(+, IDENT_EXPR(k), LITERAL_EXPR(1))

But AST and IR tree are different things. So according to Appel's book, in section 7.1 of the C version:

NAME(n) = Sumbolic constant n corresponding to an assembly language label.

MEM(e) = Contents of wordSize bytes of memory starting at address e. When MEM() isused as the left child of a MOVE(), it means "store", but anywhere else it means "fetch".

LABEL(n) = Define constant value of name n to be the current machine code address. This is like a label definition in assembly language. The value NAME(n) may be the target of jumps, calls, etc.

Given his rules, NAME() is not what you want there for a variable k, name is used for code labels.

My guess based on reading him is if k is a variable, and in this case it is an r-value, then you use simply MEM(e) but e could be either a local variable (in the local stack frame), or a global variable, or even a temporary. So the translation for "k + 1" will depend on where "k" is allocated. If its in the local stack frame, then k is:

MEM(BINOP(+, TEMP fp, CONST (address of k)))

So k + 1 would be:

BINOP(+, MEM(BINOP(+, TEMP fp, CONST (address of k))), 1)

So you need to be clear on how you are allocating storage for your variables. That will be in the symbol table for the identifier.

I've got at least 20 compiler books and honestly, I find Appel's book confusing and too brief in areas. He makes conceptual leaps that a student does not follow, and could use a bit more elaboration in places. Since I've never actually implemented IR trees, (I always write to a textual Intermediate Language with assembler support for both declaring variables of different scopes, and symbolic temporaries) I cannot say for sure what he means, so I recommend confirming with your professor as he has probably taught with the book before and may know it better.

Hope that helps.

回复收藏 0 原文

~没有更多了~