如何用编译器优化这个函数?
我一直在学习编译器和工具课程(这学期)。我已经阅读了中间代码生成,并且还看到了 DAG 表示的最优性。编译器清楚的一件事是,无论生成什么中间代码,都必须将其映射到系统的指令集,以便我们可以运行我们的程序。
假设我有一个针对特定架构(例如A)的编译器,其中两个数字之间的加法是ADD R1,R2,R3(来自A的指令集),其中R1-是目的地,R2,R3 是源。我已经映射了这些指令,也就是说,当我想要添加中间代码中表示的两个数字(无论其类型如何,为了简单起见)时,我将运行ADD操作码!。
假设新的架构已经上市,其中两个数字的加法具有不同的指令集,例如AD R1,R2,R3。现在显然我的编译器不会添加这些数字!
现在我的问题是,当我为我的编程语言编写编译器时,我必须添加所有架构及其指令集,以便我的编译器正确执行它需要执行的操作?如果是的话,有哪些方法可以优化这种效果?因为添加所有指令集几乎会降低我的性能。
如果我错了就对了!
I have been doing the course on Compiler and Tools (this semester). I have read till intermediate code generation and also saw DAG representation for optimality. One thing is clear with the compiler is that what ever the intermediate code have been produced, it have to be mapped to the Instruction set of the system, so that we can run our program.
Let us say that i have a write a compiler for the particular architecture(say A),where the addition between two numbers is ADD R1,R2,R3(from the A's instruction set) where R1-is the destination,R2,R3 are the sources. And i have mapped with these instruction, that is when i want to add two numbers(regardless of its type,for simplicity) which is represented in the intermediate code, i will run the ADD opcode!.
Assume that the new architecture have been arrived in the market, in which addition of two numbers have different Instruction set say AD R1,R2,R3. Now obviously my complier wont add the numbers!
Now my question is when i write my compiler for the my programming language, i have to add all the architecture with their instruction set, so that my compiler does correctly what it need to do? If so what all the methods there to Optimize this effect? Because adding all the instruction set would nearly down my performance.
Correct If I'm wrong!
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
您为特定指令集构建编译器,该指令集是所选“指令集架构 (ISA)”的子集。
(许多指令集都有 I/O 指令,但编译器几乎从不生成这些指令)。
可能有几种不同的处理器设计可以执行这种“指令集架构”
它将与您选择的特定指令子集一起使用。
实践中会发生三种进化事件。
您确定如果使用更多来自 ISA 的指令,您的编译器会更好。例如,您可能认为 MULTIPLY 指令将允许您的编译器生成比您过去用于乘法的子例程调用更快的代码。在这种情况下,您可以稍微扩展编译器。
ISA 的所有者(Intel、AMD、IBM...)向 ISA 添加了全新的指令集。例如,数据高速缓存行上的数据并行操作(“SIMD 指令”)。您可以决定将其中一些添加到您的编译器中。这个事件可能很困难,因为新的指令系列通常会对数据的布局和处理方式做出不同的假设。
您发现您想要处理一些完全不同的 ISA。在这种情况下,您将重建编译器的后端,因为指令集在存在哪些寄存器、如何使用它们等方面完全不同。
编译器构建者通常会构建分阶段运行的编译器。生成实际机器代码之前的最后阶段通常将程序表示为对相当低级数据的抽象操作集(例如,对固定字大小值的操作)以及相当标准的抽象操作(加法、乘法、比较、跳转、 CALL、STORE、LOAD...)对实际 ISA 没有任何承诺(特别是没有关于寄存器或特定机器指令的承诺)。这样做允许独立于 ISA 进行更高级别的优化;只需将其视为良好的模块化即可。最后几个阶段专门针对 ISA;通常在分配寄存器的阶段,然后是模式将实际指令与抽象指令进行匹配的阶段。
有完整的关于更高级别优化的书籍,还有其他关于最终代码生成状态的书籍(通常是在单独的章节中讨论这两个问题的书籍)。 [Aho 和 Ullman Dragon 的书以及 Torczon 的 Engineering a Compiler 都是关于这两个主题的好书)。有很多技术允许人们写下最终的指令集和寄存器布局,并将生成最后阶段的大部分内容; GCC 有这样的。该技术很复杂,不适合这句话;最好去看书。
一旦您以这种方式获得适用于第一个 ISA 的编译器,您就可以使用相同的技术构建一个变体。您最终会得到两个物理编译器,每个 ISA 一个。他们共享所有前端逻辑以及抽象代码生成和优化。它们在最后阶段完全不同。
您应该了解的是,构建编译器以利用指令集是一个复杂的过程。
You build a compiler for a specific instruction set, which is a subset of a chosen an "instruction set architecture (ISA)".
(Many instruction sets have I/O instructions, but compilers almost never generate these).
There may be several different processor designs that execute this "instruction set architecture"
which will work with specific instruction subset you have chosen.
There are three kinds of evolutionary events occur in practice.
You determine your compiler would be better if used some more instructions from the ISA. For instance, you might decide that the MULTIPLY instruction would allow your compiler to generate faster code than the subroutine call you have used for multiply in the past. In this case, you extend your compiler slightly.
The owners of the ISA (Intel, AMD, IBM, ...) add entire new sets of instructions to the ISA. For instance, data parallel operations on a cache-line of data ("SIMD instructions"). You can decide to add some of these to your compiler. This event can be difficult, as new families of instructions generally make different assumptions about how data is layed out and processed.
You find some completely different ISA that you want to handle. In this case, you are going to rebuild the back end of your compiler as the insturction set is completely different, in terms of what registers exist, how they are used, etc.
Compiler builders often build the compilers to operated in stages. A last stage before generation of actual machine code typically represents the program as an abstract set of operations on pretty low level data (e.g., operations on fixed-word-size values) with fairly standard abstract operations (ADD, MULTIPLY, COMPARE, JUMP, CALL, STORE, LOAD, ...) that have no commitments to the actual ISA (especially no commitements about register or specific machine instructions). Doing it this way allows higher-level optimizations to be done independent of the ISA; just think of this as good modularization. The last few stages are specialized to the ISA; usually on stage to allocate registers, followed by a stage the pattern matches actual instructions against the abstract ones.
There are entire books written on optimizing on the higher level, and other books written on the final code generation sates (and often books that address both in seperate chapters). [Both Aho&Ullman Dragon book, and Torczon's Engineering a Compiler are quite good books on both topics). There is a lot of technology allowing one to write down the final instruction sets and registers layouts, and will generate much of the last stages; GCC has such. That technology is complex and wont fit in this sentence; best to go read the books.
Once you get a compiler working for the first ISA in this fashion, you can build a variant using the same technology. You end up with two physical compiler, one for each ISA. They share all the front end logic and absract code generation and optimization. They vary completely for the last stages.
What you should understand is that building the compiler to take advantage of instructions sets is a complex process.
这一切都取决于变化有多大。假设您有 ISA X 1.0,其中包含 ADD R1、R2、R3 指令。如果您获得新版本 X 1.1,它将指令 ADD R1、R2、R3 替换为 AD R1、R2、R3,则更改很小。本质上,您有相同的指令但名称不同。您可以使用 cc -arch X1.1 标志来适应此更改,该标志将发出“AD”而不是“ADD”。
如果变化较大,如 AD R1、R2 (R2 <- R1 + R2),则新指令与旧 ADD 不同。在这种情况下,您需要更改代码生成器并将该指令包含到您的可用指令集中。再次 cc -arch X1.1 应该让生成器知道该指令可用。
如果更改更大,例如将编译器重新定位到 ISA Y,那么您需要更改代码生成器生成的所有指令。这可能需要大量工作。
现在,现有的低级编译器优化是否适用于新目标也值得怀疑。一般来说,定义良好的后端(如 gcc)可以支持许多 ISA。它使用低级中间表示(IR),其中您的指令具有多个属性,包括操作码名称(“ADD”或“AD”)。然而,低级优化器并不像指令的其他属性那样关心名称,即它有多少个操作数?读/写哪些操作数?它访问内存吗?它会改变程序流程吗?等等。
如果您可以将目标架构调整到编译器框架中,那么您就可以成功利用现有的优化。
It all depends on how big the change is. Lets say you have ISA X 1.0 with instruction ADD R1, R2, R3. If you get a new version X 1.1 which replaces instruction ADD R1, R2, R3 with AD R1, R2, R3 the change is small. Essentially you have the same instruction with different name. You can accomodate thi change with a flag cc -arch X1.1 which will emit "AD" instead of "ADD".
If the change is bigger like AD R1, R2 (R2 <- R1 + R2) then the new instruction is different than the old ADD. In this case you need to change your code generator and include this instruction to your set of available instructions. Again cc -arch X1.1 should let the generator know that this instruction is available.
If the change is even bigger like retargeting the compiler to ISA Y then you need to change all instructions generated by the code generator. That can be a lot of work.
Now whether the existing low level compiler optimizations will work with the new target is also questionable. In general a well defined backend (like gcc) can support many ISA's. It is using a low level intermediate representation (IR) where your instruction has several properies including the opcode name ("ADD" or "AD"). However, the low level optimizers are not very much concerned with the name as much as other properies of the instruction i.e. How many operands does it have? What operands are read/written? Does it access memory? Does it change the program flow? etc.
If you can adapt your target architecture into the compiler framework then you can succesfuly utilize existing optimizations.