What kinds of optimizations does LLVM perform, and what kinds must its frontends implement themselves?

Posted 2024-12-03 01:54:57 · 424 words · 1 view · 0 comments

Note: this question is closely related to this one, so if you're interested in my question, you should definitely read that other one and its answers too.

I can think of some optimizations an OOP language frontend could do. For example, when const methods are called on an object several times in a row, with no intervening non-const calls on that object, the frontend could store the result in a temporary variable and skip the repeated function calls. But I can't think of many more. I'd like to ask people to create a longer list of examples.
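To make the idea concrete, here is a minimal C++ sketch of the transformation described above. The `Polygon` class, the `perimeter_calls` counter, and the functions `naive` and `cached` are hypothetical names invented for this illustration. Note the assumption: in C++ itself, `const` does not guarantee that a method is free of side effects (`mutable` members and `const_cast` exist), so this rewrite is only valid in a language whose frontend can actually prove the call is pure.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical class, used only to illustrate the idea.
class Polygon {
public:
    explicit Polygon(std::vector<double> sides) : sides_(std::move(sides)) {
        assert(!sides_.empty());
    }

    // A const method: it does not change the observable state of the object.
    double perimeter() const {
        ++perimeter_calls;  // instrumentation only, to count how often we are called
        double sum = 0.0;
        for (double s : sides_) sum += s;
        return sum;
    }

    mutable int perimeter_calls = 0;

private:
    std::vector<double> sides_;
};

// What the programmer wrote: two const calls in a row,
// with no non-const call on p in between.
double naive(const Polygon& p) {
    return p.perimeter() * p.perimeter();
}

// What a frontend that knows perimeter() is pure could emit instead:
// one call, with the result held in a temporary.
double cached(const Polygon& p) {
    const double tmp = p.perimeter();
    return tmp * tmp;
}
```

Both functions return the same value, but `cached` calls `perimeter()` once instead of twice. Whether LLVM can do this on its own depends on whether the callee is visible and can be proven free of side effects, which is exactly the kind of knowledge a frontend may have and the IR may not.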

I ask this because I want to create a small language as a pet project and I'm not sure how best to study this subject. Maybe this is a case for a community wiki: a comprehensive list of the optimizations the LLVM backend performs and the ones frontends should do themselves. What do you think?

Oh, and I know different frontends can have widely different needs, but my focus is on procedural/OOP languages.

Comments (2)

流云如水 2024-12-10 01:54:57

This probably varies a lot by language. clang (C/C++) gets away with doing very little optimization in the frontend. The only optimization I can think of that is done for the performance of the generated code is that clang does some devirtualization of C++ methods in the frontend. clang does some other optimizations like constant folding and dead code elimination, but that's primarily done to speed up compilation, not to improve the performance of the generated code.

EDIT: Actually, thinking about it a bit more, I just remembered one more important optimization clang does for C++: clang knows a few tricks to elide copy constructors (search for NRVO, the named return value optimization).
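For readers who have not met NRVO, the following is a small sketch of the situation it applies to. `Tracked`, its `copies` counter, and `make_tracked` are hypothetical names made up for this example. NRVO is permitted but not required by the C++ standard, so the copy count is something to observe, not something to rely on.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical type that counts how often it is copy-constructed.
struct Tracked {
    static int copies;
    std::string payload;

    explicit Tracked(std::string p) : payload(std::move(p)) {}
    Tracked(const Tracked& other) : payload(other.payload) { ++copies; }
};
int Tracked::copies = 0;

// `result` is a named local that is returned by value. With NRVO the
// compiler constructs it directly in the caller's storage, so the copy
// constructor is not called at all.
Tracked make_tracked() {
    Tracked result("hello");
    result.payload += ", world";
    return result;  // NRVO candidate
}

// Small self-check: the value is correct whether or not the copy was elided.
void check_make_tracked() {
    Tracked t = make_tracked();
    assert(t.payload == "hello, world");
}
```

Reading `Tracked::copies` after calling `make_tracked()` shows whether the copy was elided in a given build. With clang and GCC at default settings one would normally expect it to stay at zero, but that is an observation about those compilers, not a guarantee.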

In some cases, a language-specific IR optimization pass can be useful. There is a SimplifyLibCalls pass which knows how to optimize calls into the C standard library. For the new Objective-C ARC language feature, clang puts some ARC-specific passes into the pipeline; those optimize out calls to various Objective-C runtime functions.

In general, implementing optimizations in the frontend is only helpful when the code has properties which cannot be encoded into the IR (e.g. C++ objects have a constant vtable pointer). And in practice, you most likely want to implement dumb code generation first, and then see whether there are important cases which are not optimized. The optimizers can do some surprisingly complex transformations.
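As an illustration of a property that lives in the language and not in the IR, consider C++'s `final`. This is my own choice of example, not one the answer names, and `Shape`, `Triangle`, `via_base`, and `via_final` are hypothetical. The sketch assumes a frontend that uses the static type to pick the call target.

```cpp
#include <cassert>

struct Shape {
    virtual ~Shape() = default;
    virtual int sides() const = 0;
};

// `final` tells the frontend that no class can derive from Triangle,
// so a Triangle's dynamic type is always exactly Triangle.
struct Triangle final : Shape {
    int sides() const override { return 3; }
};

// Only the static type Shape is known here: in general this has to be
// an indirect call through the vtable.
int via_base(const Shape* s) {
    assert(s != nullptr);
    return s->sides();
}

// The static type is a final class, so a frontend may emit a direct
// call to Triangle::sides() with no vtable lookup.
int via_final(const Triangle& t) {
    return t.sides();
}
```

Both functions return the same result; the difference is only in what call the frontend is allowed to generate. Once the code has been lowered to loads from a vtable and an indirect call, recovering the "this type can never be subclassed" fact is much harder.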

See also http://llvm.org/docs/tutorial/LangImpl7.html ; using alloca appropriately is one thing which helps the optimizers substantially, although it isn't really an optimization itself.

滥情哥ㄟ 2024-12-10 01:54:57

There are many, many optimizations that need only as much information as is kept in SSA form, which is what LLVM uses. SSA opens up a lot of possibilities for control-flow and data-flow analysis.

On the other hand, the LLVM language is RISC-like, so a lot of high-level information is lost.

So the answer is: the front-end is the place for optimisations that require information which is lost after translation into SSA. Examples that come to mind:

  • preferred-branch optimisations, for example:
    • language extensions for declaring preferred branches (in the Linux kernel, some branches are marked as almost always taken)
    • the implementation of throwing and catching exceptions
    • coroutine implementation and dependency information
  • optimisations whose cost grows exponentially (loop unswitching, for instance, increases code size) may need to be applied only in specific places, guided by high-level information that may come from the source code, i.e. from the front-end.
  • language features (reflection, perhaps, or something else) that are translated into interlinked "many-pointer" structures (pointers to pointers...) can be hard to recognise at the low level: there, everything may look like array access, while at the high level there may be constraints that would help optimisation.
  • complex functions may be implemented differently depending on the available hardware. Take a few examples: matrix multiplication, FFT transforms (compression and decompression algorithms), big-number arithmetic and so on. Each may need a different implementation on different hardware to achieve maximum performance. Once the code has been translated into LLVM, replacing an implementation with one better suited to the available hardware can be very, very costly (in terms of computational complexity). That is why the decision should be made by the front-end when compiling to the lower level.
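The first bullet, preferred branches, can be sketched in C++ with the GCC/Clang builtin `__builtin_expect`, which is what the Linux kernel's `likely()`/`unlikely()` macros are built on. The macro names `LIKELY`/`UNLIKELY` and the function `sum_non_negative` are hypothetical, and the code assumes a GCC-compatible compiler.

```cpp
#include <cassert>
#include <cstddef>

// Same shape as the Linux kernel's likely()/unlikely() macros.
// __builtin_expect is a GCC/Clang builtin, not standard C++.
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

// Sums the array, or returns -1 if a negative value is found.
// The programmer knows negative values are rare and says so.
long sum_non_negative(const int* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    long total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (UNLIKELY(data[i] < 0)) {
            return -1;  // rare error path
        }
        total += data[i];
    }
    return total;
}
```

The hint does not change the result, only how the compiler may lay out the code. As far as I know, clang forwards such hints to LLVM as an intrinsic that is later turned into branch-weight metadata, so this is a case where the front-end's job is to carry the source-level knowledge into the IR instead of doing the optimisation itself.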

Those are just a few ideas which, I hope, show the kinds of optimisations that might be involved.
