What are good heuristics for inlining functions?
Considering that you're trying solely to optimize for speed, what are good heuristics for deciding whether to inline a function or not? Obviously code size should be important, but are there any other factors typically used when (say) gcc or icc is determining whether to inline a function call? Has there been any significant academic work in the area?
Wikipedia has a few paragraphs about this, with some links at the bottom:
Languages with JIT compilers and runtime class loading have other tradeoffs since the virtual methods aren't known statically, yet the JIT can collect runtime profiling information, such as method call frequency:
Design, Implementation, and Evaluation of Optimizations in a Just-in-Time Compiler (for Java) talks about method inlining of static methods and dynamically loaded classes and its improvements on performance.
Practicing JUDO: Java Under Dynamic Optimizations claims that their "inlining policy is based on the code size and profiling information. If the execution frequency of a method entry is below a certain threshold, the method is then not inlined because it is regarded as a cold method. To avoid code explosion, we do not inline a method with a bytecode size of more than 25 bytes. . . . To avoid inlining along a deep call chain, inlining stops when the accumulated inlined bytecode size along the call chain exceeds 40 bytes." Although they have runtime profiling information (method call frequency) they are still careful to avoid inlining large functions or chains of functions to prevent bloat.
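The policy quoted above can be sketched as a small predicate. This is only one reading of the quote, not the paper's implementation: the `CallSite` struct, the function name, and the parameter names are hypothetical, and the cold threshold is left as a parameter because the quote does not give its value.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical description of one call site; the field names are
// illustrative and do not come from the JUDO paper.
struct CallSite {
    std::size_t calleeBytecodeBytes;  // bytecode size of the callee
    std::size_t entryCount;           // profiled execution count of the callee entry
};

// The two size limits are the ones quoted above.
constexpr std::size_t kMaxCalleeBytes = 25;
constexpr std::size_t kMaxChainBytes = 40;

// Decide whether to inline `site`, given how many bytecode bytes have
// already been inlined along the current call chain. The quote does not
// give the cold threshold, so the caller supplies it.
bool shouldInline(const CallSite& site,
                  std::size_t chainBytesSoFar,
                  std::size_t coldThreshold) {
    // Precondition: earlier decisions kept the chain within its budget.
    assert(chainBytesSoFar <= kMaxChainBytes);

    if (site.entryCount < coldThreshold) return false;             // cold method
    if (site.calleeBytecodeBytes > kMaxCalleeBytes) return false;  // too large
    // Stop once this callee would push the chain past its budget.
    return chainBytesSoFar + site.calleeBytecodeBytes <= kMaxChainBytes;
}
```

With a cold threshold of 100, a 20-byte callee that ran 1000 times is inlined at the top of a chain, but rejected once 30 bytes have already been inlined along that chain.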
A search on Google Scholar reveals a number of papers, such as
for Embedded Processors
A search on Google Books reveals quite a number of books with papers or chapters about function inlining in various contexts.
The Compiler Design Handbook: Optimizations and Machine Code Generation has a chapter about Statistical and Machine Learning Techniques in Compiler Design, which covers heuristics for setting various parameters and profiling the results. This chapter references the Vaswani et al. paper Microarchitecture Sensitive Empirical Models for Compiler Optimizations, where they propose "the use of empirical modeling techniques for building microarchitecture sensitive models for compiler optimizations".
(Some other books talk about inlining from the programmer's point of view, such as C++ for Game Programmers, which discusses the dangers of inlining functions too often and the differences between inlining and macros. Compilers often ignore the programmer's inline requests if they can determine that inlining would do more harm than good; this can be overridden with macros as a last resort.)
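To make the difference between inlining and macros concrete, here is a minimal sketch. All names in it (`SQUARE_MACRO`, `squareFunction`, `nextValue`, and the two counting helpers) are made up for illustration. A macro is always expanded at the point of use, but it pastes its argument text, so an argument with side effects runs once per occurrence.

```cpp
#include <cassert>

// Macro: pure text substitution, so the argument expression is pasted twice.
#define SQUARE_MACRO(x) ((x) * (x))

// Function: the argument is evaluated once. `inline` is a request that
// the compiler is free to ignore.
inline int squareFunction(int x) { return x * x; }

// Helper with a visible side effect: bumps the counter and returns it.
inline int nextValue(int& counter) { return ++counter; }

// Counts how many times the argument runs when passed through the macro.
int evaluationsViaMacro() {
    int counter = 0;
    int result = SQUARE_MACRO(nextValue(counter));
    assert(result == 2);  // 1 * 2, whichever call runs first
    (void)result;         // keeps builds with NDEBUG free of unused warnings
    return counter;
}

// Counts how many times the argument runs when passed to the function.
int evaluationsViaFunction() {
    int counter = 0;
    int result = squareFunction(nextValue(counter));
    assert(result == 1);  // 1 * 1, the argument ran once
    (void)result;
    return counter;
}
```

The macro version evaluates its argument twice and the function version once, which is the usual reason to treat macros as a last resort even when they guarantee expansion at the call site.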
A function call implies some additional code (the function prologue, where the new stack frame is set up, and the function epilogue, where it's cleaned up). If your compiler sees that the function code is small in comparison to the prologue and epilogue, it can decide it's not worth it to make an actual call, and will inline the function.
The only benefit I see of calling a function instead of inlining it is size-related. I guess inlining a function and then unrolling a loop can result in a significant size increase.
As far as I have seen, function size is the only factor compilers use to decide whether to inline. However, if you use profile-guided optimization (PGO), I believe the compiler is able to use other variables, such as the number of calls or the call setup time.
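As a rough illustration of how profile data could change a size-only rule, the sketch below gives frequently executed call sites a larger size budget. It is not how any particular compiler implements PGO; the struct, function names, and parameters are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical profile record for one call site.
struct ProfiledCallSite {
    std::uint64_t callCount;        // times the call executed in the profiling run
    std::uint32_t calleeSizeBytes;  // estimated code growth if inlined
};

// Size-only rule: inline anything at or under a fixed size limit.
bool inlineBySize(const ProfiledCallSite& site, std::uint32_t sizeLimit) {
    return site.calleeSizeBytes <= sizeLimit;
}

// Profile-aware rule: call sites at or above `hotCallCount` get their
// size limit multiplied by `hotBonusFactor`; others keep the base limit.
bool inlineWithProfile(const ProfiledCallSite& site,
                       std::uint32_t sizeLimit,
                       std::uint64_t hotCallCount,
                       std::uint32_t hotBonusFactor) {
    assert(hotBonusFactor >= 1);  // a bonus below 1 would penalize hot sites
    const std::uint64_t limit =
        site.callCount >= hotCallCount
            ? std::uint64_t{sizeLimit} * hotBonusFactor
            : std::uint64_t{sizeLimit};
    return site.calleeSizeBytes <= limit;
}
```

With a base limit of 50 bytes and a bonus factor of 4, a 120-byte callee is rejected by the size-only rule but accepted by the profile-aware rule when its call site is hot, and still rejected when it is cold.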
In .NET it is mostly based on size. Measure the size of the parent function and the child function in compiled bytes, then measure the size of the combined function. If the combined function is smaller, then inlining is a good idea.
The reason for this is to make it possible to shove as much code into the CPU's cache as possible. Cache misses are far more expensive than function calls in modern CPUs.
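The size comparison described above might look like the following sketch. It assumes "smaller" means smaller than the parent and child kept as separate functions, and it adds a flag for the case where the child must still exist for other callers. Both are my assumptions rather than something the answer states, and none of the names reflect the actual .NET JIT.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical size measurements, all in compiled bytes.
struct CompiledSizes {
    std::size_t parentBytes;    // caller compiled on its own, call sequence included
    std::size_t childBytes;     // callee compiled on its own
    std::size_t combinedBytes;  // caller compiled with the callee inlined
};

// True when inlining reduces the total code footprint.
// `childStillNeeded` is an added assumption: if other callers keep the
// standalone child alive, its bytes remain after inlining.
bool inliningSavesSpace(const CompiledSizes& sizes, bool childStillNeeded) {
    // Empty functions would make the comparison meaningless.
    assert(sizes.parentBytes > 0 && sizes.childBytes > 0);

    const std::size_t before = sizes.parentBytes + sizes.childBytes;
    const std::size_t after =
        sizes.combinedBytes + (childStillNeeded ? sizes.childBytes : 0);
    return after < before;
}
```

For example, a 100-byte parent and a 12-byte child that combine into 104 bytes save space when the child has no other callers, but not when a standalone copy of the child has to be kept.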