How deep do compilers inline functions?
Say I have some functions, each of about two simple lines of code, and they call each other like this: A calls B, which calls C, which calls D, ... which calls K. (So basically it's a long series of short function calls.) How deep will compilers usually go in the call tree to inline these functions?
The question is not meaningful.
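If you think about inlining and its consequences, you'll realise that it:
- avoids a function call (with all the register saving and stack-frame adjustment that entails);
- exposes more context to the optimizer (dead stores, dead code, common sub-expression elimination...);
- duplicates code (bloating the instruction cache and the executable size, among other things).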
When deciding whether to inline or not, the compiler thus performs a balancing act between the potential bloat created and the speed gain expected. This balancing act is affected by options: for gcc, -O3 means optimize for speed while -Oz means optimize for size, and on inlining they have quasi-opposite behaviors! Therefore, what matters is not the "nesting level" but the number of instructions (possibly weighted, as not all instructions are created equal).
This means that a simple forwarding function, whose body does nothing but call another function, is essentially "transparent" from the inlining point of view.
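A minimal C++ sketch of the question's A → K chain written as pure forwarders; the names a through k and leaf are hypothetical. Compiling it with optimization (for instance g++ -O2 -S) and reading the assembly is one way to check what a given compiler actually does. A common outcome is that a reduces to leaf's arithmetic, but that is a heuristic decision, not a guarantee.

```cpp
// Hypothetical leaf: the only function in the chain that does real work.
int leaf(int x) { return x * 2 + 1; }

// Hypothetical pure forwarders mirroring the question's A -> ... -> K chain.
// Each body is a single call, so once the callee has been inlined into it,
// inlining the forwarder itself swaps one call for roughly one call's worth
// of code: there is little or no growth, whatever the depth of the chain.
int k(int x) { return leaf(x); }
int j(int x) { return k(x); }
int i(int x) { return j(x); }
int h(int x) { return i(x); }
int g(int x) { return h(x); }
int f(int x) { return g(x); }
int e(int x) { return f(x); }
int d(int x) { return e(x); }
int c(int x) { return d(x); }
int b(int x) { return c(x); }
int a(int x) { return b(x); }
```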
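On the other hand, a function a hundred lines long is unlikely to get inlined. The exception is a static free function called only once: it is almost systematically inlined, since inlining it does not create any duplication in that case. From these two examples we get a hunch of how the heuristics behave:
- the fewer instructions a function has, the more likely it is to be inlined;
- the fewer call sites it has, the more likely it is to be inlined.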
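To make those two rules concrete, here is a toy cost model in C++. It is a hypothetical illustration only, not the algorithm GCC, Clang or MSVC implement: the constants kCallCost and kGrowthBudget and the Callee fields are made up, and only the code-size side of the balance is modeled (no speed estimate).

```cpp
// Toy inlining cost model: hypothetical, NOT any real compiler's heuristic.
// All constants are made-up numbers chosen only to illustrate the trade-off.
constexpr int kCallCost = 4;       // assumed size of a call sequence, in "instructions"
constexpr int kGrowthBudget = 20;  // assumed maximum acceptable code growth

struct Callee {
    int body_size;   // estimated size of the callee's body
    int call_sites;  // number of places that call it
    bool removable;  // true if the out-of-line copy can be dropped once every call
                     // is inlined (e.g. a static function whose address is never taken)
};

// Estimated code growth if every call site is inlined: each site trades a
// call for a copy of the body, and a removable out-of-line copy is no longer
// emitted. Note that call-chain depth does not appear anywhere.
int estimated_growth(const Callee& c) {
    int growth = c.call_sites * (c.body_size - kCallCost);
    if (c.removable) {
        growth -= c.body_size;
    }
    return growth;
}

// Inline only if the estimated growth stays within the budget.
bool should_inline(const Callee& c) {
    return estimated_growth(c) <= kGrowthBudget;
}
```

With these made-up numbers, a forwarding function (body_size 4, about the size of a call) gives zero growth however many callers it has; a 100-instruction function called from 10 places gives 960 and is rejected; the same function, static and called exactly once, gives -4 and is accepted.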
After that, there are parameters you should be able to set to influence the decision one way or another (MSVC has __forceinline, which hints strongly at inlining; gcc has the -finline-limit flag to "raise" the threshold on the instruction count; etc.).
On a tangent: do you know about partial inlining?
It was introduced in GCC 4.6. The idea, as the name suggests, is to partially inline a function, mostly to avoid the overhead of a function call when the function is "guarded" and may (in some cases) return nearly immediately.
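For example, a function that starts with a cheap guard (say, returning right away when passed a null pointer) and then runs a large body could get "optimized" so that only the guard, plus a call to an outlined copy of the large body, is inlined at each call site.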
Of course, the inlining heuristics still apply, but they can be applied more selectively!
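The split described above can be sketched at source level as follows. This is only an illustration: GCC performs the transformation itself on the optimized code (partial inlining is controlled by -fpartial-inlining, enabled at -O2 and above), so you would not normally write the "after" version by hand. The names Bar, total, total_unchecked and total_split are hypothetical; in the generated assembly the outlined piece typically shows up under a compiler-chosen name such as total.part.0.

```cpp
#include <vector>

// Hypothetical type and function names, for illustration only.
struct Bar {
    std::vector<int> values;
};

// Before: a cheap guard followed by the expensive part of the function.
int total(const Bar* x) {
    if (x == nullptr) { return 0; }  // guard: may return almost immediately
    int sum = 0;                     // stands in for "a ton of code"
    for (int v : x->values) { sum += v; }
    return sum;
}

// After (roughly the shape partial inlining produces):
// the expensive part is outlined into its own function...
int total_unchecked(const Bar* x) {
    int sum = 0;
    for (int v : x->values) { sum += v; }
    return sum;
}

// ...and only this small wrapper (guard + call) remains, which is
// cheap enough to be inlined into every caller.
inline int total_split(const Bar* x) {
    if (x == nullptr) { return 0; }
    return total_unchecked(x);
}
```

Both versions compute the same result; the difference is that callers passing a null pointer pay for neither a call nor the large body once the wrapper is inlined.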
And finally, unless you use WPO (Whole Program Optimization) or LTO (Link-Time Optimization), a function can only be inlined if its definition is in the same TU (Translation Unit) as the call site.
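Returning to the knobs mentioned above, a minimal sketch of how the compiler-specific hints are commonly wrapped. The macro names FORCE_INLINE and NO_INLINE are hypothetical; the underlying spellings are MSVC's __forceinline and __declspec(noinline), and the GCC/Clang attributes always_inline and noinline. Even these remain requests that interact with each compiler's own rules (MSVC can still decline a __forceinline in some cases).

```cpp
// Hypothetical wrapper macros around compiler-specific inlining hints.
#if defined(_MSC_VER)
#  define FORCE_INLINE __forceinline
#  define NO_INLINE __declspec(noinline)
#elif defined(__GNUC__)  // GCC, and Clang, which also defines __GNUC__
#  define FORCE_INLINE inline __attribute__((always_inline))
#  define NO_INLINE __attribute__((noinline))
#else
#  define FORCE_INLINE inline
#  define NO_INLINE
#endif

// Strong request to inline, regardless of the size heuristics.
FORCE_INLINE int square(int x) { return x * x; }

// Ask the compiler never to inline this one (handy when reading assembly).
NO_INLINE int cube(int x) { return x * x * x; }

// With gcc, the size threshold itself can be raised on the command line, e.g.:
//   g++ -O2 -finline-limit=1000 main.cpp
```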
I've seen compilers inline more than 5 functions deep. But at some point it basically becomes a code-size versus speed trade-off that the compiler makes, and every compiler is different in this respect. Visual Studio is very conservative with inlining; GCC (under -O3) and the Intel Compiler love to inline...