How deep do compilers inline functions?

Posted 2024-12-05 08:24:19

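Say I have some functions, each with about two simple lines of code, and they call each other like this: A calls B calls C calls D ... calls K. (So basically it is a long chain of short function calls.) How deep into the call tree will compilers usually go when inlining these functions?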


避讳 2024-12-12 08:24:19


The question is not meaningful.

If you think about inlining and its consequences, you'll realise that it:

Avoids a function call (with all the register saving/frame adjustment)
Exposes more context to the optimizer (dead stores, dead code, common sub-expression elimination...)
Duplicates code (bloating the instruction cache and the executable size, among other things)

When deciding whether to inline or not, the compiler thus performs a balancing act between the potential bloat created and the speed gain expected. This balancing act is affected by options: for gcc, -O3 means optimize for speed while -Os (or -Oz, where the compiler supports it) means optimize for size. When it comes to inlining, they have nearly opposite behaviors!

Therefore, what matters is not the "nesting level" but the number of instructions (possibly weighted, as not all instructions are created equal).

This means that a simple forwarding function:

int foo(int a, int b) { return foo(a, b, 3); }

is essentially "transparent" from the inlining point of view.
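To tie this back to the question's A-calls-B-calls-C chain, here is a minimal sketch with four links instead of eleven. The function names and the arithmetic are made up for illustration. Because every link is only a couple of instructions long, an optimizing compiler would typically be expected to collapse the whole chain into its caller, although this is not guaranteed and depends on the compiler, version, and flags.

```cpp
// Hypothetical chain of tiny functions, modelled on the question's A..K
// (shortened to four links). Each link is small enough that the inliner's
// size budget, not the nesting depth, is what matters.
static int d(int x) { return x + 1; }
static int c(int x) { return d(x) * 2; }
static int b(int x) { return c(x) + 3; }

// Externally visible entry point of the chain.
int a(int x) { return b(x) - 4; }
```

Calling a(5) yields 11. One way to check what actually happened is to compile with optimization enabled and read the generated assembly (for example with -O2 -S on gcc or clang), or to ask the compiler for inlining reports where it offers them.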

On the other hand, a function a hundred lines long is unlikely to get inlined. The exception is a static free function that is called only once: it is almost systematically inlined, since inlining creates no duplication in that case.

From these two examples we get a feel for how the heuristics behave:

the fewer instructions a function has, the better a candidate it is for inlining
the fewer places it is called from, the better a candidate it is for inlining

Beyond that, there are parameters you can set to push the decision one way or the other (MSVC has __forceinline, which hints strongly at inlining; gcc has the -finline-limit flag to "raise" the threshold on the instruction count; etc.)
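As a rough illustration of such knobs at the source level, the sketch below wraps the compiler-specific spellings in two macros. The macro names FORCE_INLINE and NO_INLINE and the two example functions are hypothetical; the underlying keywords and attributes (`__forceinline` and `__declspec(noinline)` on MSVC, `always_inline` and `noinline` on gcc and clang) are the real ones. Even the "force" variants remain requests that the compiler may refuse in some situations.

```cpp
// Hypothetical portability macros; only the names are invented here.
#if defined(_MSC_VER)
#  define FORCE_INLINE __forceinline
#  define NO_INLINE    __declspec(noinline)
#elif defined(__GNUC__) || defined(__clang__)
#  define FORCE_INLINE inline __attribute__((always_inline))
#  define NO_INLINE    __attribute__((noinline))
#else
#  define FORCE_INLINE inline
#  define NO_INLINE
#endif

// Tiny helper: we ask for it to be inlined at every call site.
FORCE_INLINE int square(int x) { return x * x; }

// Stand-in for a larger function: we ask for it to stay out of line.
NO_INLINE int sum_of_squares(int n) {
    int total = 0;
    for (int i = 1; i <= n; ++i) {
        total += square(i);  // expected to be expanded in place
    }
    return total;
}
```

For example, sum_of_squares(3) computes 1 + 4 + 9 = 14 regardless of what the compiler decides; the attributes only influence code generation, never the result.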

On a tangent: do you know about partial inlining?

It was introduced in gcc 4.6. The idea, as the name suggests, is to inline only part of a function. Mostly, this avoids the overhead of a function call when the function is "guarded" and may (in some cases) return almost immediately.

For example:

void foo(Bar* x) {
  if (not x) { return; } // null pointer: nothing to do

  // ... BIG BLOCK OF STATEMENTS ...
}

void bar(Bar* x) {
  // DO 1
  foo(x);
  // DO 2
}

could get "optimized" as:

// "foo@0" is not a valid identifier; it only labels the split-off body the compiler generates
void foo@0(Bar* x) {
  // ... BIG BLOCK OF STATEMENTS ...
}

void bar(Bar* x) {
  // DO 1
  if (x) { foo@0(x); }
  // DO 2
}
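The same split can be written by hand, which may make the transformation easier to picture. The sketch below is an assumption-laden illustration, not what any compiler literally emits: the Bar definition, the heavy_calls counter, and the name foo_body are invented so that the effect can be observed.

```cpp
// Hypothetical type and counter, present only to make the sketch observable.
struct Bar { int value; };

static int heavy_calls = 0;  // how many times the out-of-line body ran

// The "big block of statements", explicitly kept out of line where the
// compiler offers a way to ask for that.
#if defined(_MSC_VER)
__declspec(noinline)
#elif defined(__GNUC__) || defined(__clang__)
__attribute__((noinline))
#endif
static void foo_body(Bar* x) {
    ++heavy_calls;
    x->value *= 2;  // stand-in for the expensive work
}

// The cheap guard is small enough to be inlined into every caller,
// so a null pointer never pays for a function call.
static inline void foo(Bar* x) {
    if (!x) { return; }
    foo_body(x);
}

void bar(Bar* x) {
    // DO 1
    foo(x);
    // DO 2
}
```

With this layout, bar(nullptr) only executes the pointer test, while bar(&b) goes on to call foo_body once.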

Of course, the usual inlining heuristics still apply, but they can now be applied more selectively: to the cheap guard and to the expensive body separately.

And finally, unless you use WPO (Whole Program Optimization) or LTO (Link Time Optimization), a function can only be inlined if its definition is in the same TU (Translation Unit) as the call site.
