How do I force gcc to inline a function?

Posted 2024-12-19 07:40:03

Does __attribute__((always_inline)) force a function to be inlined by gcc?

Comments (8)

北城半夏 2024-12-26 07:40:03

Yes.

always_inline

Generally, functions are not inlined unless optimization is specified. For functions declared inline, this attribute inlines the function even if no optimization level was specified.
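
To make the quoted behaviour concrete, here is a minimal sketch, assuming GCC (or a compatible compiler such as Clang). The names square and sum_of_squares are hypothetical and only serve as an illustration. The attribute is combined with the inline keyword, as the documentation excerpt describes.

```c
/* Hypothetical helper: the name `square` is illustrative only.
   `inline` plus the attribute asks GCC to expand calls in place
   even when no optimization level is given. */
static inline __attribute__((always_inline)) int square(int x)
{
    return x * x;
}

/* The two calls below are candidates for forced inlining.
   To check what the compiler actually did, inspect the assembly,
   e.g. with `gcc -O0 -S`, and look for a call to `square`. */
int sum_of_squares(int a, int b)
{
    return square(a) + square(b);
}
```

Comparing the generated assembly with and without the attribute is the reliable way to confirm the effect on a given compiler version.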

孤千羽 2024-12-26 07:40:03

It should. I'm a big fan of manual inlining. Sure, used in excess it's a bad thing. But often times when optimizing code, there will be one or two functions that simply have to be inlined or performance goes down the toilet. And frankly, in my experience C compilers typically do not inline those functions when using the inline keyword.

I'm perfectly willing to let the compiler inline most of my code for me. It's only those half dozen or so absolutely vital cases that I really care about. People say "compilers do a good job at this." I'd like to see proof of that, please. So far, I've never seen a C compiler inline a vital piece of code I told it to without some sort of forced-inline syntax (__forceinline on MSVC, __attribute__((always_inline)) on gcc).

昇り龍 2024-12-26 07:40:03

Yes, it will. That doesn't necessarily mean it's a good idea.

软的没边 2024-12-26 07:40:03

According to the GCC "Optimize Options" documentation, you can tune inlining with parameters:

-finline-limit=n
By default, GCC limits the size of functions that can be inlined. This flag 
allows coarse control of this limit. n is the size of functions that can be 
inlined in number of pseudo instructions.

Inlining is actually controlled by a number of parameters, which may be specified
individually by using --param name=value. The -finline-limit=n option sets some 
of these parameters as follows:

    max-inline-insns-single is set to n/2. 
    max-inline-insns-auto is set to n/2.

I suggest reading about all of the inlining parameters in more detail and setting them appropriately.

贪了杯 2024-12-26 07:40:03

I want to add here that I have a SIMD math library where inlining is absolutely critical for performance. Initially I set all functions to inline but the disassembly showed that even for the most trivial operators it would decide to actually call the function. Both MSVC and Clang showed this, with all optimization flags on.

I did as suggested in other posts on SO and added __forceinline for MSVC and __attribute__((always_inline)) for all other compilers. There was a consistent 25-35% improvement in performance in various tight loops, with operations ranging from basic multiplies to sines.

I didn't figure out why they had such a hard time inlining (perhaps templated code is harder?) but the bottom line is: there are very valid use cases for inlining manually and huge speedups to be gained.

If you're curious this is where I implemented it. https://github.com/redorav/hlslpp
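
The per-compiler switch described above is usually wrapped in a macro. Below is a minimal sketch of that pattern in C. It is not taken from the linked library: the macro name FORCE_INLINE, the vec4 type and the two functions are all hypothetical names chosen for illustration.

```c
/* Hypothetical macro: picks the forced-inline spelling per compiler. */
#if defined(_MSC_VER)
#  define FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#  define FORCE_INLINE inline __attribute__((always_inline))
#else
#  define FORCE_INLINE inline   /* fallback: an ordinary inline hint */
#endif

/* Hypothetical 4-component vector, standing in for a SIMD type. */
typedef struct { float x, y, z, w; } vec4;

/* Component-wise multiply; small enough that a call would dominate its cost. */
static FORCE_INLINE vec4 vec4_mul(vec4 a, vec4 b)
{
    vec4 r = { a.x * b.x, a.y * b.y, a.z * b.z, a.w * b.w };
    return r;
}

/* Dot product built on top of the forced-inline multiply. */
static FORCE_INLINE float vec4_dot(vec4 a, vec4 b)
{
    vec4 m = vec4_mul(a, b);
    return m.x + m.y + m.z + m.w;
}
```

Keeping the compiler-specific spelling in one macro means the function definitions themselves stay portable, and the fallback branch degrades to a plain inline hint on compilers that support neither syntax.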

删除→记忆 2024-12-26 07:40:03

Yes. It will inline the function regardless of any other options set. See here.

时光匆匆的小流年 2024-12-26 07:40:03

One can also use __always_inline. I have been using that for C++ member functions with GCC 4.8.1, but I could not find a good explanation of it in the GCC docs.

泼猴你往哪里跑 2024-12-26 07:40:03

Actually the answer is "no". All it means is that the function is a candidate for inlining even with optimizations disabled.
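
One case where the attribute cannot take effect is a call through a function pointer. As I understand the GCC documentation, an indirect call to an always_inline function may or may not be inlined, and a failure there may or may not be diagnosed. The sketch below, with hypothetical names twice, call_direct and call_indirect, shows both kinds of call. It assumes GCC or Clang.

```c
/* Hypothetical function marked for forced inlining. */
static inline __attribute__((always_inline)) int twice(int x)
{
    return 2 * x;
}

/* Direct call: the compiler sees the callee, so it can expand it in place. */
int call_direct(int x)
{
    return twice(x);
}

/* Indirect call: the target is read from a volatile pointer at run time,
   so there is nothing to expand at the call site. Taking the address
   makes the compiler keep an out-of-line copy of `twice`. */
int call_indirect(int x)
{
    int (*volatile fp)(int) = twice;
    return fp(x);
}
```

Both functions return the same result. The difference only shows up in the generated assembly, where call_indirect is expected to contain a real call instruction.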
