How can I force gcc to inline a function?
Does __attribute__((always_inline)) force a function to be inlined by gcc?
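For reference, a minimal sketch of how the attribute is usually spelled in C. The function names (square_sum, hypot_squared) are hypothetical and exist only for illustration. GCC's documentation describes the attribute in terms of functions declared inline, so the sketch pairs it with static inline.

```c
/* Hypothetical helper, for illustration only. */
static inline __attribute__((always_inline))
int square_sum(int a, int b)
{
    return a * a + b * b;
}

/* A direct call like this one is where the body gets substituted. */
int hypot_squared(int x, int y)
{
    return square_sum(x, y);
}
```

Compiling with gcc -S and looking for a call to square_sum in the output is one way to check what the compiler actually did.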
Comments (8)
Yes.
It should. I'm a big fan of manual inlining. Sure, used in excess it's a bad thing. But often when optimizing code, there will be one or two functions that simply have to be inlined or performance goes down the toilet. And frankly, in my experience C compilers typically do not inline those functions when you rely on the inline keyword alone.
I'm perfectly willing to let the compiler inline most of my code for me. It's only those half dozen or so absolutely vital cases that I really care about. People say "compilers do a good job at this." I'd like to see proof of that, please. So far, I've never seen a C compiler inline a vital piece of code I told it to without using some sort of forced inline syntax (__forceinline on msvc, __attribute__((always_inline)) on gcc).
Yes, it will. That doesn't necessarily mean it's a good idea.
According to the gcc "Optimize Options" documentation, you can tune inlining with parameters.
I suggest reading about all the inlining parameters in detail and setting them appropriately.
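As a rough sketch of what such tuning can look like: the options named in the comment below (-Winline, -finline-limit, --param max-inline-insns-single, --param inline-unit-growth) are real GCC options, but they are not necessarily the ones this answer originally listed. The numeric values are arbitrary placeholders, not recommendations, and defaults differ between GCC versions. The function names are hypothetical.

```c
/*
 * Example invocations (values are placeholders, not recommendations):
 *
 *   gcc -O2 -Winline file.c                              warn when an inline
 *                                                        function is not inlined
 *   gcc -O2 -finline-limit=1000 file.c
 *   gcc -O2 --param max-inline-insns-single=500 file.c
 *   gcc -O2 --param inline-unit-growth=60 file.c
 */

/* Plain inline: only a hint, GCC's heuristics make the final decision. */
static inline int clamp_int(int v, int lo, int hi)
{
    if (v < lo) return lo;
    if (v > hi) return hi;
    return v;
}

int clamp_percent(int v)
{
    return clamp_int(v, 0, 100);
}
```

Unlike always_inline, these knobs shift the heuristics for every function in the translation unit, so it is worth measuring code size and speed after changing them.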
I want to add here that I have a SIMD math library where inlining is absolutely critical for performance. Initially I marked all functions inline, but the disassembly showed that even for the most trivial operators the compiler would decide to actually call the function. Both MSVC and Clang showed this, with all optimization flags on.
I did as suggested in other posts on SO and added __forceinline for MSVC and __attribute__((always_inline)) for all other compilers. There was a consistent 25-35% improvement in performance in various tight loops, with operations ranging from basic multiplies to sines.
I didn't figure out why they had such a hard time inlining (perhaps templated code is harder?), but the bottom line is: there are very valid use cases for inlining manually and huge speedups to be gained.
If you're curious, this is where I implemented it: https://github.com/redorav/hlslpp
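The per-compiler switch described above is often wrapped in a macro. The following is a minimal sketch in C, not code taken from hlslpp: the macro name FORCE_INLINE, the vec4 type, and both functions are hypothetical, and a scalar struct stands in for real SIMD registers.

```c
/* Pick the forced-inline spelling for the compiler in use. */
#if defined(_MSC_VER)
#  define FORCE_INLINE static __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#  define FORCE_INLINE static inline __attribute__((always_inline))
#else
#  define FORCE_INLINE static inline  /* fallback: plain hint only */
#endif

/* Scalar stand-in for a 4-wide SIMD vector. */
typedef struct { float x, y, z, w; } vec4;

/* Trivial operator of the kind the answer wants inlined into loops. */
FORCE_INLINE vec4 vec4_mul(vec4 a, vec4 b)
{
    vec4 r = { a.x * b.x, a.y * b.y, a.z * b.z, a.w * b.w };
    return r;
}

float dot4(vec4 a, vec4 b)
{
    vec4 p = vec4_mul(a, b);
    return p.x + p.y + p.z + p.w;
}
```

Whether this yields a speedup like the one reported above depends on the code and compiler, so checking the disassembly before and after, as the answer did, remains the reliable test.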
Yes. It will inline the function regardless of any other options set. See here.
One can also use __always_inline. I have been using that for C++ member functions with GCC 4.8.1, but could not find a good explanation in the GCC documentation.
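One possible explanation, offered as an assumption rather than something the answer confirms: __always_inline is not a GCC keyword but a macro that system headers commonly supply (glibc's <sys/cdefs.h> and the Linux kernel headers each define one), which would explain its absence from the GCC manual. Below is a sketch in C that falls back to its own definition when no header has provided the macro. The function names are hypothetical.

```c
/* Use the header-provided macro when present; otherwise define an
 * equivalent. This fallback is an assumption modeled on the GCC
 * attribute, not copied from any particular header. Identifiers that
 * start with a double underscore are reserved, so portable code would
 * normally choose a different macro name. */
#ifndef __always_inline
#  define __always_inline inline __attribute__((always_inline))
#endif

static __always_inline int twice(int v)
{
    return v + v;
}

int quadruple(int v)
{
    return twice(twice(v));
}
```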
Actually the answer is "no". All it means is that the function is a candidate for inlining even with optimizations disabled.
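To make the disagreement between the answers concrete, here is a sketch of the two situations GCC's documentation treats differently: a direct call and a call through a function pointer. As far as I know, the documentation says a failure to inline an always_inline function is diagnosed as an error, but that for indirect calls the compiler may or may not inline, depending on the optimization level. The function names are hypothetical.

```c
static inline __attribute__((always_inline))
int add_one(int v)
{
    return v + 1;
}

/* Direct call: per the GCC documentation this is inlined even without
 * optimization, and a failure to do so is reported as an error. */
int direct(int v)
{
    return add_one(v);
}

/* Indirect call: taking the address requires an out-of-line copy, and
 * whether the call through the pointer gets inlined depends on the
 * optimization level. */
int indirect(int v)
{
    int (*fn)(int) = add_one;
    return fn(v);
}
```

Both functions return the same value either way. The difference shows up only in the generated code, which can be inspected with gcc -S at different optimization levels.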