Speeding up virtual function calls in gcc
Profiling my C++ code with gprof, I discovered that a significant portion of my time is spent calling one virtual method over and over. The method itself is short and could probably be inlined if it weren't virtual.
What are some ways I could speed this up short of rewriting it all to not be virtual?
Comments (7)
Are you sure the time is all call-related? Could the cost be in the function itself? If so, simply inlining things might make the function vanish from your profiler, but you won't see much speed-up.
Assuming it really is the overhead of making so many virtual calls, there's a limit to what you can do without making things non-virtual.
If the call has early-outs for things like time/flags then I'll often use a two-level approach. The checking is inlined with a non-virtual call, with the class-specific behavior only called if necessary.
E.g.
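A minimal sketch of the two-level approach, under assumptions: the `Entity` and `Blinker` classes, the `active_` flag and the integer tick counter are all hypothetical names chosen for illustration. The cheap checks sit in a non-virtual wrapper that the compiler can inline, and the virtual `do_update` is reached only when there is real work to do.

```cpp
// Hypothetical base class: the early-out checks live in a non-virtual
// inline wrapper, so most calls never go through the vtable.
class Entity {
public:
    virtual ~Entity() = default;

    // Non-virtual and defined in-class, so the compiler can inline it.
    void update(int tick) {
        if (!active_ || tick < next_tick_) {
            return;  // early out: no virtual call is made
        }
        do_update(tick);  // class-specific work, dispatched virtually
    }

    void set_active(bool active) { active_ = active; }

protected:
    int next_tick_ = 0;  // earliest tick at which do_update should run

private:
    virtual void do_update(int tick) = 0;
    bool active_ = true;
};

class Blinker : public Entity {
public:
    int updates = 0;  // counts how often the virtual part actually ran

private:
    void do_update(int tick) override {
        ++updates;
        next_tick_ = tick + 10;  // ask to be skipped for the next 10 ticks
    }
};

// Calls update() once per tick for `ticks` ticks and returns how many
// of those calls reached the virtual do_update().
int count_virtual_updates(int ticks) {
    Blinker b;
    for (int tick = 0; tick < ticks; ++tick) {
        b.update(tick);
    }
    return b.updates;
}
```

With these assumed numbers, only one call in ten reaches the virtual function. Whether that pays off depends on how often the early-out actually triggers in the real code.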
If the virtual call really is the bottleneck, give CRTP (the Curiously Recurring Template Pattern) a try.
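As a rough illustration of CRTP, here is a sketch using hypothetical `Shape`, `Square` and `Rect` types (none of them come from the question). The base class template casts `this` to the derived type, so the call is resolved at compile time and can be inlined like any ordinary member function.

```cpp
// Base template: calls into Derived are resolved at compile time,
// so no vtable lookup is involved.
template <typename Derived>
class Shape {
public:
    int area() const {
        return static_cast<const Derived*>(this)->area_impl();
    }
};

class Square : public Shape<Square> {
public:
    explicit Square(int side) : side_(side) {}
    int area_impl() const { return side_ * side_; }

private:
    int side_;
};

class Rect : public Shape<Rect> {
public:
    Rect(int width, int height) : width_(width), height_(height) {}
    int area_impl() const { return width_ * height_; }

private:
    int width_;
    int height_;
};

// Generic code takes the base template and is instantiated per type.
template <typename Derived>
int doubled_area(const Shape<Derived>& shape) {
    return 2 * shape.area();
}
```

The trade-off is that `Shape<Square>` and `Shape<Rect>` are unrelated types, so they cannot be stored together in one container of base pointers the way virtual classes can.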
Is the time being spent in the actual function call, or in the function itself?
A virtual function call is noticeably slower than a non-virtual call, because the virtual call requires an extra dereference. (Google for 'vtable' if you want to read all the hairy details.) Update: It turns out the Wikipedia article isn't bad on this.
"Noticeably" here, though, means a couple of instructions. If the call is consuming a significant part of the total computation, including time spent in the called function, that sounds like a marvelous place to consider unvirtualizing and inlining.
But in something close to 20 years of C++, I don't think I've ever seen that really happen. I'd love to see the code.
Please be aware that "virtual" and "inline" are not opposites -- a method can be both. The compiler will happily inline a virtual function if it can determine the type of the object at compile time:
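A small sketch of both situations, assuming hypothetical types `B` and `D` and helper function names invented for illustration. Whether a given call is actually inlined depends on the compiler and optimization level.

```cpp
struct B {
    virtual ~B() = default;
    virtual int f() { return 42; }
};

struct D : B {
    int f() override { return 43; }
};

int call_on_known_types() {
    B b;
    D d;
    // The exact types of b and d are known here, so the compiler may
    // call B::f and D::f directly and inline them.
    return b.f() + d.f();
}

int call_through_reference(bool pick_derived) {
    B b;
    D d;
    // rb's dynamic type depends on a run-time value, so in general this
    // call has to go through virtual dispatch.
    B& rb = pick_derived ? static_cast<B&>(d) : b;
    return rb.f();
}
```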
[UPDATE: Made certain rb's true dynamic object type cannot be known at compile time -- thanks to MSalters]
If the type of the object can be determined at compile time but the function is not inlineable (e.g. it is large or is defined outside of the class definition), it will be called non-virtually.
It's sometimes instructive to consider how you'd write the code in good old 'C' if you didn't have C++'s syntactic sugar available. Sometimes the answer isn't using an indirect call. See this answer for an example.
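One C-style alternative to an indirect call is a type tag plus a `switch`. The sketch below is only an illustration of that idea, with a hypothetical `Shape` struct and a deliberately crude integer formula for the circle. It is not the example from the linked answer.

```cpp
// C-style "tagged struct": the kind field replaces the vtable pointer.
enum ShapeKind { kCircle, kSquare };

struct Shape {
    ShapeKind kind;
    int size;  // radius for circles, side length for squares
};

// A direct call with a switch; small enough for the compiler to inline.
inline int area(const Shape& s) {
    switch (s.kind) {
        case kCircle: return 3 * s.size * s.size;  // crude integer pi
        case kSquare: return s.size * s.size;
    }
    return 0;  // unreachable for valid kinds
}

int total_area(const Shape* shapes, int count) {
    int total = 0;
    for (int i = 0; i < count; ++i) {
        total += area(shapes[i]);  // no indirect call in the hot loop
    }
    return total;
}
```

The cost is that every new kind means touching the `switch`, which is exactly the maintenance burden virtual functions were meant to remove.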
You might be able to get slightly better performance from the virtual call by changing the calling convention. The old Borland compiler had a __fastcall convention which passed arguments in CPU registers instead of on the stack.
If you're stuck with the virtual call and those few operations really count, then check your compiler documentation for supported calling conventions.
Here is one possible way to do it using RTTI.
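What follows is a guess at the general shape of such an approach rather than the answer's own code, with hypothetical `Base`, `Common` and `Rare` classes. The idea is to test for the most frequent concrete type with `typeid` and, on a match, make a qualified call that bypasses virtual dispatch and can be inlined.

```cpp
#include <typeinfo>

struct Base {
    virtual ~Base() = default;
    virtual int value() const { return 1; }
};

// Assumed to be the type seen on most calls.
struct Common final : Base {
    int value() const override { return 2; }
};

struct Rare : Base {
    int value() const override { return 3; }
};

int fast_value(const Base& obj) {
    if (typeid(obj) == typeid(Common)) {
        // Exact type confirmed: the qualified call is non-virtual
        // and is a candidate for inlining.
        return static_cast<const Common&>(obj).Common::value();
    }
    return obj.value();  // everything else uses normal virtual dispatch
}
```

The `typeid` comparison has a cost of its own, so this only helps if the inlined body saves more than the check costs. It is worth measuring before and after.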