Virtual functions and performance - C++
In my class design, I use abstract classes and virtual functions extensively. I have a feeling that virtual functions affect performance. Is this true? But I think the performance difference is not noticeable, and it looks like I am doing premature optimization. Right?
The only way I can see a virtual function becoming a performance problem is if many virtual functions are called within a tight loop, and only if they cause a page fault or some other "heavy" memory operation.
That said, as other people have pointed out, it's pretty much never going to be a problem for you in real life. If you think it is, run a profiler, do some tests, and verify that this really is a problem before trying to "undesign" your code for a performance benefit.
When a class method is not virtual, the compiler can usually inline it. In contrast, when you call a virtual function through a pointer to a class, the real address of the function is known only at run time.
This is well illustrated by a test, with a time difference of ~700% (!):
The impact of a virtual function call depends heavily on the situation.
If there are few calls and each call does a significant amount of work inside the function, the overhead can be negligible.
But when a virtual call is made many times over and each call does only some simple operation, the overhead can be really big.
I've gone back and forth on this at least 20 times on my particular project. Although there can be some great gains in terms of code reuse, clarity, maintainability, and readability, performance hits do still exist with virtual functions.
Is the performance hit going to be noticeable on a modern laptop/desktop/tablet? Probably not! However, in certain cases with embedded systems, the performance hit may be the driving factor in your code's inefficiency, especially if the virtual function is called over and over again in a loop.
Here's a somewhat dated paper that analyzes best practices for C/C++ in the embedded systems context: http://www.open-std.org/jtc1/sc22/wg21/docs/ESC_Boston_01_304_paper.pdf
To conclude: it's up to the programmer to understand the pros and cons of using a certain construct over another. Unless you're super performance driven, you probably don't care about the performance hit and should use all the neat OO stuff in C++ to help make your code as usable as possible.
In my experience, the main relevant thing is the ability to inline a function. If you have performance/optimization needs that dictate a function must be inlined, then you can't make the function virtual, because that would generally prevent inlining. Otherwise, you probably won't notice the difference.
One thing to note is that this:
may be faster than this:
This is because the first method only ever calls one function, while the second may be calling many different functions. This applies to any virtual function in any language.
I say "may" because this depends on the compiler, the cache, etc.
The performance penalty of using virtual functions can never outweigh the advantages you get at the design level. Supposedly a call to a virtual function is 25% less efficient than a direct call to a static function. This is because there is a level of indirection through the VMT (virtual method table). However, the time taken to make the call is normally very small compared to the time taken to actually execute your function, so the total performance cost will be negligible, especially with the performance of current hardware.
Furthermore, the compiler can sometimes optimise: if it sees that no virtual call is needed, it compiles the call into a static one. So don't worry; use virtual functions and abstract classes as much as you need.
I have always asked myself this, especially since, quite a few years ago, I did a similar test comparing the timings of a standard member method call with a virtual one. I was really angry about the results at the time: empty virtual calls were 8 times slower than non-virtual ones.
Today I had to decide whether or not to use a virtual function for allocating more memory in my buffer class, in a very performance-critical app, so I googled (and found you), and in the end did the test again.
And I was really surprised that, in fact, it does not matter at all anymore.
While it makes sense for inlined calls to be faster than non-virtual calls, and for those to be faster than virtual calls, the result often comes down to the overall load of the computer and whether your cache has the necessary data or not. And whilst you might be able to optimize at cache level, I think this should be done by the compiler developers more than by application developers.
Your question made me curious, so I went ahead and ran some timings on the 3GHz in-order PowerPC CPU we work with. The test I ran was to make a simple 4d vector class with get/set functions
Then I set up three arrays each containing 1024 of these vectors (small enough to fit in L1) and ran a loop that added them to one another (A.x = B.x + C.x) 1000 times. I ran this with the functions defined as inline, virtual, and regular function calls. Here are the results:
So, in this case (where everything fits in cache) the virtual function calls were about 20x slower than the inline calls. But what does this really mean? Each trip through the loop caused exactly 3 * 4 * 1024 = 12,288 function calls (1024 vectors times four components times three calls per add), so these times represent 1000 * 12,288 = 12,288,000 function calls. The virtual loop took 92ms longer than the direct loop, so the additional overhead was roughly 7 nanoseconds per call.
From this I conclude: yes, virtual functions are much slower than direct functions, and no, unless you're planning on calling them ten million times per second, it doesn't matter.
See also: comparison of the generated assembly.
A good rule of thumb is:
The use of virtual functions will have a very slight effect on performance, but it's unlikely to affect the overall performance of your application. Better places to look for performance improvements are in algorithms and I/O.
An excellent article that talks about virtual functions (and more) is Member Function Pointers and the Fastest Possible C++ Delegates.
When Objective-C (where all methods are virtual) is the primary language for the iPhone and freakin' Java is the main language for Android, I think it's pretty safe to use C++ virtual functions on our 3 GHz dual-core towers.
In very performance-critical applications (like video games) a virtual function call can be too slow. With modern hardware, the biggest performance concern is the cache miss. If data isn't in the cache, it may be hundreds of cycles before it's available.
A normal function call can generate an instruction cache miss when the CPU fetches the first instruction of the new function and it's not in the cache.
A virtual function call first needs to load the vtable pointer from the object. This can result in a data cache miss. Then it loads the function pointer from the vtable, which can result in another data cache miss. Then it calls the function, which can result in an instruction cache miss just as with a non-virtual function.
In many cases, two extra cache misses are not a concern, but in a tight loop in performance-critical code they can dramatically reduce performance.
From page 44 of Agner Fog's "Optimizing Software in C++" manual:
Absolutely. It was a problem way back when computers ran at 100MHz, as every method call required a lookup in the vtable before it was called. But today, on a 3GHz CPU that has a first-level cache with more memory than my first computer had? Not at all. Allocating memory from main RAM will cost you more time than making all your functions virtual.
It's like the old, old days when people said structured programming was slow because all the code was split into functions, and each function required stack allocations and a function call!
The only time I would even think of bothering to consider the performance impact of a virtual function is if it was very heavily used and instantiated in templated code that ended up throughout everything. Even then, I wouldn't spend too much effort on it!
PS: think of other 'easy to use' languages. All their methods are virtual under the covers, and they don't crawl nowadays.
There's another performance criterion besides execution time. A vtable takes up memory space as well, and in some cases it can be avoided: ATL uses compile-time "simulated dynamic binding" with templates to get the effect of "static polymorphism", which is sort of hard to explain. You basically pass the derived class as a parameter to a base class template, so at compile time the base class "knows" what its derived class is in each instance. This won't let you store multiple different derived classes in a collection of base types (that's run-time polymorphism). But in a static sense, if you want to make a class Y that is the same as a preexisting template class X which has the hooks for this kind of overriding, you just need to override the methods you care about, and then you get the base methods of class X without having to have a vtable.
In classes with large memory footprints, the cost of a single vtable pointer is not much, but some of the ATL classes in COM are very small, and it's worth the vtable savings if the run-time polymorphism case is never going to occur.
See also this other SO question.
By the way, here's a posting I found that talks about the CPU-time performance aspects.
Yes, you're right, and if you're curious about the cost of a virtual function call, you might find this post interesting.