Virtual functions and performance in C++
Before you cringe at the duplicate title, the other question wasn't suited to what I ask here (IMO). So.
I really want to use virtual functions in my application to make things a hundred times easier (isn't that what OOP is all about ;)). But I read somewhere that they come at a performance cost. Seeing nothing but the same old contrived hype about premature optimization, I decided to give it a quick whirl in a small benchmark test using:
// Profiler implementation. CProfiler.h (not shown) declares the timeval
// members a and b and the public member result.
#include "CProfiler.h"
CProfiler::CProfiler(void (*func)(void), unsigned int iterations) {
    gettimeofday(&a, 0);
    for (; iterations > 0; iterations--) {
        func();  // call the function under test
    }
    gettimeofday(&b, 0);
    // Elapsed wall-clock time in microseconds.
    result = (b.tv_sec * (unsigned int)1e6 + b.tv_usec) - (a.tv_sec * (unsigned int)1e6 + a.tv_usec);
}

// Main program.
#include "CProfiler.h"
#include <iostream>
class CC {
protected:
    int width, height, area;
};
class VCC {
protected:
    int width, height, area;
public:
    virtual void set_area() {}
};
class CS : public CC {
public:
    void set_area() { area = width * height; }
};
class VCS : public VCC {
public:
    void set_area() { area = width * height; }
};
void profileNonVirtual() {
    CS *abc = new CS;
    abc->set_area();
    delete abc;
}
void profileVirtual() {
    VCS *abc = new VCS;
    abc->set_area();
    delete abc;
}
int main() {
    int iterations = 5000;
    CProfiler prf2(&profileNonVirtual, iterations);
    CProfiler prf(&profileVirtual, iterations);
    std::cout << prf.result;
    std::cout << "\n";
    std::cout << prf2.result;
    return 0;
}
At first I only did 100 and 10000 iterations, and the results were worrying: 4ms for non virtualised, and 250ms for the virtualised! I almost went "nooooooo" inside, but then I upped the iterations to around 500,000; to see the results become almost completely identical (maybe 5% slower without optimization flags enabled).
My question is: why was there such a significant change with a low number of iterations compared to a high number? Was it purely because the virtual functions are hot in cache at that many iterations?
I understand that my 'profiling' code is not perfect, but, as it is, it gives an estimate of things, which is all that matters at this level. Also, I am asking these questions to learn, not solely to optimize my application.
I think that this kind of testing is pretty useless, in fact:
1) you are wasting time on the profiling itself, by invoking the timing call (presumably gettimeofday);
2) you are not really testing virtual functions, and IMHO this is the worst thing.
Why? Because you use virtual functions to avoid writing things such as an "if... else" chain that checks the type of the object and calls the matching method.
Your benchmark has no such "if... else" block, so you never really get the advantage of virtual functions. This is a scenario where they always lose against non-virtual calls.
To do a proper profiling, I think you should add something like the code I've posted.
There could be several reasons for the difference in time.
1) The heap manager may influence the result, because sizeof(VCS) > sizeof(CS). What happens if you move the new out of the loop?
2) Again, due to size differences, the memory cache may indeed be part of the difference in time.
BUT: you should really compare similar functionality. When using virtual functions, you do so for a reason, which is calling a different member function dependent on the object's identity. If you need this functionality, and don't want to use virtual functions, you would have to implement it manually, be it using a function table or even a switch statement. This comes at a cost, too, and that's what you should compare against virtual functions.
When using too few iterations, there is a lot of noise in the measurement. The gettimeofday function is not going to be accurate enough to give you good measurements for only a handful of iterations, not to mention that it records total wall time (which includes time spent when preempted by other threads).
Bottom line, though, you shouldn't come up with some ridiculously convoluted design to avoid virtual functions. They really don't add much overhead. If you have incredibly performance critical code and you know that virtual functions make up most of the time, then perhaps it's something to worry about. In any practical application, though, virtual functions won't be what's making your application slow.
In my opinion, when the number of loops was small, there may have been no context switching. But when you increased the number of loops, there is a very strong chance that context switching took place and dominated the reading. For example, if the first program takes 1 sec and the second takes 3 secs, but context switching adds 10 secs to each, then the ratio is 13/11 instead of 3/1.
I believe that your test case is too artificial to be of any great value.
First, inside your profiled function you dynamically allocate and deallocate an object as well as call a function. If you want to profile just the function call, then you should do just that.
Second, you are not profiling a case where a virtual function call represents a viable alternative to a given problem. A virtual function call provides dynamic dispatch. You should try profiling a case such as where a virtual function call is used as an alternative to something using a switch-on-type anti-pattern.
Extending Charles' answer.
The problem here is that your loop is doing more than just testing the virtual call itself (the memory allocation probably dwarfs the virtual call overhead anyway), so his suggestion is to change the code so that only the virtual call is tested.
Here the benchmark function is a template, because a template may be inlined, while calls through function pointers are unlikely to be.
And the measuring:
Note: this tests the overhead of a virtual call in comparison to a regular function. The functionality is different (since you do not have runtime dispatch in the second case), but it's therefore a worst-case overhead.
Results of the run (gcc 3.4.2, -O2, SLES10 quad-core server). Note: the function definitions are in another translation unit, to prevent inlining.
Not really convincing.
With a small number of iterations there's a chance that your code is preempted by some other program running in parallel, or that swapping occurs, or that anything else the operating system normally isolates your program from happens. The time your program was suspended by the operating system is then included in your benchmark results. This is the number one reason why you should run your code something like a dozen million times to measure anything more or less reliably.
There is a performance impact to calling a virtual function, because it does slightly more than calling a regular function. However, the impact is likely to be completely negligible in a real-world application -- even less so than it appears in even the most finely crafted benchmarks.
In a real world application, the alternative to a virtual function is usually going to involve you hand-writing some similar system anyhow, because the behavior of calling a virtual function and calling a non-virtual function differs -- the former changes based on the runtime type of the invoking object. Your benchmark, even disregarding whatever flaws it has, doesn't measure equivalent behavior, only equivalent-ish syntax. If you were to institute a coding policy banning virtual functions you'd either have to write some potentially very roundabout or confusing code (which might be slower) or re-implement a similar kind of runtime dispatch system that the compiler is using to implement virtual function behavior (which is certainly going to be no faster than what the compiler does, in most cases).