C++: Fighting polymorphism overhead

Posted on 2024-10-01 01:54:39

I know that polymorphism can add a noticeable overhead. Calling a virtual function is slower than calling a non-virtual one. (All my experience is with GCC, but I think/heard that this is true for any real compiler.)

Many times a given virtual function gets called on the same object over and over; I know that the object's type doesn't change, and most of the time the compiler could easily deduce that as well:

BaseType &obj = ...;
while( looping )
    obj.f(); // BaseType::f is virtual

To speed up the code I could rewrite the above code like this:

BaseType &obj = ...;
FinalType &fo = dynamic_cast< FinalType& >( obj );
while( looping )
    fo.f(); // intent: call FinalType::f directly, without virtual dispatch

I wonder what's the best way to avoid this overhead due to polymorphism in these cases.

The idea of downcasting (as shown in the second snippet) doesn't look that good to me: BaseType could be inherited by many classes, and trying to downcast to each of them would be pretty verbose.

Another idea could be to store obj.f in a function pointer (I didn't test this, and I'm not sure it would kill the run-time overhead), but again this method doesn't look perfect: like the method above, it would require writing more code, and it couldn't exploit some optimizations (e.g. if FinalType::f were an inline function, it wouldn't get inlined, though I guess the only way to avoid that would be to downcast obj to its final type...)

So, is there any better method?

Edit:
Well, of course this is not going to have much impact. This question was mostly to find out whether there is something that can be done: since this overhead looks very easy to eliminate, I don't see why not to.

What I was hoping for is an easy keyword for small optimizations, like C99 restrict, to tell the compiler that a polymorphic object is of a fixed type.

Anyway, just to answer the comments, a little overhead is present. Look at this ad-hoc, extreme code:

struct Base { virtual void f(){} };
struct Final : public Base { void f(){} };

int main( ) {
    Final final;
    Final &f = final;
    Base &b = f;

    for( int i = 0; i < 1024*1024*1024; ++ i )
#ifdef BASE
        b.f( );
#else
        f.f( );
#endif

    return 0;
}

Compiling and running it, with timings:

$ for OPT in {"",-O0,-O1,-O2,-O3,-Os}; do
    for DEF in {BASE,FINAL}; do
        g++ $OPT -D$DEF -o virt virt.cpp &&
        TIME="$DEF $OPT: %U" time ./virt;
    done;
  done
BASE : 5.19
FINAL : 4.21
BASE -O0: 5.22
FINAL -O0: 4.19
BASE -O1: 3.55
FINAL -O1: 1.53
BASE -O2: 3.61
FINAL -O2: 0.00
BASE -O3: 3.58
FINAL -O3: 0.00
BASE -Os: 6.14
FINAL -Os: 0.00

I guess only -O2, -O3 and -Os are inlining Final::f.

And these tests have been run on my machine, running the latest GCC and an AMD Athlon(tm) 64 X2 Dual Core Processor 4000+ CPU. I guess it could be a lot slower on a cheaper platform.