C++: Fighting polymorphism overhead

Posted on 2024-10-01 01:54:39, 3750 characters, 5 views, 0 comments

I know that polymorphism can add noticeable overhead: calling a virtual function is slower than calling a non-virtual one. (All my experience is with GCC, but I think/heard that this is true for any real compiler.)

Often a given virtual function gets called on the same object over and over. I know that the object's type doesn't change, and most of the time the compiler could easily deduce that as well:

BaseType &obj = ...;
while( looping )
    obj.f(); // BaseType::f is virtual

To speed up the code I could rewrite the above code like this:

BaseType &obj = ...;
FinalType &fo = dynamic_cast< FinalType& >( obj );
while( looping )
    fo.f(); // FinalType::f is not virtual
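
A note on the second snippet: because BaseType::f is virtual, FinalType::f is virtual too, so fo.f() may still be dispatched dynamically unless the compiler can prove the exact type. One way to force a direct call is to qualify the function name, as in this minimal sketch. BaseType and FinalType are hypothetical stand-ins with an int-returning f(), and sum_calls is a helper name made up for illustration.

```cpp
#include <typeinfo> // std::bad_cast, thrown when the reference cast fails

// Hypothetical stand-ins for the types in the snippets above.
struct BaseType {
    virtual ~BaseType() = default;
    virtual int f() { return 1; }
};

struct FinalType : BaseType {
    int f() override { return 2; } // still virtual: it overrides BaseType::f
};

// Sums the results of calling f() `count` times.
// Assumes obj really is a FinalType; otherwise the cast throws std::bad_cast.
int sum_calls(BaseType &obj, int count) {
    FinalType &fo = dynamic_cast<FinalType &>(obj); // one checked cast, outside the loop
    int sum = 0;
    for (int i = 0; i < count; ++i)
        sum += fo.FinalType::f(); // qualified name: virtual dispatch is suppressed
    return sum;
}
```

For example, `FinalType obj; sum_calls(obj, 3);` returns 6. The qualified call also bypasses any override in a class derived from FinalType, so it only makes sense when FinalType really is the most-derived type.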

I wonder what's the best way to avoid this overhead due to polymorphism in these cases.

The idea of down-casting (as shown in the second snippet) doesn't look that good to me: BaseType could be inherited by many classes, and trying to down-cast to each of them would be pretty verbose.

Another idea could be to store obj.f in a function pointer (I didn't test this, and I'm not sure it would eliminate the run-time overhead), but this method doesn't look perfect either: like the method above, it would require writing more code, and it couldn't exploit some optimizations (e.g. if FinalType::f were an inline function, it wouldn't get inlined; I guess the only way to avoid this would be to down-cast obj to its final type).
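
On the function-pointer idea: in standard C++ the closest thing to "storing obj.f" is a pointer to member function, and a pointer to a virtual member function still selects the override at call time, so it would not be expected to remove the dispatch. A minimal sketch, assuming hypothetical Base and Final types with an int-returning f() and an invented helper sum_via_member_pointer:

```cpp
// Hypothetical types for illustration.
struct Base {
    virtual ~Base() = default;
    virtual int f() { return 1; }
};

struct Final : Base {
    int f() override { return 2; }
};

// Calls f() through a pointer to member function and sums the results.
int sum_via_member_pointer(Base &obj, int count) {
    int (Base::*pf)() = &Base::f; // refers to the virtual function, not to one fixed body
    int sum = 0;
    for (int i = 0; i < count; ++i)
        sum += (obj.*pf)(); // still dispatched on the dynamic type of obj
    return sum;
}
```

Passing a Final object makes every call land in Final::f even though the pointer was taken from Base::f, which shows that the lookup still happens on each call.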

So, is there any better method?

Edit:
Well, of course this is not going to have much impact. The question was mostly about whether there is something that can be done: since this overhead looks very easy to eliminate, I don't see why not to.

What I was hoping for is an easy keyword for small optimizations, like C99 restrict, to tell the compiler that a polymorphic object is of a fixed type.

Anyway, just to answer the comments, some overhead is present. Look at this ad-hoc, extreme code:
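
The closest thing in the language is probably the C++11 `final` specifier: it says that a class cannot be derived from (or a virtual function cannot be overridden), which lets a compiler turn calls through that static type into direct calls. Whether a given compiler actually does so is an assumption to verify in the generated code. A minimal sketch with hypothetical types and invented helper names:

```cpp
// Hypothetical types for illustration.
struct Base {
    virtual ~Base() = default;
    virtual int f() { return 1; }
};

// `final` promises that no class derives from Final.
struct Final final : Base {
    int f() override { return 2; }
};

// The static type is Final, which is final, so the compiler is allowed to
// call Final::f directly (and possibly inline it).
int sum_calls_final(Final &fo, int count) {
    int sum = 0;
    for (int i = 0; i < count; ++i)
        sum += fo.f();
    return sum;
}

// Through a Base reference the call stays virtual unless the optimizer can
// prove the dynamic type some other way.
int sum_calls_base(Base &obj, int count) {
    int sum = 0;
    for (int i = 0; i < count; ++i)
        sum += obj.f();
    return sum;
}
```

Both functions return the same result for a Final object; the only difference is how much the compiler knows about the type at the call site.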

struct Base { virtual void f(){} };
struct Final : public Base { void f(){} };

int main( ) {
    Final final;
    Final &f = final;
    Base &b = f;

    for( int i = 0; i < 1024*1024*1024; ++ i )
#ifdef BASE
        b.f( );
#else
        f.f( );
#endif

    return 0;
}

Compiling and running it, timing each run (user CPU seconds):

$ for OPT in {"",-O0,-O1,-O2,-O3,-Os}; do
    for DEF in {BASE,FINAL}; do
        g++ $OPT -D$DEF -o virt virt.cpp &&
        TIME="$DEF $OPT: %U" time ./virt;
    done;
  done
BASE : 5.19
FINAL : 4.21
BASE -O0: 5.22
FINAL -O0: 4.19
BASE -O1: 3.55
FINAL -O1: 1.53
BASE -O2: 3.61
FINAL -O2: 0.00
BASE -O3: 3.58
FINAL -O3: 0.00
BASE -Os: 6.14
FINAL -Os: 0.00

I guess only -O2, -O3 and -Os are inlining Final::f.

These tests were run on my machine, with the latest GCC and an AMD Athlon(tm) 64 X2 Dual Core Processor 4000+ CPU. I guess it could be a lot slower on a cheaper platform.

Comments (2)

花桑 2024-10-08 01:54:39

If dynamic dispatch is a performance bottleneck in your program, then the way to solve the problem is not to use dynamic dispatch (don't use virtual functions).

You can replace some run-time polymorphism with compile-time polymorphism by using templates and generic programming instead of virtual functions. This may or may not result in better performance; only a profiler can tell you for sure.
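
As a rough illustration of that idea (a sketch under assumptions, not a drop-in replacement): the types Fast and Slow and the helper sum_calls below are hypothetical, and neither type has any virtual functions. The concrete type is a template parameter, so each instantiation makes an ordinary call that the compiler can inline.

```cpp
// Hypothetical types with the same interface but no common base class.
struct Fast {
    int f() const { return 2; }
};

struct Slow {
    int f() const { return 1; }
};

// T is fixed at compile time, so obj.f() is resolved statically for each
// instantiation; no vtable lookup is involved.
template <typename T>
int sum_calls(const T &obj, int count) {
    int sum = 0;
    for (int i = 0; i < count; ++i)
        sum += obj.f();
    return sum;
}
```

The trade-off is that the type must be known where sum_calls is instantiated, so this does not help when the object only arrives as a reference to a base class at run time.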

To be clear though, as wilhelmtell has already pointed out in comments to the question, it's rare that the overhead caused by dynamic dispatch is significant enough to worry about. Be absolutely sure that it's your performance hot-spot before you go replacing built-in convenience with an unwieldy custom implementation.

怕倦 2024-10-08 01:54:39

If you need to use polymorphism, then use it. There is really no faster way to do it.

However, I would respond with another question: Is this your biggest problem? If so, your code is already optimal or nearly so. If not, find out what the biggest problem is, and concentrate on that instead.
