Virtual functions and performance (C++)

Posted on 2024-10-15 07:06:53

Before you cringe at the duplicate title: the other question wasn't suited to what I am asking here (IMO). So.

I really want to use virtual functions in my application to make things a hundred times easier (isn't that what OOP is all about ;)). But I read somewhere that they come at a performance cost. Seeing nothing but the same old contrived hype about premature optimization, I decided to give it a quick whirl in a small benchmark using:

CProfiler.cpp

#include "CProfiler.h"

// Runs func() `iterations` times and stores the elapsed wall-clock time in microseconds.
CProfiler::CProfiler(void (*func)(void), unsigned int iterations) {
    gettimeofday(&a, 0);
    for (; iterations > 0; --iterations) {
        func();
    }
    gettimeofday(&b, 0);
    result = (b.tv_sec * (unsigned int)1e6 + b.tv_usec) - (a.tv_sec * (unsigned int)1e6 + a.tv_usec);
}

main.cpp

#include "CProfiler.h"

#include <iostream>

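class CC {
  protected:
    int width, height, area;
    // Initialise the members so set_area() never reads indeterminate values.
    CC() : width(0), height(0), area(0) {}
};

class VCC {
  protected:
    int width, height, area;
    VCC() : width(0), height(0), area(0) {}
  public:
    virtual void set_area () {}
};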

class CS: public CC {
  public:
    void set_area () { area = width * height; }
  };

class VCS: public VCC {
  public:
    void set_area () {  area = width * height; }
  };

void profileNonVirtual() {
    CS *abc = new CS;
    abc->set_area();
    delete abc;
}

void profileVirtual() {
    VCS *abc = new VCS;
    abc->set_area();
    delete abc;
}

int main() {
    int iterations = 5000;
    CProfiler prf2(&profileNonVirtual, iterations);
    CProfiler prf(&profileVirtual, iterations);

    std::cout << prf.result;
    std::cout << "\n";
    std::cout << prf2.result;

    return 0;
}

At first I only ran 100 and 10,000 iterations, and the results were worrying: 4 ms for the non-virtual version and 250 ms for the virtual one! I almost went "nooooooo" inside, but then I upped the iterations to around 500,000 and saw the results become almost completely identical (maybe 5% slower without optimization flags enabled).

My question is: why was there such a significant difference with a low number of iterations compared to a high number? Was it purely because the virtual functions are hot in cache at that many iterations?

Disclaimer
I understand that my 'profiling' code is not perfect, but it does give a rough estimate, which is all that matters at this level. Also, I am asking these questions to learn, not solely to optimize my application.


北方的韩爷 2024-10-22 07:06:54

I think this kind of test is pretty useless, in fact:
1) you are spending time on the profiling itself by calling gettimeofday();
2) you are not really testing virtual functions, and IMHO this is the worst part.

Why? Because you use virtual functions to avoid writing things such as:

<pseudocode>
switch typeof(object) {

case ClassA: functionA(object);

case ClassB: functionB(object);

case ClassC: functionC(object);
}
</pseudocode>

Your code has no such "if... else" block, so you don't really get the advantage of virtual functions. In that scenario they will always be the "loser" against non-virtual calls.

To profile this properly, I think you should add something like the code I've posted.
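To make the comparison concrete, here is a minimal sketch of the two things that would have to be timed against each other: a hand-written switch on a type tag, and the equivalent virtual call. All names (ShapeKind, TaggedShape, Shape, Rect, Square) are invented for illustration and do not come from the question.

```cpp
#include <cassert>

// Hypothetical type tag; the caller has to inspect it on every call.
enum ShapeKind { kRect, kSquare };

struct TaggedShape {
    ShapeKind kind;
    int a, b;
};

// Hand-written dispatch: pick the computation by switching on the tag.
int areaBySwitch(const TaggedShape& s) {
    switch (s.kind) {
        case kRect:   return s.a * s.b;
        case kSquare: return s.a * s.a;
    }
    return 0; // not reached for valid tags
}

// Virtual dispatch: the same selection, done through the virtual call.
struct Shape {
    virtual ~Shape() {}
    virtual int area() const = 0;
};

struct Rect : Shape {
    int w, h;
    Rect(int w_, int h_) : w(w_), h(h_) {}
    int area() const override { return w * h; }
};

struct Square : Shape {
    int side;
    explicit Square(int s) : side(s) {}
    int area() const override { return side * side; }
};

int areaByVirtual(const Shape& s) { return s.area(); }

// Both styles must agree before it makes sense to compare their cost.
void checkDispatchAgrees() {
    TaggedShape tagged = {kRect, 3, 7};
    Rect rect(3, 7);
    assert(areaBySwitch(tagged) == areaByVirtual(rect));
}
```

A fair benchmark would time areaBySwitch against areaByVirtual on the same mix of objects, rather than a virtual call against a plain member call.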


关于从前 2024-10-22 07:06:54

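There could be several reasons for the difference in time.

1) Your timing function isn't precise enough.
2) The heap manager may influence the result, because sizeof(VCS) > sizeof(CS). What happens if you move the new / delete out of the loop?
3) Again, because of the size difference, the memory cache may indeed account for part of the difference in time.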
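The size difference can be checked directly. The sketch below mirrors the data members of the question's classes under hypothetical names (Plain, WithVirtual). The extra bytes come from the hidden vtable pointer that mainstream compilers add to polymorphic objects; the exact amount is implementation-specific and is not fixed by the C++ standard.

```cpp
#include <cassert>
#include <cstddef>

// Same three ints as the question's non-virtual base class.
struct Plain {
    int width, height, area;
};

// Same three ints plus virtual functions, which makes the type polymorphic.
struct WithVirtual {
    int width, height, area;
    virtual void set_area() {}
    virtual ~WithVirtual() {}
};

// Extra bytes per object caused by making the class polymorphic
// (vtable pointer plus any alignment padding).
std::size_t vptrOverhead() {
    // Holds on mainstream ABIs; an assumption, not a language guarantee.
    assert(sizeof(WithVirtual) > sizeof(Plain));
    return sizeof(WithVirtual) - sizeof(Plain);
}
```

Because the two objects differ in size, new and delete may take different paths through the allocator, which is one more reason to keep allocation out of the timed loop.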

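But: you should really compare similar functionality. When you use virtual functions, you do so for a reason, namely to call a different member function depending on the object's identity. If you need this functionality and don't want to use virtual functions, you have to implement it manually, whether with a function table or a switch statement. That comes at a cost too, and that is what you should compare against virtual functions.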
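As a rough illustration of the function-table variant, the sketch below stores one function pointer per object kind and indexes the table with a tag. The names (Object, AreaFn, kAreaTable) are hypothetical, and this is only one of several ways to hand-roll such dispatch.

```cpp
#include <cassert>

// Hypothetical record; `kind` doubles as an index into the table below.
struct Object {
    int kind;   // 0 = rectangle, 1 = square
    int a, b;
};

typedef int (*AreaFn)(const Object&);

int rectArea(const Object& o)   { return o.a * o.b; }
int squareArea(const Object& o) { return o.a * o.a; }

// The hand-rolled "vtable": one entry per kind.
static const AreaFn kAreaTable[] = { rectArea, squareArea };
static const int kNumKinds = sizeof(kAreaTable) / sizeof(kAreaTable[0]);

// Manual dispatch: a bounds check, a table load, and an indirect call.
int area(const Object& o) {
    assert(o.kind >= 0 && o.kind < kNumKinds);
    return kAreaTable[o.kind](o);
}
```

The call in area() performs the same kind of indirect call a virtual function does, so it is unlikely to be cheaper, and the tag has to be maintained by hand.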


人事已非 2024-10-22 07:06:54

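With too few iterations there is a lot of noise in the measurement. The gettimeofday function is not accurate enough to give you good measurements for only a handful of iterations, not to mention that it records total wall-clock time (which includes time spent while preempted by other threads).

Bottom line, though: you shouldn't come up with some ridiculously convoluted design just to avoid virtual functions. They really don't add much overhead. If you have extremely performance-critical code and you know that virtual function calls take up most of its time, then perhaps it is something to worry about. In any practical application, though, virtual functions won't be what makes your app slow.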
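One common way to reduce that noise is to use a monotonic clock and keep the fastest of several runs. The helper below is a sketch under that assumption; bestOfMicros and its parameters are invented names. Note that std::chrono::steady_clock still measures wall-clock time, so taking the minimum only reduces the effect of preemption, it does not remove it.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <limits>

// Calls fn() `iterations` times per run, repeats the run `repeats` times,
// and returns the duration of the fastest run in microseconds.
template <typename Fn>
double bestOfMicros(Fn fn, std::size_t iterations, int repeats) {
    assert(repeats > 0);
    double best = std::numeric_limits<double>::max();
    for (int r = 0; r < repeats; ++r) {
        const auto start = std::chrono::steady_clock::now();
        for (std::size_t i = 0; i < iterations; ++i) {
            fn();
        }
        const auto stop = std::chrono::steady_clock::now();
        // Convert the clock's native tick type to fractional microseconds.
        const std::chrono::duration<double, std::micro> elapsed = stop - start;
        best = std::min(best, elapsed.count());
    }
    return best;
}
```

A call might look like bestOfMicros(&profileVirtual, 500000, 10), using the function from the question.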


西瓜 2024-10-22 07:06:54

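In my opinion, with a small number of loops there was probably no context switch, but once you increase the number of loops there is a strong chance that context switches occur and dominate the reading. For example, if the first program takes 1 second and the second takes 3 seconds, but context switching adds 10 seconds to each, the measured ratio is 13/11 (about 1.18) instead of 3/1.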


め七分饶幸 2024-10-22 07:06:53

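I think your test case is too artificial to be of any great value.

First, inside your profiled function you dynamically allocate and deallocate an object as well as call a function. If you want to profile just the function call, then you should do just that.

Second, you are not profiling a case where a virtual function call represents a viable alternative for a given problem. A virtual function call provides dynamic dispatch. You should try profiling a case where a virtual function call is used as an alternative to something like the switch-on-type anti-pattern.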


谈情不如逗狗 2024-10-22 07:06:53

Extending Charles' answer.

The problem here is that your loop is doing more than just testing the virtual call itself (the memory allocation probably dwarfs the virtual call overhead anyway), so his suggestion is to change the code so that only the virtual call is tested.

Here the benchmark function is a template, because a template can be inlined while calls through function pointers are unlikely to be.

template <typename Type>
double benchmark(Type const& t, size_t iterations)
{
  timeval a, b;
  gettimeofday(&a, 0);
  for (;iterations > 0; --iterations) {
    t.getArea();
  }
  gettimeofday(&b, 0);
  return (b.tv_sec * (unsigned int)1e6 + b.tv_usec) -
         (a.tv_sec * (unsigned int)1e6 + a.tv_usec);
}

Classes:

struct Regular
{
  Regular(size_t w, size_t h): _width(w), _height(h) {}

  size_t getArea() const;

  size_t _width;
  size_t _height;
};

// The following line in another translation unit
// to avoid inlining
size_t Regular::getArea() const { return _width * _height; }

struct Base
{
  Base(size_t w, size_t h): _width(w), _height(h) {}

  virtual size_t getArea() const = 0;

  size_t _width;
  size_t _height;
};

struct Derived: Base
{
  Derived(size_t w, size_t h): Base(w, h) {}

  virtual size_t getArea() const;
};

// The following two functions in another translation unit
// to avoid inlining
size_t Derived::getArea() const  { return _width * _height; }

std::auto_ptr<Base> generateDerived()
{
  return std::auto_ptr<Base>(new Derived(3,7));
}

And the measuring:

int main(int argc, char* argv[])
{
  if (argc != 2) {
    std::cerr << "Usage: %prog iterations\n";
    return 1;
  }

  Regular regular(3, 7);
  std::auto_ptr<Base> derived = generateDerived();

  double regTime = benchmark<Regular>(regular, atoi(argv[1]));
  double derTime = benchmark<Base>(*derived, atoi(argv[1]));

  std::cout << "Regular: " << regTime << "\nDerived: " << derTime << "\n";

  return 0;
}

Note: this tests the overhead of a virtual call in comparison to a regular function call. The functionality is different (since there is no runtime dispatch in the regular case), so this measures the worst-case overhead.

EDIT:

Results of the run (gcc 3.4.2, -O2, SLES10 quad-core server). Note: the function definitions are in another translation unit, to prevent inlining.

> ./test 5000000
Regular: 17041
Derived: 17194

Not really convincing.
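One caveat about loops like the one in benchmark(): the result of getArea() is discarded, so a compiler that can see the function body may drop the calls altogether. A possible guard, sketched here with hypothetical names (Box, runAndConsume), is to accumulate the results and write the total to a volatile object. This only keeps the work from being treated as dead code. It does not stop the compiler from simplifying the loop when getArea() is inlinable, so keeping the definitions in another translation unit, as done above, is still what forces a real call.

```cpp
#include <cstddef>

// Hypothetical stand-in for the Regular/Base types used above.
struct Box {
    std::size_t w, h;
    std::size_t getArea() const { return w * h; }
};

// Sums every result and publishes the sum through a volatile object,
// so the calls cannot be discarded as having no observable effect.
template <typename Type>
std::size_t runAndConsume(Type const& t, std::size_t iterations) {
    std::size_t total = 0;
    for (std::size_t i = 0; i < iterations; ++i) {
        total += t.getArea();
    }
    volatile std::size_t sink = total; // observable write keeps `total` alive
    return sink;
}
```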


握住我的手 2024-10-22 07:06:53

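With a small number of iterations there is a chance that your code is preempted by another program running in parallel, or that swapping occurs, or that something else happens that the operating system normally hides from your program. The time your program spent suspended by the operating system then ends up in your benchmark results. This is the number one reason why you should run your code something like ten million times to measure anything more or less reliably.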


人疚 2024-10-22 07:06:53

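Calling a virtual function does have a performance impact, because it does slightly more work than calling a regular function. However, this impact is likely to be completely negligible in a real-world application, even smaller than what shows up in the most finely crafted benchmarks.

In a real-world application, the alternative to a virtual function usually involves hand-writing some similar system anyway, because calling a virtual function and calling a non-virtual function behave differently: the former changes based on the runtime type of the invoking object. Your benchmark, even disregarding whatever flaws it has, doesn't measure equivalent behavior, only roughly equivalent syntax. If you were to institute a coding policy banning virtual functions, you would either have to write some potentially very roundabout or confusing code (which might be slower), or re-implement the same kind of runtime dispatch system the compiler uses to implement virtual functions (which, in most cases, is certainly not going to be faster than what the compiler does).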
