Polymorphism with std::variant

Posted 2025-01-21 09:38:08

After taking a closer look at __do_visit in std::variant, I grew curious about the performance of the std::variant approach to polymorphism.

I wrote a small test program to compare old-school inheritance with the std::variant approach:

#include <variant>
#include <vector>
#include <iostream>
#include <string>
#include <chrono>

int i = 0;
// Polymorphism using variants
class circle
{
  public:
    void draw() const { i++; }
};

class line
{
  public:
    void draw() const { i++; }
};
using v_t  = std::variant<circle, line>;

void variant_way(const std::vector<v_t>& v)
{
  for (const auto &var : v)
    std::visit([](const auto& o) {
        o.draw();
        }, var);
}

// old school
class shape
{
  public:
    virtual void draw() const = 0;
    virtual ~shape() { }
};
class circle_in : public shape
{
  public:
    virtual void draw() const { i++; }
};

class line_in : public shape
{
  public:
   virtual void draw() const { i++; }
};

void inherit_way(const std::vector<shape*>& v)
{
  for (const auto var : v)
        var->draw();
}

// call and measure
template <typename F, typename D>
void run(F f, const D& data, std::string name)
{
  auto start = std::chrono::high_resolution_clock::now();
  f(data);
  auto end = std::chrono::high_resolution_clock::now();
  auto elapsed = std::chrono::duration_cast<std::chrono::microseconds>(end - start);
  std::cout << name << ": "<< elapsed.count() << std::endl;
}

int main()
{
  constexpr int howmany = 100000;
  {
    std::vector<v_t> v {howmany};
    run(variant_way, v, "variant");
  }
  {
    std::vector<shape*> v;
    for (int i = 0; i < howmany; i++)
      v.push_back(new circle_in());
    run(inherit_way, v, "inherit_way");
    // deallocate
  }
  return 0;
}

On my machine (i7, 16 GB RAM), I get these results (in microseconds):

variant: 7487
inherit_way: 1302

I suspect that this result reflects the fact that the std::variant approach creates the vtable at each iteration, while the inheritance approach does it once and for all.

Is this explanation correct?

Is there a way to reduce the overhead?

栀梦 2025-01-28 09:38:08

Several things are wrong with this question. Technically this is not necessarily an answer, but I think it provides some useful food for thought.

First of all, neither the polymorphic classes nor the types in the type union (variant) hold any data. The sizes of these types are therefore 1 byte for the non-OOP "plain old data" types and 8 bytes for the OOP types (the vtable pointer on a typical 64-bit platform).

To construct at least a reasonable example, you would have to actually make the types hold on to some data; if not, what's the point? Otherwise you're not measuring what you think you're measuring.

For instance, running this example without optimizations gives the same results as you've provided here.

But when compiled with clang++ -std=c++20 -O3, the results suddenly shift heavily in std::variant's favor:

variant: 0
inherit_way: 167

I'm willing to bet that the compiler is smart enough to see that the variant case is just a vector of 1-byte elements whose values never change, and optimizes the loop out entirely. Since the member functions of the POD types only increment a global variable that is never read, there is literally nothing left to do, hence the 0 result.

Below is a better "example question". It is equally contrived, but it actually spends time measuring what I think you're after.

#include <algorithm>
#include <chrono>
#include <cmath>
#include <cstdint>
#include <iostream>
#include <random>
#include <string>
#include <type_traits>
#include <variant>
#include <vector>

int i = 0;
using u64 = std::uint64_t;

static constexpr auto PI = 3.14;

// ignore this, if you don't understand it
template <class... T> constexpr bool always_false = false;

struct point {
  u64 x;
  u64 y;
};

// some imaginary call to an internal rendering function, returning pixels
// written
inline constexpr auto internal_circle_render(auto xrad, auto yrad) {
  return PI * xrad * yrad;
}

// some imaginary call to an internal rendering function, returning pixels
// written
inline constexpr auto internal_line_render(auto width, point p1, point p2) {
  return std::sqrt(std::pow((p2.x - p1.x), 2) + std::pow((p2.y - p1.y), 2)) *
         width;
}

// POD for the win
struct circle {
  u64 xrad;
  u64 yrad;
  point center;
};

struct line {
  point a;
  point b;
  u64 width;
};

template <typename T> constexpr inline auto render_shape(T &&shape) -> u64 {
  using type = std::decay_t<T>;
  if constexpr (std::is_same_v<type, line>) {
    return internal_line_render(shape.width, shape.a, shape.b);
  } else if constexpr (std::is_same_v<type, circle>) {
    return internal_circle_render(shape.xrad, shape.yrad);
  } else {
    static_assert(always_false<T>, "unknown type");
  }
}

using v_t = std::variant<circle, line>;
auto variant_way(const std::vector<v_t> &v) {
  u64 total_pixels = 0; // u64, so the per-shape results are not narrowed to int
  for (const auto &var : v)
    total_pixels += std::visit(
        [](const auto &shape) -> u64 { return render_shape(shape); }, var);
  return total_pixels;
}

// old school
class shape {
public:
  shape() = default;
  virtual ~shape() = default;
  virtual u64 render_pixels() const = 0;
};

class circle_in : public shape {
public:
  circle_in(u64 xrad, u64 yrad, point center)
      : shape(), xrad(xrad), yrad(yrad), center(center) {}
  ~circle_in() override = default;

  virtual u64 render_pixels() const override {
    return internal_circle_render(xrad, yrad);
  }

private:
  u64 xrad;
  u64 yrad;
  point center;
};

class line_in : public shape {
public:
  line_in(point a, point b, u64 width) : shape(), a(a), b(b), width(width) {}
  ~line_in() override = default;

  virtual u64 render_pixels() const override {
    return internal_line_render(width, a, b);
  }

private:
  point a;
  point b;
  u64 width;
};

auto inherit_way(const std::vector<shape *> &v) {
  u64 total_pixels = 0; // u64, so the per-shape results are not narrowed to int
  for (const auto var : v)
    total_pixels += var->render_pixels();
  return total_pixels;
}

// call and measure
template <typename F, typename D>
void run(F f, const D &data, std::string name) {
  auto start = std::chrono::high_resolution_clock::now();
  auto pixels = f(data);
  auto end = std::chrono::high_resolution_clock::now();
  auto elapsed =
      std::chrono::duration_cast<std::chrono::microseconds>(end - start);
  std::cout << name << ": " << elapsed.count() << " " << pixels << " pixels "
            << std::endl;
}

constexpr auto howmany = 100000;

// contrived "random" data
inline std::vector<v_t> create_variant_data() {
  std::vector<v_t> data;
  data.reserve(howmany);
  for (u64 i = 0; i < howmany; i++) {
    switch (i % 2) {
    case 0: {
      const auto a = point{.x = 1 + i % 10, .y = 1 + i % 10};
      const auto b = point{.x = 2 + i % 20, .y = 2 + i % 20};
      data.emplace_back(line{.a = a, .b = b, .width = i % 25});
    } break;
    case 1: {
      data.emplace_back(
          circle{.xrad = i % 10, .yrad = i % 20, .center = {.x = i, .y = i}});
    } break;
    }
  }
  return data;
}

// contrived "random" data
inline std::vector<shape *> create_oop_data() {
  std::vector<shape *> data;
  data.reserve(howmany);
  for (u64 i = 0; i < howmany; i++) {
    switch (i % 2) {
    case 0: {
      const auto a = point{.x = 1 + i % 10, .y = 1 + i % 10};
      const auto b = point{.x = 2 + i % 20, .y = 2 + i % 20};
      data.push_back(new line_in{a, b, i % 25});
    } break;
    case 1: {
      data.push_back(new circle_in{i % 10, i % 20, {.x = i, .y = i}});
    } break;
    }
  }
  return data;
}

int main() {
  auto rng = std::default_random_engine{};
  {
    std::vector<v_t> v = create_variant_data();
    // we definitely want to shuffle, otherwise CPU predictors are going to
    // outsmart us and give us misleading results
    std::shuffle(v.begin(), v.end(), rng);
    run(variant_way, v, "variant");
  }
  {
    std::vector<shape *> v = create_oop_data();
    // we definitely want to shuffle, otherwise CPU predictors are going to
    // outsmart us and give us misleading results
    std::shuffle(v.begin(), v.end(), rng);
    run(inherit_way, v, "inherit_way");
    for (const auto *s : v) // release the shapes allocated in create_oop_data()
      delete s;
  }
  return 0;
}

Compiling this with clang++ -std=c++20 -O3 or the gcc equivalent now produces results like

variant: 478 14148000 pixels 
inherit_way: 933 14148000 pixels

where inherit_way can fluctuate up to being ~3.5 times slower than std::variant. I suspect this has to do with how favorable the shuffle ends up being. However, std::variant never shows these kinds of large fluctuations. So it's fair to say that, for simple stuff like this, std::variant outperforms the OOP approach by a wide, wide margin.

But one has to be very careful when writing "benchmarks" like this, because even in my example I'm pretty sure I've missed a ton of things that end up making the results misleading. For instance, if one does not shuffle the vectors holding the data, the OOP version suddenly wins by a slight margin, (probably) because the CPU can speculate that every other virtual call lands at a particular address with extremely high certainty. But data in the real world rarely ends up laid out like this (and by the way, if you know the data is laid out like that, you wouldn't even use a std::variant here; just use 2 vectors, one holding line and one holding circle).

One of the reasons why std::variant is so much faster than the OOP approach in this contrived example has to do with indirection. For every element of std::vector<shape*> there are at least 2 indirections: one to follow the element's pointer, and one for the virtual table lookup. With std::variant, the elements are laid out contiguously in the vector, which makes good use of the cache, and "we" are essentially just switching on a type tag and calling the correct function for that type. As the results show, this is much faster than the virtual call.
