我可以在这里使用奇怪的重复模板模式(C++)吗?

发布于 2024-07-24 09:21:24 字数 1013 浏览 5 评论 0原文

我有一个 C++ 应用程序,可以简化为如下内容:

class AbstractWidget {
 public:
  virtual ~AbstractWidget() {}
  virtual void foo() {}
  virtual void bar() {}
  // (other virtual methods)
};

class WidgetCollection {
 private:
  vector<AbstractWidget*> widgets;

 public:
  void addWidget(AbstractWidget* widget) {
    widgets.push_back(widget);
  }

  void fooAll() {
    for (unsigned int i = 0; i < widgets.size(); i++) {
      widgets[i]->foo();
    }
  }

  void barAll() {
    for (unsigned int i = 0; i < widgets.size(); i++) {
      widgets[i]->bar();
    }
  }

  // (other *All() methods)
};

我的应用程序对性能至关重要。 该集合中通常有数千个小部件。 从 AbstractWidget 派生的类(其中有几十个)通常不会覆盖许多虚拟函数。 被覆盖的那些通常具有非常快的实现。

鉴于此,我觉得我可以通过一些巧妙的元编程来优化我的系统。 目标是利用函数内联并避免虚拟函数调用,同时保持代码的可管理性。 我研究了奇怪的重复模板模式(请参阅此处了解说明)。 这似乎几乎做我想要的,但不完全是。

有什么方法可以让 CRTP 在这里为我工作吗? 或者,还有其他人能想到的聪明的解决方案吗?

I have a C++ application that can be simplified to something like this:

class AbstractWidget {
 public:
  virtual ~AbstractWidget() {}
  virtual void foo() {}
  virtual void bar() {}
  // (other virtual methods)
};

class WidgetCollection {
 private:
  vector<AbstractWidget*> widgets;

 public:
  void addWidget(AbstractWidget* widget) {
    widgets.push_back(widget);
  }

  void fooAll() {
    for (unsigned int i = 0; i < widgets.size(); i++) {
      widgets[i]->foo();
    }
  }

  void barAll() {
    for (unsigned int i = 0; i < widgets.size(); i++) {
      widgets[i]->bar();
    }
  }

  // (other *All() methods)
};

My application is performance-critical. There are typically thousands of widgets in the collection. The classes derived from AbstractWidget (of which there are dozens) typically leave many of the virtual functions not overridden. The ones that are overridden typically have very fast implementations.

Given this, I feel I can optimize my system with some clever meta-programming. The goal is to leverage function inlining and to avoid virtual function calls, while keeping the code managable. I've looked into the Curiously Recurring Template Pattern (see here for description). This seems to almost do what I want, but not quite.

Is there any way to make the CRTP work for me here? Or, is there any other clever solution anyone can think of?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(6

假装爱人 2024-07-31 09:21:25

模拟动态绑定(CRTP 还有其他用途)适用于基类认为自己是多态的,但客户端实际上只关心一个特定的派生类。 例如,您可能拥有代表某些特定于平台的功能的接口的类,并且任何给定的平台只需要一个实现。 该模式的要点是将基类模板化,以便即使有多个派生类,基类也可以在编译时知道正在使用哪一个。

当您真正需要运行时多态性时,它对您没有帮助,例如当您有一个 AbstractWidget* 容器时,每个元素可以是多个派生类之一,并且您必须迭代它们。 在 CRTP(或任何模板代码)中,base<衍生1>base<衍生2> 是不相关的类。 因此,衍生1衍生2也是如此。 它们之间不存在动态多态性,除非它们有另一个公共基类,但随后您又回到了虚拟调用开始的地方。

通过将向量替换为多个向量,您可能会获得一些加速:一个向量用于您了解的每个派生类,一个通用向量用于稍后添加新的派生类并且不更新容器时。 然后 addWidget 执行一些(缓慢的)typeid 检查或对小部件的虚拟调用,以将小部件添加到正确的容器,并且当调用者知道运行时类时可能会进行一些重载。 请小心,不要意外地将 WidgetIKnowAbout 的子类添加到 WidgetIKnowAbout* 向量中。 fooAllbarAll 可以依次循环每个容器,(快速)调用非虚拟 fooImplbarImpl然后将被内联的函数。 然后,它们循环遍历希望小得多的 AbstractWidget* 向量,调用虚拟 foobar 函数。

这有点混乱,而且不是纯粹的面向对象,但如果几乎所有小部件都属于容器知道的类,那么您可能会看到性能的提高。

请注意,如果大多数小部件属于您的容器不可能知道的类(例如,因为它们位于不同的库中),那么您不可能进行内联(除非您的动态链接器可以内联。我的不能)。 您可以通过修改成员函数指针来降低虚拟调用开销,但收益几乎可以肯定是可以忽略不计的,甚至是负的。 虚拟调用的大部分开销都在调用本身,而不是虚拟查找,并且通过函数指针的调用不会被内联。

从另一个角度来看:如果要内联代码,这意味着不同类型的实际机器代码必须不同。 这意味着您需要多个循环,或者一个带有开关的循环,因为根据从集合中拉出的某个指针的类型,机器代码显然无法在每次循环时在 ROM 中更改。

好吧,我想也许该对象可能包含一些 asm 代码,循环将其复制到 RAM 中,标记可执行文件,然后跳转到其中。 但这不是 C++ 成员函数。 而且它不能便携地完成。 由于复制和 icache 失效,它甚至可能不会很快。 这就是虚拟通话存在的原因......

Simulated dynamic binding (there are other uses of CRTP) is for when the base class thinks of itself as being polymorphic, but clients only actually care about one particular derived class. So for instance you might have classes representing an interface into some platform-specific functionality, and any given platform will only ever need one implementation. The point of the pattern is to templatize the base class, so that even though there are multiple derived classes, the base class knows at compile time which one is in use.

It doesn't help you when you genuinely need runtime polymorphism, such as for example when you have a container of AbstractWidget*, each element can be one of several derived classes, and you have to iterate over them. In CRTP (or any template code), base<derived1> and base<derived2> are unrelated classes. Hence so are derived1 and derived2. There's no dynamic polymorphism between them unless they have another common base class, but then you're back where you started with virtual calls.

You might get some speedup by replacing your vector with several vectors: one for each of the derived classes that you know about, and one generic one for when you add new derived classes later and don't update the container. Then addWidget does some (slow) typeid checking or a virtual call to the widget, to add the widget to the correct container, and maybe has some overloads for when the caller knows the runtime class. Be careful not to accidentally add a subclass of WidgetIKnowAbout to the WidgetIKnowAbout* vector. fooAll and barAll can loop over each container in turn making (fast) calls to non-virtual fooImpl and barImpl functions that will then be inlined. They then loop over the hopefully much smaller AbstractWidget* vector, calling the virtual foo or bar functions.

It's a bit messy and not pure-OO, but if almost all your widgets belong to classes that your container knows about, then you might see a performance increase.

Note that if most widgets belong to classes that your container cannot possibly know about (because they're in different libraries, for example), then you can't possibly have inlining (unless your dynamic linker can inline. Mine can't). You could drop the virtual call overhead by messing about with member function pointers, but the gain would almost certainly be negligible or even negative. Most of the overhead of a virtual call is in the call itself, not the virtual lookup, and calls through function pointers will not be inlined.

Look at it another way: if the code is to be inlined, that means the actual machine code has to be different for the different types. This means you need either multiple loops, or a loop with a switch in it, because the machine code clearly can't change in ROM on each pass through the loop, according to the type of some pointer pulled out of a collection.

Well, I guess maybe the object could contain some asm code that the loop copies into RAM, marks executable, and jumps into. But that's not a C++ member function. And it can't be done portably. And it probably wouldn't even be fast, what with the copying and the icache invalidation. Which is why virtual calls exist...

一身软味 2024-07-31 09:21:25

CRTP 或编译时多态性适用于您在编译时知道所有类型的情况。 只要您使用 addWidget 在运行时收集小部件列表,并且只要 fooAllbarAll 就必须处理为了在运行时获得同质的小部件列表,您必须能够在运行时处理不同的类型。 因此,对于您提出的问题,我认为您陷入了使用运行时多态性的困境。

当然,一个标准答案是在尝试避免运行时多态性之前验证它的性能是否是一个问题......

如果您确实需要避免运行时多态性,那么以下解决方案之一可能会起作用。

选项 1:使用小部件的编译时集合

如果 WidgetCollection 的成员在编译时已知,那么您可以非常轻松地使用模板。

template<typename F>
void WidgetCollection(F functor)
{
  functor(widgetA);
  functor(widgetB);
  functor(widgetC);
}

// Make Foo a functor that's specialized as needed, then...

void FooAll()
{
  WidgetCollection(Foo);
}

选项 2:用自由函数替换运行时多态性

class AbstractWidget {
 public:
  virtual AbstractWidget() {}
  // (other virtual methods)
};

class WidgetCollection {
 private:
  vector<AbstractWidget*> defaultFooableWidgets;
  vector<AbstractWidget*> customFooableWidgets1;
  vector<AbstractWidget*> customFooableWidgets2;      

 public:
  void addWidget(AbstractWidget* widget) {
    // decide which FooableWidgets list to push widget onto
  }

  void fooAll() {
    for (unsigned int i = 0; i < defaultFooableWidgets.size(); i++) {
      defaultFoo(defaultFooableWidgets[i]);
    }
    for (unsigned int i = 0; i < customFooableWidgets1.size(); i++) {
      customFoo1(customFooableWidgets1[i]);
    }
    for (unsigned int i = 0; i < customFooableWidgets2.size(); i++) {
      customFoo2(customFooableWidgets2[i]);
    }
  }
};

丑陋,而且确实不是面向对象的。 模板可以通过减少列出每个特殊情况的需要来帮助解决这个问题; 尝试类似以下内容(完全未经测试),但在这种情况下您又回到了无内联状态。

class AbstractWidget {
 public:
  virtual AbstractWidget() {}
};

class WidgetCollection {
 private:
  map<void(AbstractWidget*), vector<AbstractWidget*> > fooWidgets;

 public:
  template<typename T>
  void addWidget(T* widget) {
    fooWidgets[TemplateSpecializationFunctionGivingWhichFooToUse<widget>()].push_back(widget);
  }

  void fooAll() {
    for (map<void(AbstractWidget*), vector<AbstractWidget*> >::const_iterator i = fooWidgets.begin(); i != fooWidgets.end(); i++) {
      for (unsigned int j = 0; j < i->second.size(); j++) {
        (*i->first)(i->second[j]);
      }
    }
  }
};

选项 3:消除 OO

OO 很有用,因为它有助于管理复杂性,并且有助于在面对变化时保持稳定性。 对于您似乎描述的情况 - 数千个小部件,其行为通常不会改变,并且其成员方法非常简单 - 您可能没有太多复杂性或需要管理的更改。 如果是这样的话,那么您可能不需要 OO。

此解决方案与运行时多态性相同,只是它要求您维护“虚拟”方法和已知子类(不是面向对象)的静态列表,并且它允许您使用内联函数的跳转表替换虚拟函数调用。

class AbstractWidget {
 public:
  enum WidgetType { CONCRETE_1, CONCRETE_2 };
  WidgetType type;
};

class WidgetCollection {
 private:
  vector<AbstractWidget*> mWidgets;

 public:
  void addWidget(AbstractWidget* widget) {
    widgets.push_back(widget);
  }

  void fooAll() {
    for (unsigned int i = 0; i < widgets.size(); i++) {
      switch(widgets[i]->type) {
        // insert handling (such as calls to inline free functions) here
      }
    }
  }
};

CRTP or compile-time polymorphism is for when you know all of your types at compile time. As long as you're using addWidget to collect a list of widgets at runtime and as long as fooAll and barAll then have to handle members of that homogenous list of widgets at runtime, you have to be able to handle different types at runtime. So for the problem you've presented, I think you're stuck using runtime polymorphism.

A standard answer, of course, is to verify that the performance of runtime polymorphism is a problem before you try to avoid it...

If you really need to avoid runtime polymorpism, then one of the following solutions may work.

Option 1: Use a compile-time collection of widgets

If your WidgetCollection's members are known at compile time, then you can very easily use templates.

template<typename F>
void WidgetCollection(F functor)
{
  functor(widgetA);
  functor(widgetB);
  functor(widgetC);
}

// Make Foo a functor that's specialized as needed, then...

void FooAll()
{
  WidgetCollection(Foo);
}

Option 2: Replace runtime polymorphism with free functions

class AbstractWidget {
 public:
  virtual AbstractWidget() {}
  // (other virtual methods)
};

class WidgetCollection {
 private:
  vector<AbstractWidget*> defaultFooableWidgets;
  vector<AbstractWidget*> customFooableWidgets1;
  vector<AbstractWidget*> customFooableWidgets2;      

 public:
  void addWidget(AbstractWidget* widget) {
    // decide which FooableWidgets list to push widget onto
  }

  void fooAll() {
    for (unsigned int i = 0; i < defaultFooableWidgets.size(); i++) {
      defaultFoo(defaultFooableWidgets[i]);
    }
    for (unsigned int i = 0; i < customFooableWidgets1.size(); i++) {
      customFoo1(customFooableWidgets1[i]);
    }
    for (unsigned int i = 0; i < customFooableWidgets2.size(); i++) {
      customFoo2(customFooableWidgets2[i]);
    }
  }
};

Ugly, and really not OO. Templates could help with this by reducing the need to list every special case; try something like the following (completely untested), but you're back to no inlining in this case.

class AbstractWidget {
 public:
  virtual AbstractWidget() {}
};

class WidgetCollection {
 private:
  map<void(AbstractWidget*), vector<AbstractWidget*> > fooWidgets;

 public:
  template<typename T>
  void addWidget(T* widget) {
    fooWidgets[TemplateSpecializationFunctionGivingWhichFooToUse<widget>()].push_back(widget);
  }

  void fooAll() {
    for (map<void(AbstractWidget*), vector<AbstractWidget*> >::const_iterator i = fooWidgets.begin(); i != fooWidgets.end(); i++) {
      for (unsigned int j = 0; j < i->second.size(); j++) {
        (*i->first)(i->second[j]);
      }
    }
  }
};

Option 3: Eliminate OO

OO is useful because it helps manage complexity and because it helps maintain stability in the face of change. For the circumstances you seem to be describing - thousands of widgets, whose behavior generally doesn't change, and whose member methods are very simple - you may not have much complexity or change to manage. If that's the case, then you may not need OO.

This solution is the same as runtime polymorphism, except that it requires that you maintain a static list of "virtual" methods and known subclasses (which is not OO) and it lets you replace virtual function calls with a jump table to inlined functions.

class AbstractWidget {
 public:
  enum WidgetType { CONCRETE_1, CONCRETE_2 };
  WidgetType type;
};

class WidgetCollection {
 private:
  vector<AbstractWidget*> mWidgets;

 public:
  void addWidget(AbstractWidget* widget) {
    widgets.push_back(widget);
  }

  void fooAll() {
    for (unsigned int i = 0; i < widgets.size(); i++) {
      switch(widgets[i]->type) {
        // insert handling (such as calls to inline free functions) here
      }
    }
  }
};
倒数 2024-07-31 09:21:25

最简洁的答案是不。

长答案(或者与其他一些答案相比仍然很短:-)

您正在动态地尝试找出在运行时执行的函数(这就是虚拟函数)。 如果您有一个向量(其成员在编译时无法确定),那么无论您尝试什么,您都无法弄清楚如何内联函数。

唯一需要注意的是,如果向量始终包含相同的元素(即您可以计算出编译时将在运行时执行的内容)。 然后您可以重新处理这个问题,但它需要向量以外的东西来保存元素(可能是一个将所有元​​素作为成员的结构)。

另外,您真的认为虚拟调度是瓶颈吗?
我个人对此非常怀疑。

The short answer is no.

The long answer (or still short campared to some other answers :-)

You are dynamically trying to figure out what function to execute at runtime (that is what virtual functions are). If you have a vector (whoses members can not be determined at compile time) then you can not work out how to inline the functions no matter what you try.

The only caviat to that is that if the vectors always contain the same elements (ie you could work out a compile time what is going to be executed at runtime). You could then re-work this but it would require somthing other than a vector to hold the elements (probably a structure with all the elements as members).

Also, do you really think that virtual dispatch is a bottleneck?
Personally I highly doubt it.

墨洒年华 2024-07-31 09:21:25

您将在这里遇到的问题是 WidgetCollection::widgets。 向量只能包含一种类型的项,并且使用 CRTP 要求每个 AbstractWidget 具有不同的类型,并由所需的派生类型模板化。 也就是说,您的 AbstractWidget 看起来像这样:

template< class Derived >
class AbstractWidget {
    ...
    void foo() {
        static_cast< Derived* >( this )->foo_impl();
    }        
    ...
}

这意味着具有不同 Derived 类型的每个 AbstractWidget 将构成不同的类型 <代码>AbstractWidget< 派生>。 将这些全部存储在单个向量中是行不通的。 所以看起来,在这种情况下,虚拟函数是正确的选择。

The problem that you will have here is with WidgetCollection::widgets. A vector can only contain items of one type, and using the CRTP requires that each AbstractWidget have a different type, templatized by the desired derived type. That is, you're AbstractWidget would look something like this:

template< class Derived >
class AbstractWidget {
    ...
    void foo() {
        static_cast< Derived* >( this )->foo_impl();
    }        
    ...
}

Which means that each AbstractWidget with a different Derived type would constitute a different type AbstractWidget< Derived >. Storing these all in a single vector won't work. So it looks like, in this case, virtual functions are the way to go.

伴我心暖 2024-07-31 09:21:25

如果您需要它们的向量,则不是。 STL 容器是完全同类的,这意味着如果您需要将 widgetA 和 widgetB 存储在同一个容器中,则它们必须从共同的父容器继承。 而且,如果 widgetA::bar() 执行的操作与 widgetB::bar() 不同,则必须将函数设为虚拟。

所有小部件都需要位于同一个容器中吗? 你可以做类似的事情

vector<widgetA> widget_a_collection;
vector<widgetB> widget_b_collection;

,然后函数就不需要是虚拟的了。

Not if you need a vector of them. The STL containers are completely homogeneous, which means that if you need to store a widgetA and a widgetB in the same container, they must be inherited from a common parent. And, if widgetA::bar() does something different than widgetB::bar(), you have to make the functions virtual.

Do all of the widgets need to be in the same container? You could do something like

vector<widgetA> widget_a_collection;
vector<widgetB> widget_b_collection;

And then the functions wouldn't need to be virtual.

·深蓝 2024-07-31 09:21:25

很可能在您完成所有这些努力之后,您将看不到性能差异。

这绝对是错误的优化方式。 您不会通过更改随机代码行来修复逻辑错误,对吗? 不,那太愚蠢了。 在您首先找到哪些行实际上导致问题之前,您不会“修复”代码。 那么为什么您会以不同的方式对待性能错误呢?

您需要分析您的应用程序并找到真正的瓶颈所在。 然后加速该代码并重新运行分析器。 重复此操作,直到性能错误(执行速度太慢)消失。

Odds are that after you go through all that effort, you will see no performance difference.

This is absolutely the wrong way to optimize. You wouldn't fix a logic bug by changing random lines of code would you? No, that's silly. You don't "fix" code until you first find which lines are actually causing your problem. So why would you treat performance bugs differently?

You need to profile your application and find where the real bottlenecks are. Then speed up that code and rerun the profiler. Repeat until the performance bug (too slow execution) is gone.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文