Virtual functions vs. pointer casting

Posted on 2024-11-24 16:45:49

The current version of some code I'm using utilises a slightly odd way of achieving something which I think could be achieved with polymorphism. More concretely, we currently use something like

for(int i=0; i<CObjList.size(); ++i)
{
   CObj* W = CObjList[i];
   if( W->type == someTypeA )
   {
       // do some things which also involve casts such as
       //  ((SomeClassA*) W->objectptr)->someFieldA
   }
   else if( W->type == someTypeB )
   {
       // do some things which also involve casting such as
       //  ((SomeClassB*) W->objectptr)->someFieldB
   }
}

To clarify: each object W contains a void *objectptr; that is to say, a pointer to an arbitrary location. The field W->type keeps track of what type of object objectptr points at, so that inside our if/else statements we can cast W->objectptr to the correct type and use its fields.

However, this seems inherently bad from a code design standpoint for several reasons:

We have no guarantee that the object pointed to by W->objectptr actually matches what is said in W->type, so the cast is inherently unsafe.
Every time we wish to add another type we must add another else-if branch and ensure W->type is set correctly.

It seems to me this would be much better solved with something like

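class CObj
{
public:
   virtual ~CObj() {}   // virtual destructor so derived objects can be deleted through a CObj*
   virtual void doSomething(/* some params */)=0;
};

class SomeClassA : public CObj
{
public:
   virtual void doSomething(/* some params */);   // type-A behaviour lives here
   int someFieldA;
};

class SomeClassB : public CObj
{
public:
   virtual void doSomething(/* some params */);   // type-B behaviour lives here
   int someFieldB;
};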

// sometime later...

for(int i=0; i<CObjList.size(); ++i)
{
   CObj* W = CObjList[i];
   W->doSomething(/* some params */);
}

That said, there is the proviso that performance is important in this setting. This code will be called from a (relatively) tight loop.

My question, then: is the added complexity of a few vtable lookups outweighed by the improved code design and extensibility, and is this likely to affect performance a lot?

EDIT: It occurs to me that accessing the fields through a pointer in this way could be as bad as vtable lookups anyway due to cache misses etc. Any thoughts on this?

EDIT 2: Also, I forgot to mention (and I know it's a bit off the original topic) that inside the if statements there are many calls to member functions of the surrounding class. How would you design the structure so as to be able to call these from inside doSomething()?
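One common way to handle the EDIT 2 concern is to pass the surrounding object into doSomething() as a parameter. The following is only a minimal sketch under assumptions: the names Owner, helperA, helperB, add and processAll are hypothetical and do not come from the original code, and the real "some params" are unknown.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

class Owner;  // hypothetical name for the "surrounding class"

class CObj {
public:
    virtual ~CObj() = default;
    // The owner is passed in explicitly so overrides can call its members.
    virtual void doSomething(Owner& owner) = 0;
};

class Owner {
public:
    // Hypothetical member functions that the old if/else bodies used to call.
    void helperA(int v) { totalA += v; }
    void helperB(int v) { totalB += v; }

    void add(std::unique_ptr<CObj> obj) {
        assert(obj != nullptr);  // the loop below dereferences every entry
        objects.push_back(std::move(obj));
    }

    // Replaces the type-switching loop: each object decides what to do.
    void processAll() {
        for (auto& obj : objects) {
            obj->doSomething(*this);
        }
    }

    int totalA = 0;
    int totalB = 0;

private:
    std::vector<std::unique_ptr<CObj>> objects;
};

class SomeClassA : public CObj {
public:
    explicit SomeClassA(int v) : someFieldA(v) {}
    void doSomething(Owner& owner) override { owner.helperA(someFieldA); }
    int someFieldA;
};

class SomeClassB : public CObj {
public:
    explicit SomeClassB(int v) : someFieldB(v) {}
    void doSomething(Owner& owner) override { owner.helperB(someFieldB); }
    int someFieldB;
};
```

With this shape, adding a new type means adding one class with its own override; the loop in processAll() does not change. Whether the owner should expose these helpers publicly or through a narrower interface is a design choice this sketch leaves open.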

爱本泡沫多脆弱 2024-12-01 16:45:49

I'm going to answer specifically on the performance angle, because I work in a perf-critical environment and a while ago I happened to run measurements on a similar case to work out the fastest solution.

If you are on an x86, PPC, or ARM processor, you want virtual functions in this situation. The performance cost of calling a virtual function is mostly the pipeline bubble induced by mispredicting an indirect branch. Because the instruction fetch stage of the CPU can't know where the computed jmp goes, it can't start fetching bytes from the target address until the branch executes, and thus you have a stall in the pipeline corresponding to the number of stages between the first fetch stage and the branch retire. (On the PPC I know best, that's something like 25 cycles.)

You also have the latency of loading the vtable pointer, but this is often hidden by instruction reordering (the compiler moves the load instruction so it starts several cycles before you actually need the result and the CPU does other work while the data cache sends you its electrons.)

With the if-cascade approach you instead have some number n of direct, conditional branches, where the target is known at compile time but whether the jump is taken is determined at runtime (i.e., a jump-on-equal opcode). In this case the CPU will guess (predict) whether each branch is taken and start fetching instructions accordingly, so you only have a bubble if the CPU guesses wrong. Since you are presumably calling this function with different input each time, it's going to mispredict at least one of these branches, and you'll have the exact same bubble that you would with virtuals. In fact, you may have a whole lot more bubbles: up to one per if() conditional!

With virtual functions, there's also the risk of an additional data cache miss on loading the vtable, and an icache miss on the jump target. If this function is in a tight loop, then presumably you'll be looking up and calling the same subroutines a lot, and thus the vtable and function bodies will probably still be in cache. You could measure that if you wanted to be really sure.
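For the "you could measure that" suggestion, a rough timing harness might look like the sketch below. It is an assumption-laden illustration, not the answerer's benchmark: the types Base, VirtA, VirtB, PayloadA, PayloadB, Tagged and the functions sumVirtual, sumTagged, secondsFor and compareDispatch are all made up here, the per-object work is trivial, and a single timed pass is far too noisy to draw conclusions from.

```cpp
#include <cassert>
#include <chrono>
#include <memory>
#include <vector>

// Virtual-dispatch variant (type names here are hypothetical).
struct Base {
    virtual ~Base() = default;
    virtual long value() const = 0;
};
struct VirtA : Base { long value() const override { return 1; } };
struct VirtB : Base { long value() const override { return 2; } };

// Tag-plus-void* variant, mirroring the layout described in the question.
enum Type { someTypeA, someTypeB };
struct PayloadA { long someFieldA; };
struct PayloadB { long someFieldB; };
struct Tagged { Type type; void* objectptr; };

long sumVirtual(const std::vector<std::unique_ptr<Base>>& objs) {
    long sum = 0;
    for (const auto& o : objs) sum += o->value();  // one indirect call per object
    return sum;
}

long sumTagged(const std::vector<Tagged>& objs) {
    long sum = 0;
    for (const Tagged& w : objs) {  // one compare-and-branch per type tested
        if (w.type == someTypeA)
            sum += static_cast<PayloadA*>(w.objectptr)->someFieldA;
        else if (w.type == someTypeB)
            sum += static_cast<PayloadB*>(w.objectptr)->someFieldB;
    }
    return sum;
}

// Wall-clock seconds taken by one call of f().
template <class F>
double secondsFor(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

struct Timings {
    long virtualSum, taggedSum;
    double virtualSeconds, taggedSeconds;
};

// Builds n objects in a mixed A/B pattern and times each loop once.
Timings compareDispatch(int n) {
    PayloadA pa{1};
    PayloadB pb{2};
    std::vector<std::unique_ptr<Base>> virt;
    std::vector<Tagged> tagged;
    for (int i = 0; i < n; ++i) {
        if (i % 3 == 0) {
            virt.push_back(std::make_unique<VirtA>());
            tagged.push_back(Tagged{someTypeA, &pa});
        } else {
            virt.push_back(std::make_unique<VirtB>());
            tagged.push_back(Tagged{someTypeB, &pb});
        }
    }
    Timings t{};
    t.virtualSeconds = secondsFor([&] { t.virtualSum = sumVirtual(virt); });
    t.taggedSeconds = secondsFor([&] { t.taggedSum = sumTagged(tagged); });
    assert(t.virtualSum == t.taggedSum);  // both loops must compute the same result
    return t;
}
```

Calling compareDispatch(n) with a large n and printing the two timings gives a first impression only. For anything you intend to act on, repeat the runs, use realistic object mixes and per-object work, and check that the optimizer has not removed the loops.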

浅唱々樱花落 2024-12-01 16:45:49

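Use virtual functions; this kind of hypothetical optimization doesn't matter. What matters is readability, maintainability, and code quality.

If you really need to tune hot spots, you can optimize later with the help of a profiler. Making your code unmaintainable with this kind of crap is a road to failure.

Besides, virtual functions will help you with unit testing, mocking interfaces, and so on.
Programming is about managing complexity...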

鲜肉鲜肉永远不皱 2024-12-01 16:45:49

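"My question, then: is the added complexity of a few vtable lookups outweighed by the improved code design and extensibility, and is this likely to affect performance a lot?"

C++ compilers should be able to implement virtual functions very efficiently, so I don't see a downside to using them. (And there's a huge maintainability/readability advantage, of course!) But you should measure to be sure.

The way they are typically implemented is that each object has a vtable pointer. (There are multiple pointers in the case of multiple inheritance, but let's forget that for now.) This has the following costs relative to non-virtual functions.

Data space: one pointer per object
Data space: one vtable per class (not per object!)
Time: worst case = two memory reads per function call (one to get the vtable address, one to get the function address within the vtable). The offset into the vtable is known at compile time, because you know which function you're calling. There are no extra jumps.

Compare this with the costs of the existing software's non-OOP approach.

Data space: one type ID per object
Code space: one if/else tree or switch statement every time you want to call a function that depends on the object type
Time: the if/else tree or switch statement has to be evaluated.

I vote for the virtual function approach, because it can actually be faster than the non-OOP approach, since it doesn't need to spend time figuring out what type of object it is.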
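The per-object space cost listed above can be observed with sizeof. This is a small sketch with hypothetical type names (PlainObj, VirtualObj, TaggedObj); object layout is implementation-defined, so the "one pointer per object" figure is what mainstream compilers typically do, not something the C++ standard promises.

```cpp
#include <cassert>
#include <cstddef>

// No virtual functions: just the data member.
struct PlainObj {
    int field = 0;
    int get() const { return field; }
};

// Same data, but polymorphic: typical implementations add a hidden vtable pointer.
struct VirtualObj {
    int field = 0;
    virtual ~VirtualObj() = default;
    virtual int get() const { return field; }
};

// The layout from the question: a type ID plus a pointer to the real object.
struct TaggedObj {
    int type;
    void* objectptr;
};

// Extra bytes per object for the polymorphic version on this compiler.
// The assert records an assumption about mainstream ABIs, not a language rule.
std::size_t vptrOverhead() {
    assert(sizeof(VirtualObj) > sizeof(PlainObj));
    return sizeof(VirtualObj) - sizeof(PlainObj);
}
```

Printing vptrOverhead() and sizeof(TaggedObj) on your own target shows how the two per-object costs compare there; padding and alignment usually affect both numbers.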

耀眼的星火 2024-12-01 16:45:49

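I have some experience with a large (over 1M lines, I believe) scientific computing code base that used a similar type-based switch construct. They refactored to a proper polymorphism-based approach and got a significant speed-up. Exactly the opposite of what they expected!

It turned out the compiler was able to optimize some things better in that structure.

However, this was a long time ago (about 8 years), so who knows what modern compilers will do. Don't guess; profile it.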

淡看悲欢离合 2024-12-01 16:45:49

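As piotr said, the right answer is probably virtual functions. You'll have to test.

But to address your concern about casts:

Never use C-style casts in a C++ program; use static_cast<>, dynamic_cast<>, and so on.
In your specific case, use dynamic_cast<>. At least you'll get an exception if the types are not properly related, which is better than a wild crash.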
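A short sketch of how dynamic_cast<> behaves may help here. Two facts of standard C++ are worth keeping in mind: a failed dynamic_cast on a pointer yields a null pointer, and only the reference form throws std::bad_cast. dynamic_cast also cannot start from a void*, so the objects would need a common polymorphic base first. The helper names readFieldA and refCastThrows and the -1 sentinel below are hypothetical.

```cpp
#include <cassert>
#include <typeinfo>

struct CObj {
    virtual ~CObj() = default;  // dynamic_cast needs a polymorphic base
};
struct SomeClassA : CObj { int someFieldA = 1; };
struct SomeClassB : CObj { int someFieldB = 2; };

// Pointer form: a mismatch gives nullptr, which must be checked explicitly.
int readFieldA(CObj* w) {
    assert(w != nullptr);
    if (auto* a = dynamic_cast<SomeClassA*>(w)) {
        return a->someFieldA;
    }
    return -1;  // hypothetical "not a SomeClassA" sentinel
}

// Reference form: a mismatch throws std::bad_cast.
bool refCastThrows(CObj& w) {
    try {
        SomeClassA& a = dynamic_cast<SomeClassA&>(w);
        (void)a;  // cast succeeded; the object really is a SomeClassA
        return false;
    } catch (const std::bad_cast&) {
        return true;
    }
}
```

If the type tag is trusted and the hierarchy is known, static_cast<> avoids the runtime check, at the price of undefined behaviour when the tag is wrong.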

荭秂 2024-12-01 16:45:49

CRTP would be a great idea for this kind of case.

Edit: In your case,

template<class T>
class CObj
{
public:
   void doSomething(/* some params */)
   {
     static_cast<T*>(this)->doSomething(/* some params */);
   }
};

class SomeClassA : public CObj<SomeClassA>
{
public:
   void doSomething(/* some params */);
   int someFieldA;
};

class SomeClassB : public CObj<SomeClassB>
{
public:
   void doSomething(/* some params */);
   int someFieldB;
};

Now you may have to write your loop code differently to accommodate objects of the different CObj<T> types.
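Because CObj<SomeClassA> and CObj<SomeClassB> are unrelated types, they cannot share one container of base pointers. One possible arrangement, sketched here under assumptions, keeps a separate container per concrete type and uses a function template for the loop. The names doSomethingImpl, processAll and runExample are hypothetical, as are the int parameter and the default field values.

```cpp
#include <cassert>
#include <vector>

template <class T>
class CObj {
public:
    // Static dispatch: resolved at compile time, no vtable involved.
    // The derived hook has a different name so that a missing
    // implementation is a compile error rather than infinite recursion.
    int doSomething(int x) { return static_cast<T*>(this)->doSomethingImpl(x); }
};

class SomeClassA : public CObj<SomeClassA> {
public:
    int doSomethingImpl(int x) { return someFieldA + x; }
    int someFieldA = 10;
};

class SomeClassB : public CObj<SomeClassB> {
public:
    int doSomethingImpl(int x) { return someFieldB + x; }
    int someFieldB = 20;
};

// One loop per concrete type; the compiler instantiates it for each T.
template <class T>
int processAll(std::vector<T>& objs, int x) {
    int total = 0;
    for (T& w : objs) {
        total += w.doSomething(x);
    }
    return total;
}

// Tiny driver showing the intended use.
int runExample() {
    std::vector<SomeClassA> as(3);  // someFieldA defaults to 10
    std::vector<SomeClassB> bs(2);  // someFieldB defaults to 20
    const int total = processAll(as, 1) + processAll(bs, 1);
    assert(total == 3 * 11 + 2 * 21);
    return total;
}
```

The trade-off is that the original processing order across mixed types is lost; if that order matters, CRTP alone does not replace the single heterogeneous list.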
