vftable performance penalty vs. switch statement
C++ question here. I have a system where I'm going to have hundreds of mini-subclasses of a given superclass. They all will have a "foo" method that does something. Or... I'm going to have one class with an integer called "type" and use a giant switch statement to decide what to do when I foo.
Performance is a huge consideration here. Extremely important.
The question is, what are the performance benefits/penalties of using a switch statement vs. letting C++ do it via the vftable? If I have it as a switch statement, I can put the commonly occurring foos at the top of the switch statement and the less common ones at the bottom, hopefully short-circuiting the comparisons. Trying to get an effect like this with the vftable is bound to be compiler dependent even if I can figure out how to do it...
On the other hand, my code would be a lot easier to deal with without these ugly switch statements.
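For concreteness, here is a minimal sketch of the two designs being compared. The names (Base, AddOne, Twice, Negate, Kind, Tagged, sum_virtual, sum_tagged) and the three toy operations are made up for illustration, and an enum stands in for the integer "type"; the real system would have hundreds of cases.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Design A (hypothetical names): one small subclass per behaviour,
// dispatched through the vtable.
struct Base {
    virtual ~Base() = default;
    virtual int foo(int x) const = 0;
};
struct AddOne : Base { int foo(int x) const override { return x + 1; } };
struct Twice  : Base { int foo(int x) const override { return x * 2; } };
struct Negate : Base { int foo(int x) const override { return -x; } };

// Design B (hypothetical names): a single class carrying a type tag,
// dispatched by a switch inside foo.
enum class Kind { AddOne, Twice, Negate };

struct Tagged {
    Kind type;
    int foo(int x) const {
        switch (type) {
            case Kind::AddOne: return x + 1;
            case Kind::Twice:  return x * 2;
            case Kind::Negate: return -x;
        }
        assert(false && "unknown Kind");
        return 0;
    }
};

// Design A usually ends up with heap-allocated objects behind pointers.
int sum_virtual(const std::vector<std::unique_ptr<Base>>& items, int x) {
    int total = 0;
    for (const auto& item : items) total += item->foo(x);
    return total;
}

// Design B can store the objects by value, contiguously.
int sum_tagged(const std::vector<Tagged>& items, int x) {
    int total = 0;
    for (const auto& item : items) total += item.foo(x);
    return total;
}
```

Both loops compute the same result; what differs is how the call is dispatched and how the objects are laid out in memory.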
There's been some research on this topic in the field of virtual machine design. Generally, a switch statement is going to be faster; a lot of virtual machines use switch semantics as opposed to virtual lookup. Theoretically, one would assume that a virtual table, being a constant-time algorithm, will be faster, but we have to examine how the hardware sees a virtual table.
A switch statement is easier for the compiler to inline. This is a huge consideration. The actual act of calling a virtual function is minimal; however, the full call sequence (pushing and popping the entire stack frame) is generally necessary because the compiler has no idea which function will be called at run-time.
Branch prediction and hardware prefetch should be easier on a switch statement, although modern architectures are getting better at predicting virtual calls.
A lot of code that uses virtual dispatch requires the use of heap-based allocation schemes. Dynamic memory allocation is a bottleneck in a lot of C++ applications.
A switch statement is generally compiled to a jump table rather than a block of if-else conditionals as your question implies. In practice, the virtual table and the switch jump table should have similar performance, though test if you're really concerned.
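To make the jump-table point concrete, here is a rough sketch that writes the table out by hand next to the equivalent switch. The function names (op_add_one, op_twice, op_negate, dispatch_switch, dispatch_table) and the table kTable are hypothetical, and whether a given compiler actually emits a jump table for the switch depends on the compiler, the optimization level, and how dense the case labels are.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical per-type operations.
int op_add_one(int x) { return x + 1; }
int op_twice(int x)   { return x * 2; }
int op_negate(int x)  { return -x; }

// Dense case labels 0..2: a compiler may lower this to a bounds check
// followed by an indexed jump, so the order of the cases does not matter.
int dispatch_switch(int type, int x) {
    switch (type) {
        case 0:  return op_add_one(x);
        case 1:  return op_twice(x);
        case 2:  return op_negate(x);
        default: return x;  // unknown type: leave x unchanged
    }
}

// The same idea written out by hand: an array of function pointers
// indexed by the type tag, which is close to what a vtable lookup does.
using Op = int (*)(int);
const std::array<Op, 3> kTable = {{op_add_one, op_twice, op_negate}};

int dispatch_table(int type, int x) {
    if (type < 0 || static_cast<std::size_t>(type) >= kTable.size()) {
        return x;  // same fallback as the default label above
    }
    return kTable[static_cast<std::size_t>(type)](x);
}
```

In both versions the cost is one range check, one indexed load, and one indirect transfer of control, which is why the two dispatch mechanisms tend to land close together.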
The compiler determines how the switch statements are handled, but there are a few basic techniques they use.
Where the case statements are located in the switch statement makes no difference in either case.
Virtual functions have an overhead compared to a direct call. It involves an additional offset and pointer lookup. For all but the most extreme performance considerations this cost is negligible. When comparing to a switch, the overhead is not in the virtual lookup but in the function call itself. So a switch statement that simply calls functions in each case will perform basically the same as virtual functions.
So essentially the "dispatch semantics" of a switch statement (with jump table) compared to a virtual function call are nearly irrelevant. If all your "foo" methods are relatively small and can be inlined, the switch statement will start to perform better. The other advantage of switch is that you can put common code before the switch and get better register/stack optimizations.
However, there is a significant maintenance overhead. This should be your primary concern at this point. Why? Because the performance bottleneck in your code is not likely the switching logic, or even the function calls, but something else. Until you fix that something else there is no point in addressing these low-level performance issues. So stick with whichever provides more maintainable code at the moment.
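As an illustration of the "common code before the switch" point, here is a small sketch. The Item and Node types, their fields, and the three cases are invented for the example; it only shows where the shared work sits in each design, not how any particular compiler optimizes it.

```cpp
#include <cassert>

// Hypothetical tagged record.
struct Item {
    int type;
    int value;
    int scale;
};

// Switch version: the shared computation is done once, before dispatch,
// in the same function as the small case bodies. The compiler sees
// everything at once and can keep `scaled` in a register.
int foo_switch(const Item& item) {
    const int scaled = item.value * item.scale;  // common code
    switch (item.type) {
        case 0:  return scaled + 1;
        case 1:  return scaled * 2;
        case 2:  return -scaled;
        default: return scaled;
    }
}

// Virtual version: each override has to reach the shared computation
// itself, on the far side of a call the compiler usually cannot see through.
struct Node {
    Node(int v, int s) : value(v), scale(s) {}
    virtual ~Node() = default;
    virtual int foo() const = 0;
protected:
    int scaled() const { return value * scale; }  // common code
private:
    int value;
    int scale;
};
struct PlusOne : Node { using Node::Node; int foo() const override { return scaled() + 1; } };
struct Doubled : Node { using Node::Node; int foo() const override { return scaled() * 2; } };
struct Negated : Node { using Node::Node; int foo() const override { return -scaled(); } };
```

The virtual version is arguably easier to extend, which is the maintenance trade-off described above.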
To the other answers here I would add two more.
1) It is more difficult and less common for a compiler to perform classic optimizations (including enregistration) across a virtual function call interface than across case labeled statements in a switch statement in a single function.
2) Any performance difference in the dispatch is highly dependent on the processor's branch prediction hardware. Even a virtual function call target address (and return) may be correctly predicted and have negligible performance overhead in the pipeline of a modern out-of-order processor.
If the performance of this operation really matters, you really have to try it both ways and measure it, in the context of the real system.
Happy hacking!
A vtable should be faster in nearly all cases, but if performance is so critical, the right thing to ask is by how much.
A vtable call is a triple indirection (three memory accesses to get the target CALL address). Cache misses should not be an issue if there are many calls. So it costs roughly as much as 2-3 switch label comparisons (though the comparisons have even less chance of a CPU cache miss, they make poorer use of the pipeline).
You should of course not rely on anything I said here, and test it all with true performance measurements on your target architecture.
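Since the advice is to measure, here is a minimal timing helper one might start from. The name time_ns and its signature are made up for this sketch; it uses only std::chrono::steady_clock from the standard library. It is a crude harness, not a substitute for a proper benchmark framework or a profiler run on the real system.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper: calls body(i) for i in [0, iterations) and returns
// the elapsed wall-clock time in nanoseconds.
template <typename F>
std::int64_t time_ns(F&& body, int iterations) {
    assert(iterations >= 0);
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int i = 0; i < iterations; ++i) {
        body(i);
    }
    const auto stop = clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

To compare the two designs, pass one lambda that makes the virtual call and another that makes the switch-based call, and have each lambda add its result to a volatile variable so the optimizer cannot delete the work. Run each variant several times on the target architecture with realistic data and a realistic mix of types, since branch prediction and cache behaviour depend heavily on both.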