Performance of dynamic_cast?

Posted on 2024-09-29 22:10:28

Before reading the question:
This question is not about how useful dynamic_cast is. It's just about its performance.

I've recently developed a design where dynamic_cast is used a lot.
When I discuss it with co-workers, almost everyone says that dynamic_cast shouldn't be used because of its bad performance. (These co-workers have different backgrounds and in some cases do not know each other; I work at a huge company.)

I decided to test the performance of this method instead of just believing them.

The following code was used:

ptime firstValue( microsec_clock::local_time() );

ChildObject* castedObject = dynamic_cast<ChildObject*>(parentObject);

ptime secondValue( microsec_clock::local_time() );
time_duration diff = secondValue - firstValue;
std::cout << "Cast1 lasts:\t" << diff.fractional_seconds() << " microsec" << std::endl;

The above code uses boost::date_time on Linux to obtain the timestamps.
I performed three dynamic_casts in one execution; the measuring code is the same for each.

The results of 1 execution were the following:
Cast1 lasts: 74 microsec
Cast2 lasts: 2 microsec
Cast3 lasts: 1 microsec

The first cast always took 74-111 microsec; the following casts in the same execution took 1-3 microsec.

So, finally, my questions:
Does dynamic_cast really perform badly?
According to the test results it does not. Is my test code correct?
Why do so many developers think that it is slow if it isn't?

Comments (6)

感受沵的脚步 2024-10-06 22:10:28

Firstly, you need to measure the performance over a lot more than just a few iterations, as your results will be dominated by the resolution of the timer. Try e.g. 1 million+, in order to build up a representative picture. Also, this result is meaningless unless you compare it against something, i.e. doing the equivalent but without the dynamic casting.

Secondly, you need to ensure the compiler isn't giving you false results by optimising away multiple dynamic casts on the same pointer (so use a loop, but use a different input pointer each time).

Dynamic casting will be slower, because it needs to access the RTTI (run-time type information) table for the object, and check that the cast is valid. Then, in order to use it properly, you will need to add error-handling code that checks whether the returned pointer is NULL. All of this takes up cycles.

I know you didn't want to talk about this, but "a design where dynamic_cast is used a lot" is probably an indicator that you're doing something wrong...

莫多说 2024-10-06 22:10:28

Performance is meaningless without comparing equivalent functionality. Most people say dynamic_cast is slow without comparing to equivalent behavior. Call them out on this. Put another way:

If 'works' isn't a requirement, I can write code that fails faster than yours.

There are various ways to implement dynamic_cast, and some are faster than others. Stroustrup published a paper about using primes to improve dynamic_cast, for example. Unfortunately it's unusual to control how your compiler implements the cast, but if performance really matters to you, then you do have control over which compiler you use.

However, not using dynamic_cast will always be faster than using it — but if you don't actually need dynamic_cast, then don't use it! If you do need dynamic lookup, then there will be some overhead, and you can then compare various strategies.

花心好男孩 2024-10-06 22:10:28

Here are a few benchmarks:
http://tinodidriksen.com/2010/04/14/cpp-dynamic-cast-performance/
http://www.nerdblog.com/2006/12/how-slow-is-dynamiccast.html

According to them, dynamic_cast is 5-30 times slower than reinterpret_cast, and the best alternative performs almost the same as reinterpret_cast.

I'll quote the conclusion from the first article:

  • dynamic_cast is slow for anything but casting to the base type; that
    particular cast is optimized out
  • the inheritance level has a big impact on dynamic_cast
  • member variable + reinterpret_cast is the fastest reliable way to
    determine type; however, that has a lot higher maintenance overhead
    when coding

Absolute numbers are on the order of 100 ns for a single cast. Values such as 74 microseconds don't seem close to reality.

樱娆 2024-10-06 22:10:28

Your mileage may vary, to understate the situation.

The performance of dynamic_cast depends a great deal on what you are doing, and can even depend on the names of the classes. (Comparing its time relative to reinterpret_cast seems odd, since in most cases that takes zero instructions for practical purposes, as does, e.g., a cast from unsigned to int.)

I've been looking into how it works in clang/g++. Assuming that you are dynamic_casting from a B* to a D*, where B is a (direct or indirect) base of D, and disregarding multiple-base-class complications, it seems to work by calling a library function which does something like this:

for dynamic_cast<D*>(  p  )   where p is B*

type_info const * curr_typ = &typeid( *p );
while(1) {
     if( *curr_typ == typeid(D)) { return static_cast<D*>(p); } // success;
     if( *curr_typ == typeid(B)) return nullptr;   //failed
     curr_typ = get_direct_base_type_of(*curr_typ); // magic internal operation
}

So, yes, it's pretty fast when *p is actually a D; just one successful type_info compare.
The worst case is when the cast fails, and there are a lot of steps from the actual type to B; in this case there are a lot of failed type comparisons.

How long does a type comparison take? On clang/g++ it does this:

bool compare_eq( type_info const &a, type_info const & b ){
   if( a.name() == b.name()) return true; // same string object
   return strcmp( a.name(), b.name())==0; // same string
}

The strcmp is needed since it's possible to have two distinct string objects providing the type_info.name() for the same type (although I'm pretty sure this only happens when one is in a shared library, and the other is not in that library). But, in most cases, when types are actually equal, they reference the same type name string; thus most successful type comparisons are very fast.

The name() method just returns a pointer to a fixed string containing the mangled name of the class.
So there's another factor: if many of the classes on the way from D to B have names starting with MyAppNameSpace::AbstractSyntaxNode<, then the failing compares are going to take longer than usual; the strcmp won't fail until it reaches a difference in the mangled type names.

And, of course, since the operation as a whole is traversing a bunch of linked data structures representing the type hierarchy, the time will depend on whether those things are fresh in the cache or not. So the same cast done repeatedly is likely to show an average time which doesn't necessarily represent the typical performance for that cast.

夜深人未静 2024-10-06 22:10:28

Sorry to say this, but your test is virtually useless for determining whether the cast is slow or not. Microsecond resolution is nowhere near good enough. We're talking about an operation that, even in the worst-case scenario, shouldn't take more than, say, 100 clock ticks, i.e. under 50 nanoseconds on a typical PC.

There's no doubt that the dynamic cast will be slower than a static cast or a reinterpret cast, because, on the assembly level, the latter two will amount to an assignment (really fast, order of 1 clock tick), and the dynamic cast requires the code to go and inspect the object to determine its real type.

I can't say offhand how slow it really is; that would probably vary from compiler to compiler, and I'd need to see the assembly code generated for that line of code. But, like I said, 50 nanoseconds per call is the upper limit of what I'd expect to be reasonable.

计㈡愣 2024-10-06 22:10:28

The question doesn't mention the alternative.
Prior to RTTI being widely available, or simply to avoid using RTTI, the traditional method is to use a virtual method to check the type of the class, and then static_cast as appropriate. This has the disadvantage that it doesn't work for multiple inheritance, but has the advantage that it doesn't have to spend time checking a multiple inheritance hierarchy either!

In my tests:

  • dynamic_cast runs at about 14.4953 nanoseconds.
  • Checking a virtual method and static_casting runs at about twice the speed, 6.55936 nanoseconds.

This is for testing with a 1:1 ratio of valid:invalid casts, using the following code with optimisations disabled. I used Windows for performance checking.

#include <iostream>
#include <windows.h>


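struct BaseClass
{
    // Virtual destructor so that deleting a DerivedClass through a BaseClass pointer is well defined.
    virtual ~BaseClass() = default;
    virtual int GetClass() volatile
    { return 0; }
};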

struct DerivedClass final : public BaseClass
{
    virtual int GetClass() volatile final override
    { return 1; }
};


volatile DerivedClass *ManualCast(volatile BaseClass *lp)
{
    if (lp->GetClass() == 1)
    {
        return static_cast<volatile DerivedClass *>(lp);
    }

    return nullptr;
}

LARGE_INTEGER perfFreq;
LARGE_INTEGER startTime;
LARGE_INTEGER endTime;

void PrintTime()
{
    // Use the full 64-bit counter values; LowPart alone can wrap between the two readings.
    double seconds = static_cast<double>(endTime.QuadPart - startTime.QuadPart) / static_cast<double>(perfFreq.QuadPart);
    std::cout << "T=" << seconds << std::endl;
}

BaseClass *Make()
{
    return new BaseClass();
}

BaseClass *Make2()
{
    return new DerivedClass();
}


int main()
{
    volatile BaseClass *base = Make();
    volatile BaseClass *derived = Make2();
    int unused = 0;
    const int t = 1000000000;

    QueryPerformanceFrequency(&perfFreq);
    QueryPerformanceCounter(&startTime);

    for (int n = 0; n < t; ++n)
    {
        volatile DerivedClass *alpha = dynamic_cast<volatile DerivedClass *>(base);
        volatile DerivedClass *beta = dynamic_cast<volatile DerivedClass *>(derived);
        unused += alpha ? 1 : 0;
        unused += beta ? 1 : 0;
    }


    QueryPerformanceCounter(&endTime);
    PrintTime();
    QueryPerformanceCounter(&startTime);

    for (int n = 0; n < t; ++n)
    {
        volatile DerivedClass *alpha = ManualCast(base);
        volatile DerivedClass *beta = ManualCast(derived);
        unused += alpha ? 1 : 0;
        unused += beta ? 1 : 0;
    }

    QueryPerformanceCounter(&endTime);
    PrintTime();

    std::cout << unused;

    delete base;
    delete derived;
}
