树迭代器,你能进一步优化它吗?

发布于 2024-08-02 12:29:02 字数 1236 浏览 4 评论 0原文

作为我最初关于一小段代码的问题的后续行动,我决定提出后续行动,看看您是否可以做得比我们迄今为止提出的更好。

下面的代码迭代二叉树(左/右=子/下一个)。我确实相信这里有空间容纳一个较少的条件(down 布尔值)。最快回答者获胜!

  1. cnt 语句可以是多个语句,因此请确保它仅出现一次
  2. child()next() 成员函数大约为 30 倍与 hasChild() 和 hasNext() 操作一样慢。
  3. 保持迭代<--放弃了这个要求,因为提出的递归解决方案更快。
  4. 这是 C++ 代码,
  5. 节点的访问顺序必须保持如下例所示。 (首先击中父节点,然后击中子节点,然后击中“下一个”节点)。
  6. BaseNodePtr 是 boost::shared_ptr,因此分配速度很慢,请避免任何临时 BaseNodePtr 变量。

目前,此代码需要 5897 毫秒来访问测试树中的 62200000 个节点,调用此函数 200,000 次。

void processTree (BaseNodePtr current, unsigned int & cnt )
{
    bool down = true;

    while ( true )
    {
        if ( down )
        {
            while (true) {

                cnt++; // this can/will be multiple statesments

               if (!current->hasChild()) break;
               current = current->child();
            }
        }

        if ( current->hasNext() )
        {
            down = true;
            current = current->next();
        }
        else
        {
            down = false;
            current = current->parent();
            if (!current)
                return; // done.
        }
    }
}

As a follow up to my original question about a small piece of this code I decided to ask a follow up to see if you can do better then what we came up with so far.

The code below iterates over a binary tree (left/right = child/next ). I do believe there is room for one less conditional in here (the down boolean). The fastest answer wins!

  1. The cnt statement can be multiple statements so lets make sure this appears only once
  2. The child() and next() member functions are about 30x as slow as the hasChild() and hasNext() operations.
  3. Keep it iterative <-- dropped this requirement as the recursive solution presented was faster.
  4. This is C++ code
  5. visit order of the nodes must stay as they are in the example below. ( hit parents first then the children then the 'next' nodes).
  6. BaseNodePtr is a boost::shared_ptr as thus assignments are slow, avoid any temporary BaseNodePtr variables.

Currently this code takes 5897ms to visit 62200000 nodes in a test tree, calling this function 200,000 times.

void processTree (BaseNodePtr current, unsigned int & cnt )
{
    bool down = true;

    while ( true )
    {
        if ( down )
        {
            while (true) {

                cnt++; // this can/will be multiple statesments

               if (!current->hasChild()) break;
               current = current->child();
            }
        }

        if ( current->hasNext() )
        {
            down = true;
            current = current->next();
        }
        else
        {
            down = false;
            current = current->parent();
            if (!current)
                return; // done.
        }
    }
}

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(5

死开点丶别碍眼 2024-08-09 12:29:02

为什么不采用递归解决方案呢?

void processTree (const BaseNodePtr ¤t, unsigned int & cnt )
{
  cnt++;

  if (current->hasChild())
    processTree(current->child());
  if (current->hasNext())
    processTree(current->next());
}

既然 shared_ptr 似乎是你的瓶颈,为什么不改进它呢?你使用线程吗?如果没有,则取消定义符号 BOOST_HAS_THREADSshared_ptr 引用计数由互斥体保护,这可能是性能缓慢的原因。

为什么不更改数据结构以完全不使用 shared_ptr 呢?自己管理原始指针?也许可以使用 scoped_ptr 来代替?

Why not a recursive solution?

void processTree (const BaseNodePtr ¤t, unsigned int & cnt )
{
  cnt++;

  if (current->hasChild())
    processTree(current->child());
  if (current->hasNext())
    processTree(current->next());
}

Since shared_ptr seems to be your bottleneck, why not improve it? Are you using threads? If not, then undefine the symbol BOOST_HAS_THREADS. The shared_ptr reference count is guarded by a mutex which is probably the cause of the slow performance.

Why not change your data structure to not use shared_ptr altogether? Manage the raw pointers yourself? Maybe use scoped_ptr instead?

↘紸啶 2024-08-09 12:29:02

为了最终加快速度,您需要做的就是对内存中的节点进行排序,以便它们按照您访问它们的顺序存储在连续的块中。

例如,如果您有一棵定义如下的树。

        1
       / \
      2   3
     / \  /\
    4   5 6 7
   /\    /  /\
  8  9  10 11 12
 / \           \
13 14          15

然后,所描述的访问函数将按以下顺序访问节点

1
 2
  4
   8
    13
    14
   9
  5
 3
  6
   10
  7
   11
   12
     15

现在,如果您将内存中的节点排序为 15 个分配的连续块,并按照上面演示的顺序存储节点,那么您通常会访问具有“空间局部性"。这可以提高缓存命中率,具体取决于节点结构的大小,从而使运行速度更快。

创建一种快速迭代方法,仅访问树中的所有节点一次并且无需递归。

unsigned int g_StackDepth = 0;
BaseNodePtr* g_Stack[MAX_STACK_DEPTH];

void processTree (BaseNodePtr root, unsigned int & cnt )
{
    g_Stack[g_StackDepth++] = root;
    while( g_StackDepth > 0 )
    {
        BaseNodePtr curr = g_Stack[--g_StackDepth];
        cnt++;

        if ( curr->HasNext() )
        {
            g_Stack[g_StackDepth++] = curr->Next();
        }


        if ( curr->HasChild() )
        {
            g_Stack[g_StackDepth++] = curr->Child();
        }

    }
}

据我所知,结合上述顺序,您应该可以获得所能达到的最佳速度。

显然这有局限性,因为你必须提前知道你的筹码可能会增长到多大。尽管您可以通过使用 std::vector 来解决这个问题。然而,使用 std::vector 会消除上面迭代方法提供的所有优点。

希望有一些帮助:)

For the ultimate speed up what you need to do is order the nodes in memory so that they are stored in a contiguous block in the order that you visit them.

e.g If you have a tree defined as follows.

        1
       / \
      2   3
     / \  /\
    4   5 6 7
   /\    /  /\
  8  9  10 11 12
 / \           \
13 14          15

Then the visit function as described will visit the nodes in the following order

1
 2
  4
   8
    13
    14
   9
  5
 3
  6
   10
  7
   11
   12
     15

Now if you order the nodes in memory as a contiguous block of 15 allocations and store the nodes in the order demonstrated above then you will generally visiting a node that has "spatial locality". This can improve your cache hits, depending upon the size of your node structure and thus make things run faster.

To create a quick iterative method of visiting all the nodes in a tree only once and with no recursion.

unsigned int g_StackDepth = 0;
BaseNodePtr* g_Stack[MAX_STACK_DEPTH];

void processTree (BaseNodePtr root, unsigned int & cnt )
{
    g_Stack[g_StackDepth++] = root;
    while( g_StackDepth > 0 )
    {
        BaseNodePtr curr = g_Stack[--g_StackDepth];
        cnt++;

        if ( curr->HasNext() )
        {
            g_Stack[g_StackDepth++] = curr->Next();
        }


        if ( curr->HasChild() )
        {
            g_Stack[g_StackDepth++] = curr->Child();
        }

    }
}

Combined with the above ordering you should get just about the best speed you CAN get, to my knowledge.

Obviously this has limitations as you have to know how big your stack can possibly grow in advance. Though you could get around this by using a std::vector instead. Using a std::vector however would eliminate all the advantages the iterative method above provides, however.

Hope that is some help :)

浅紫色的梦幻 2024-08-09 12:29:02

讨厌当答案用“不要那样做”来驳回问题时,但我走了......

假设有一种方法可以删除下布尔......这真的会对执行时间产生真正的影响吗?我们讨论的是一小部分 CPU 操作和堆栈上的一些额外字节。

如果您需要速度,请专注于使 child() 和 Parent() 调用更快。否则你就是在浪费时间(恕我直言)。

编辑:
也许遍历树(使用这个“慢”代码)一次并按照所需的顺序将指针数组构建到树中。稍后使用这个“索引”。

我想说的是,我认为你从错误的角度进行优化。

I HATE when answers dismiss the question with a "don't do that" but here I go...

Say there was a way to remove the down bool... will that really make any REAL difference in execution time? We're talking about a small handful of CPU operations and a few byte extra on the stack.

Focus on making the child() and parent() calls faster if you need speed. Otherwise you're wasting your time (IMOHO).

EDIT:
Maybe walk the tree (w/ this "slow" code) ONCE and build an array of pointers into the tree in the desired order. Use this "index" later.

What I'm saying is I think you're approaching optimization from the wrong angle.

唐婉 2024-08-09 12:29:02

以下是如何只进行一次递归调用而不是两次:

void processTree (const BaseNodePtr ¤t, unsigned int & cnt )
{
  for(bool gotNext = true; gotNext; current = current->next()) { 
    cnt++;
    if (current->hasChild())
      processTree(current->child());
    gotNext = current->hasNext();
  }
}

Here is how to have only one recursion call instead of two:

void processTree (const BaseNodePtr ¤t, unsigned int & cnt )
{
  for(bool gotNext = true; gotNext; current = current->next()) { 
    cnt++;
    if (current->hasChild())
      processTree(current->child());
    gotNext = current->hasNext();
  }
}
滥情稳全场 2024-08-09 12:29:02

创建一个“nextvisit”函数,并继续调用它,以简化代码;接下来,对共享指针使用 const 引用 iso 值语义...这可能会为您节省宝贵的共享指针副本:

// define the order of visitation in here
BaseNodePtr& next( const BaseNodePtr& p ) {
    if( p->hasChild() ) return p->child();
    if( p->hasNext() ) return p->next();
    BaseNodePtr ancestor = p->parent();
    while( ancestor != 0 && !ancestor->hasNext() ) ancestor = ancestor->parent();
    return ancestor;
}

void processTree( const BaseNodePtr& p, unsigned int& cnt ) {
   while( p != NULL ) {
     ++cnt;
     p = next(p);
   }        
}

但是为了可读性、清晰度、可维护性,...看在上帝的份上,请使用递归。除非你的筹码不够大。

Create a "nextvisit" function, and keep calling that, to simplify the code; next to that, use const references iso value-semantics for the shared-pointers... this may save you valuable shared-ptr copies:

// define the order of visitation in here
BaseNodePtr& next( const BaseNodePtr& p ) {
    if( p->hasChild() ) return p->child();
    if( p->hasNext() ) return p->next();
    BaseNodePtr ancestor = p->parent();
    while( ancestor != 0 && !ancestor->hasNext() ) ancestor = ancestor->parent();
    return ancestor;
}

void processTree( const BaseNodePtr& p, unsigned int& cnt ) {
   while( p != NULL ) {
     ++cnt;
     p = next(p);
   }        
}

But for readability, clarity, maintainability, ... for god's sake, use recursion. Unless your stack isn't big enough.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文