KD树,慢速树构建

发布于 2024-10-02 17:37:44 字数 3190 浏览 0 评论 0原文

我正在尝试构建 KD 树(静态情况)。我们假设点在 x 和 y 坐标上排序。

为了获得均匀的递归深度,该集合被分成两个子集,其中一条垂直线穿过中值 x 坐标。

对于奇数递归深度,该集合被分成两个子集,其中一条水平线穿过中值 y 坐标。

中位数可以根据 x / y 坐标从排序集中确定。我在每次分割集合之前执行此步骤。我认为这导致了树的缓慢构建。

  1. 请您帮我检查并优化代码吗?
  2. 我找不到第 k 个最近邻居,有人可以帮我编写代码吗?

非常感谢您的帮助和耐心...

请参阅示例代码:

class KDNode
{
private:
Point2D *data;
KDNode *left;
KDNode *right;
    ....
};

void KDTree::createKDTree(Points2DList *pl)
{
//Create list
KDList kd_list;

//Create KD list (all input points)
for (unsigned int i = 0; i < pl->size(); i++)
{
kd_list.push_back((*pl)[i]);
}

//Sort points by x
std::sort(kd_list.begin(), kd_list.end(), sortPoints2DByY());

//Build KD Tree
root = buildKDTree(&kd_list, 1);
}


KDNode * KDTree::buildKDTree(KDList *kd_list, const unsigned int depth)
{
//Build KD tree
const unsigned int n = kd_list->size();

 //No leaf will be built
 if (n == 0)
 {
  return NULL;
 }

 //Only one point: create leaf of KD Tree
 else if (n == 1)
 {
  //Create one leaft
  return new KDNode(new Point2D ((*kd_list)[0]));
 }

 //At least 2 points: create one leaf, split tree into left and right subtree
 else
 {
  //New KD node
  KDNode *node = NULL;

  //Get median index
  const unsigned int median_index = n/2;

  //Create new KD Lists
  KDList kd_list1, kd_list2;

  //The depth is even, process by x coordinate
  if (depth%2 == 0)
  {
   //Create new median node
   node = new KDNode(new Point2D( (*kd_list)[median_index]));

   //Split list
   for (unsigned int i = 0; i < n; i++)
   {
    //Geta actual point
    Point2D *p = &(*kd_list)[i];

    //Add point to the first list: x < median.x
    if (p->getX() < (*kd_list)[median_index].getX())
    {
     kd_list1.push_back(*p);
    }

    //Add point to the second list: x > median.x
    else if (p->getX() > (*kd_list)[median_index].getX())
    {
     kd_list2.push_back(*p);
    }
   }

   //Sort points by y for the next recursion step: slow construction of the tree???
   std::sort(kd_list1.begin(), kd_list1.end(), sortPoints2DByY());
   std::sort(kd_list2.begin(), kd_list2.end(), sortPoints2DByY());

  }

  //The depth is odd, process by y coordinates
  else
  {

   //Create new median node
   node = new KDNode(new Point2D((*kd_list)[median_index]));

   //Split list
   for (unsigned int i = 0; i < n; i++)
   {
    //Geta actual point
    Point2D *p = &(*kd_list)[i];

    //Add point to the first list: y < median.y
    if (p->getY() < (*kd_list)[median_index].getY())
    {
     kd_list1.push_back(*p);
    }

    //Add point to the second list: y < median.y
    else if (p->getY() >(*kd_list)[median_index].getY())
    {
     kd_list2.push_back(*p);
    }
   }

   //Sort points by x for the next recursion step: slow construction of the tree???
   std::sort(kd_list1.begin(), kd_list1.end(), sortPoints2DByX());
   std::sort(kd_list2.begin(), kd_list2.end(), sortPoints2DByX());

  }

  //Build left subtree
  node->setLeft( buildKDTree(&kd_list1, depth +1 ) );

  //Build right subtree
  node->setRight( buildKDTree(&kd_list2, depth + 1 ) );

  //Return new node 
  return node; 
 }
}

I am trying to build KD Tree (static case). We assume points are sorted on both x and y coordinates.

For even depth of recursion the set is split into two subsets with a vertical line going through median x coordinate.

For odd depth of recursion the set is split into two subsets with a horizontal line going through median y coordinate.

The median can be determined from sorted set according to x / y coordinate. This step I am doing before each splitting of the set. And I think that it causes the slow construction of the tree.

  1. Please could you help me check any and optimize the code?
  2. I can not find the k-th nearest neighbor, could somebody help me with the code?

Thank you very much for your help and patience...

Please see the sample code:

class KDNode
{
private:
Point2D *data;
KDNode *left;
KDNode *right;
    ....
};

void KDTree::createKDTree(Points2DList *pl)
{
//Create list
KDList kd_list;

//Create KD list (all input points)
for (unsigned int i = 0; i < pl->size(); i++)
{
kd_list.push_back((*pl)[i]);
}

//Sort points by x
std::sort(kd_list.begin(), kd_list.end(), sortPoints2DByY());

//Build KD Tree
root = buildKDTree(&kd_list, 1);
}


KDNode * KDTree::buildKDTree(KDList *kd_list, const unsigned int depth)
{
//Build KD tree
const unsigned int n = kd_list->size();

 //No leaf will be built
 if (n == 0)
 {
  return NULL;
 }

 //Only one point: create leaf of KD Tree
 else if (n == 1)
 {
  //Create one leaft
  return new KDNode(new Point2D ((*kd_list)[0]));
 }

 //At least 2 points: create one leaf, split tree into left and right subtree
 else
 {
  //New KD node
  KDNode *node = NULL;

  //Get median index
  const unsigned int median_index = n/2;

  //Create new KD Lists
  KDList kd_list1, kd_list2;

  //The depth is even, process by x coordinate
  if (depth%2 == 0)
  {
   //Create new median node
   node = new KDNode(new Point2D( (*kd_list)[median_index]));

   //Split list
   for (unsigned int i = 0; i < n; i++)
   {
    //Geta actual point
    Point2D *p = &(*kd_list)[i];

    //Add point to the first list: x < median.x
    if (p->getX() < (*kd_list)[median_index].getX())
    {
     kd_list1.push_back(*p);
    }

    //Add point to the second list: x > median.x
    else if (p->getX() > (*kd_list)[median_index].getX())
    {
     kd_list2.push_back(*p);
    }
   }

   //Sort points by y for the next recursion step: slow construction of the tree???
   std::sort(kd_list1.begin(), kd_list1.end(), sortPoints2DByY());
   std::sort(kd_list2.begin(), kd_list2.end(), sortPoints2DByY());

  }

  //The depth is odd, process by y coordinates
  else
  {

   //Create new median node
   node = new KDNode(new Point2D((*kd_list)[median_index]));

   //Split list
   for (unsigned int i = 0; i < n; i++)
   {
    //Geta actual point
    Point2D *p = &(*kd_list)[i];

    //Add point to the first list: y < median.y
    if (p->getY() < (*kd_list)[median_index].getY())
    {
     kd_list1.push_back(*p);
    }

    //Add point to the second list: y < median.y
    else if (p->getY() >(*kd_list)[median_index].getY())
    {
     kd_list2.push_back(*p);
    }
   }

   //Sort points by x for the next recursion step: slow construction of the tree???
   std::sort(kd_list1.begin(), kd_list1.end(), sortPoints2DByX());
   std::sort(kd_list2.begin(), kd_list2.end(), sortPoints2DByX());

  }

  //Build left subtree
  node->setLeft( buildKDTree(&kd_list1, depth +1 ) );

  //Build right subtree
  node->setRight( buildKDTree(&kd_list2, depth + 1 ) );

  //Return new node 
  return node; 
 }
}

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

中二柚 2024-10-09 17:37:44

寻找中位数的排序可能是这里最糟糕的罪魁祸首,因为这是 O(nlogn),而问题可以在 O(n) 时间内解决。您应该使用 nth_element 代替: http://www.cplusplus.com/reference/algorithm/nth_element/ 。这将找到平均线性时间的中位数,之后您可以在线性时间中分割向量。

向量中的内存管理也可能需要花费大量时间,尤其是对于大型向量,因为每次向量的大小加倍时,所有元素都必须移动。您可以使用向量的保留方法为新创建的节点中的向量保留足够的空间,因此它们不需要随着push_back添加新内容而动态增加。

如果您绝对需要最佳性能,则应该使用较低级别的代码,取消向量并保留普通数组。第 N 个元素或“选择”算法很容易获得,并且自己编写并不难: http://en.wikipedia .org/wiki/Selection_algorithm

The sorting to find the median is probably the worst culprit here, since that is O(nlogn) while the problem is solvable in O(n) time. You should use nth_element instead: http://www.cplusplus.com/reference/algorithm/nth_element/. That'll find the median in linear time on average, after which you can split the vector in linear time.

Memory management in vector is also something that can take a lot of time, especially with large vectors, since every time the vector's size is doubled all the elements have to be moved. You can use the reserve method of vector to reserve exactly enough space for the vectors in the newly created nodes, so they need not increase dynamically as new stuff is added with push_back.

And if you absolutely need the best performance, you should use lower level code, doing away with vector and reserving plain arrays instead. Nth element or 'selection' algorithms are readily available and not too hard to write yourself: http://en.wikipedia.org/wiki/Selection_algorithm

韵柒 2024-10-09 17:37:44

优化 kd 树的一些提示:

  • 使用线性时间中值查找算法,例如 QuickSelect。
  • 避免实际使用“节点”对象。您可以使用点来存储整个树,并且附加信息为零。本质上只需对对象数组进行排序即可。根节点将位于中间。将根放在第一位,然后使用堆布局的重新排列可能对查询时的 CPU 内存缓存更好,但构建起来更棘手。

Some hints on optimizing the kd-tree:

  • Use a linear time median finding algorithm, such as QuickSelect.
  • Avoid actually using "node" objects. You can store whole tree using the points only, with ZERO additional information. Essentially by just sorting an array of objects. The root node will then be in the middle. A rearrangement that puts the root first, then uses a heap layout will likely be nicer to the CPU memory cache on query time, but more tricky to build.
靖瑶 2024-10-09 17:37:44

并不是您问题的真正答案,但我强烈推荐 http://ompf.org/forum 的论坛/
他们在那里对各种情况下的快速 kd 树构建进行了一些精彩的讨论。也许你会在那里找到一些灵感。

编辑:
OMPF 论坛已经关闭,尽管目前可以在 http://ompf2.com/ 上找到直接替代者

Not really an answer to your questions, but I would highly recommend the forum at http://ompf.org/forum/
They have some great discussions over there for fast kd-tree constructions in various contexts. Perhaps you'll find some inspiration over there.

Edit:
The OMPF forums have since gone down, although a direct replacement is currently available at http://ompf2.com/

昔日梦未散 2024-10-09 17:37:44

第一个罪魁祸首是排序以找到中位数。这几乎总是 Kd 树构建的瓶颈,在这里使用更高效的算法确实会得到回报。

但是,每次分割并向它们传输元素时,您还会构造一对可变大小的向量。

在这里我推荐很好的单链表。链表的美妙之处在于,您只需将 next 指针更改为指向子级的根指针而不是父级的根指针,就可以将元素从父级传输到子级。

这意味着在构造期间将元素从父节点传输到子节点时没有任何堆开销,仅聚合要插入到根的初始元素列表。这也应该会产生奇迹,但如果您想要更快,您可以使用固定分配器来有效地为链表(以及树)分配节点,并具有更好的连续性/缓存命中。

最后但并非最不重要的一点是,如果您参与需要 Kd 树的密集计算任务,则需要一个分析器。测量您的代码,您将准确地看到罪魁祸首以及准确的时间分布。

Your first culprit is sorting to find the median. This is almost always the bottleneck for K-d tree construction, and using more efficient algorithms here will really pay off.

However, you're also constructing a pair of variable-sized vectors each time you split and transferring elements to them.

Here I recommend the good ol' singly-linked list. The beauty of the linked list is that you can transfer elements from parent to child by simply changing next pointers to point at the child's root pointer instead of the parent's.

That means no heap overhead whatsoever during construction to transfer elements from parent nodes to child nodes, only to aggregate the initial list of elements to insert to the root. That should do wonders as well, but if you want even faster, you can use a fixed allocator to efficiently allocate nodes for the linked list (as well as for the tree) and with better contiguity/cache hits.

Last but not least, if you're involved in intensive computing tasks that call for K-d trees, you need a profiler. Measure your code and you'll see exactly what lies at the culprit, and with exact time distributions.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文