3D connected point labeling based on Euclidean distances
Currently, I am working on a project that tries to group 3D points from a dataset by specifying connectivity as a minimum Euclidean distance. My algorithm right now is simply a 3D adaptation of the naive flood fill.
size_t PointSegmenter::growRegion(size_t & seed, size_t segNumber) {
    size_t numPointsLabeled = 0;
    //alias for points to avoid retyping
    vector<Point3d> & points = _img.points;
    deque<size_t> ptQueue;
    ptQueue.push_back(seed);
    points[seed].setLabel(segNumber);
    while (!ptQueue.empty()) {
        size_t currentIdx = ptQueue.front();
        ptQueue.pop_front();
        //every queued point was already labeled when it was pushed,
        //so only the counter needs updating here
        numPointsLabeled++;
        vector<int> newPoints = _img.queryRadius(currentIdx, SEGMENT_MAX_DISTANCE, MATCH_ACCURACY);
        for (int i = 0; i < (int)newPoints.size(); i++) {
            int newIdx = newPoints[i];
            Point3d &newPoint = points[newIdx];
            //label at push time so a point is never queued twice
            if (!newPoint.labeled()) {
                newPoint.setLabel(segNumber);
                ptQueue.push_back(newIdx);
            }
        }
    }
    //NOTE to whoever wrote the other code, the compiler optimizes i++
    //to ++i in cases like these, so please don't change them just for speed :)
    for (size_t i = seed; i < points.size(); i++) {
        if (!points[i].labeled()) {
            //search for an unlabeled point to serve as the next seed.
            seed = i;
            return numPointsLabeled;
        }
    }
    return numPointsLabeled;
}
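This snippet is re-run with the updated seed and a fresh segment number until every point has been labeled. The question doesn't show that outer loop; a minimal sketch of what it might look like (the segmentAll name and the assumption that label 0 means "unlabeled" are mine):

void PointSegmenter::segmentAll() {
    size_t seed = 0;
    size_t segNumber = 1;  //assumes label 0 means "unlabeled"
    size_t labeled = 0;
    const size_t total = _img.points.size();
    while (labeled < total) {
        //growRegion advances `seed` to the next unlabeled point before returning,
        //so once every point is counted the loop simply terminates
        labeled += growRegion(seed, segNumber++);
    }
}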
_img.queryRadius() is a fixed-radius search using the ANN library:
vector<int> Image::queryRadius(size_t index, double range, double epsilon) {
    //first call with k = 0 only counts the points inside the squared radius
    int k = kdTree->annkFRSearch(dataPts[index], range*range, 0, NULL, NULL, epsilon);
    ANNidxArray nnIdx = new ANNidx[k];
    //second call actually retrieves the k indices
    kdTree->annkFRSearch(dataPts[index], range*range, k, nnIdx, NULL, epsilon);
    vector<int> outPoints;
    outPoints.reserve(k);
    for (int i = 0; i < k; i++) {
        outPoints.push_back(nnIdx[i]);
    }
    delete[] nnIdx;
    return outPoints;
}
My problem with this code is that it runs way too slow for large datasets. If I'm not mistaken, this code will do a search for every single point, and the searches are O(N log N), giving this a time complexity of O(N^2 log N).
In addition to that, if I remember right, deletions from k-d trees are relatively expensive, but not deleting points creates its own problem: each point can be searched hundreds of times, once by every neighbor close to it.
So my question is: is there a better way to do this, especially one that scales linearly with the dataset?
Thanks for any help you may be able to provide.
EDIT
I have tried using a simple sorted list as dash-tom-bang suggested, but the result was even slower than what I was using before. I'm not sure if it was the implementation, or whether it is simply too slow to iterate through every point and check the Euclidean distance (even when using only the squared distance).
Are there any other ideas? I'm honestly stumped right now.
3 Answers
I propose the following algorithm:
1. Compute the 3D Delaunay triangulation of your data points.
2. Remove all the edges that are longer than your threshold distance; O(N) when combined with step 3.
3. Find connected components in the resulting graph, which is O(N) in size; this is done in O(N α(N)) (sketched in code below).
The bottleneck is step 1, which can be done in O(N^2) or even O(N log N) according to this page: http://www.ncgia.ucsb.edu/conf/SANTA_FE_CD-ROM/sf_papers/lattuada_roberto/paper.html. However, it's definitely not a 100-line algorithm.
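Steps 2 and 3 are short once step 1 hands you an edge list; here is a minimal union-find sketch under that assumption (the Edge struct is a stand-in for whatever your Delaunay library outputs):

#include <vector>
using std::vector;

//hypothetical stand-in for an edge produced by step 1
struct Edge { int a, b; double sqLength; };

//union-find with path compression; processing all edges is O(N α(N))
struct DisjointSet {
    vector<int> parent;
    DisjointSet(int n) : parent(n) {
        for (int i = 0; i < n; i++) parent[i] = i;
    }
    int find(int x) { return parent[x] == x ? x : parent[x] = find(parent[x]); }
    void unite(int x, int y) { parent[find(x)] = find(y); }
};

//steps 2 and 3: skip long edges, merge everything else, label by root
vector<int> connectedComponents(int numPoints, const vector<Edge>& delaunayEdges,
                                double maxDistance) {
    DisjointSet ds(numPoints);
    double sqMax = maxDistance * maxDistance;
    for (const Edge& e : delaunayEdges)
        if (e.sqLength <= sqMax)  //step 2: drop edges longer than the threshold
            ds.unite(e.a, e.b);   //step 3: merge the components they connect
    vector<int> label(numPoints);
    for (int i = 0; i < numPoints; i++) label[i] = ds.find(i);
    return label;
}

Each root index in the returned vector serves as a segment label, which maps directly onto segNumber in the question's code.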
When I did something along these lines, I chose an "origin" outside of the dataset somewhere and sorted all of the points by their distance to that origin. Then I had a much smaller set of points to choose from at each step, and I only had to go through the "onion skin" region around the point being considered. You would check neighboring points until the distance to the closest point is less than the width of the range you're checking.
While that worked well for me, a similar version can be achieved by sorting all of your points along one axis (which represents the "origin" being infinitely far away) and then checking points until your "search width" exceeds the distance to the closest point found so far.
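Here is a rough sketch of that single-axis variant, assuming a bare Point3d with public x/y/z members (my reading of the answer, not code from it):

#include <algorithm>
#include <vector>
using std::vector;

struct Point3d { double x, y, z; };

static double sqDist(const Point3d& a, const Point3d& b) {
    double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx*dx + dy*dy + dz*dz;
}

//for every point, collect all neighbors within `radius` by scanning the
//x-sorted order; once the x-gap alone exceeds the radius, no later point
//can be in range, so the inner scan stops early
vector<vector<int>> neighborsByAxisSweep(const vector<Point3d>& pts, double radius) {
    vector<int> order(pts.size());
    for (size_t i = 0; i < pts.size(); i++) order[i] = (int)i;
    std::sort(order.begin(), order.end(),
              [&](int a, int b) { return pts[a].x < pts[b].x; });

    double sqR = radius * radius;
    vector<vector<int>> nbrs(pts.size());
    for (size_t i = 0; i < order.size(); i++) {
        for (size_t j = i + 1; j < order.size(); j++) {
            if (pts[order[j]].x - pts[order[i]].x > radius) break;  //outside the "skin"
            if (sqDist(pts[order[i]], pts[order[j]]) <= sqR) {
                nbrs[order[i]].push_back(order[j]);
                nbrs[order[j]].push_back(order[i]);
            }
        }
    }
    return nbrs;
}

The break on the x-gap is what implements the "search width" cutoff the answer describes.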
Points should be better organized. To search more efficiently, instead of a vector<Point3d> you need some sort of hash map where a hash collision implies that two points are close to each other (so you use hash collisions to your advantage). You can, for instance, divide the space into cubes of size equal to SEGMENT_MAX_DISTANCE, and use a hash function that returns a triplet of ints instead of just an int, where each part of the triplet is calculated as point.<corresponding_dimension> / SEGMENT_MAX_DISTANCE.
Now for each point in this new set you search only for points in the same cube and in the adjacent cubes of space. This greatly reduces the search space.
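A minimal sketch of such a grid, assuming 64-bit packed keys and std::unordered_map (both choices are mine; any map from integer triplets to point lists works):

#include <cmath>
#include <cstdint>
#include <unordered_map>
#include <vector>
using std::vector;

struct Point3d { double x, y, z; };

//pack the three cube coordinates into one 64-bit key (21 bits each; distinct
//cells can collide only if coordinates differ by a multiple of 2^21 cells)
static int64_t packKey(int64_t cx, int64_t cy, int64_t cz) {
    return ((cx & 0x1FFFFF) << 42) | ((cy & 0x1FFFFF) << 21) | (cz & 0x1FFFFF);
}

struct SpatialHash {
    double cellSize;  //set this to SEGMENT_MAX_DISTANCE
    std::unordered_map<int64_t, vector<int>> cells;

    //building the grid is a single O(N) pass
    SpatialHash(const vector<Point3d>& pts, double cellSize) : cellSize(cellSize) {
        for (size_t i = 0; i < pts.size(); i++) {
            int64_t cx = (int64_t)std::floor(pts[i].x / cellSize);
            int64_t cy = (int64_t)std::floor(pts[i].y / cellSize);
            int64_t cz = (int64_t)std::floor(pts[i].z / cellSize);
            cells[packKey(cx, cy, cz)].push_back((int)i);
        }
    }

    //a query touches only the point's own cube and the 26 adjacent ones
    vector<int> queryRadius(const vector<Point3d>& pts, int idx) const {
        const Point3d& p = pts[idx];
        int64_t cx = (int64_t)std::floor(p.x / cellSize);
        int64_t cy = (int64_t)std::floor(p.y / cellSize);
        int64_t cz = (int64_t)std::floor(p.z / cellSize);
        double sqR = cellSize * cellSize;
        vector<int> out;
        for (int dx = -1; dx <= 1; dx++)
        for (int dy = -1; dy <= 1; dy++)
        for (int dz = -1; dz <= 1; dz++) {
            auto it = cells.find(packKey(cx + dx, cy + dy, cz + dz));
            if (it == cells.end()) continue;
            for (int j : it->second) {
                double ddx = pts[j].x - p.x, ddy = pts[j].y - p.y, ddz = pts[j].z - p.z;
                if (j != idx && ddx*ddx + ddy*ddy + ddz*ddz <= sqR)
                    out.push_back(j);
            }
        }
        return out;
    }
};

Building the grid is O(N) and each query looks at a constant 27 cells, so the whole flood fill becomes roughly linear for evenly distributed points.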