3D connected point labeling based on Euclidean distances
Currently, I am working on a project that tries to group 3D points from a dataset by specifying connectivity as a minimum Euclidean distance. My algorithm right now is simply a 3D adaptation of the naive flood fill.
size_t PointSegmenter::growRegion(size_t & seed, size_t segNumber) {
    size_t numPointsLabeled = 0;
    //alias for points to avoid retyping
    vector<Point3d> & points = _img.points;
    deque<size_t> ptQueue;
    ptQueue.push_back(seed);
    points[seed].setLabel(segNumber);
    while (!ptQueue.empty()) {
        size_t currentIdx = ptQueue.front();
        ptQueue.pop_front();
        //every queued point was already labeled when it was pushed,
        //so only the counter needs updating here
        numPointsLabeled++;
        vector<int> newPoints = _img.queryRadius(currentIdx, SEGMENT_MAX_DISTANCE, MATCH_ACCURACY);
        for (int i = 0; i < (int)newPoints.size(); i++) {
            int newIdx = newPoints[i];
            Point3d &newPoint = points[newIdx];
            //label at push time so a point is never queued twice
            if (!newPoint.labeled()) {
                newPoint.setLabel(segNumber);
                ptQueue.push_back(newIdx);
            }
        }
    }
    //NOTE to whoever wrote the other code, the compiler optimizes i++
    //to ++i in cases like these, so please don't change them just for speed :)
    for (size_t i = seed; i < points.size(); i++) {
        if (!points[i].labeled()) {
            //search for an unlabeled point to serve as the next seed.
            seed = i;
            return numPointsLabeled;
        }
    }
    return numPointsLabeled;
}
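This snippet is re-run with the updated seed and a fresh segment number until every point has been labeled. The question doesn't show that outer loop; a minimal sketch of what it might look like (the segmentAll name and the assumption that label 0 means "unlabeled" are mine):

void PointSegmenter::segmentAll() {
    size_t seed = 0;
    size_t segNumber = 1;  //assumes label 0 means "unlabeled"
    size_t labeled = 0;
    const size_t total = _img.points.size();
    while (labeled < total) {
        //growRegion advances `seed` to the next unlabeled point before returning,
        //so once every point is counted the loop simply terminates
        labeled += growRegion(seed, segNumber++);
    }
}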
_img.queryRadius() is a fixed-radius search using the ANN library:
vector<int> Image::queryRadius(size_t index, double range, double epsilon) {
    //first call with k = 0 only counts the points inside the squared radius
    int k = kdTree->annkFRSearch(dataPts[index], range*range, 0, NULL, NULL, epsilon);
    ANNidxArray nnIdx = new ANNidx[k];
    //second call actually retrieves the k indices
    kdTree->annkFRSearch(dataPts[index], range*range, k, nnIdx, NULL, epsilon);
    vector<int> outPoints;
    outPoints.reserve(k);
    for (int i = 0; i < k; i++) {
        outPoints.push_back(nnIdx[i]);
    }
    delete[] nnIdx;
    return outPoints;
}
My problem with this code is that it runs way too slow for large datasets. If I'm not mistaken, this code will do a search for every single point, and the searches are O(N log N), giving this a time complexity of O(N^2 log N).
In addition to that, if I remember right, deletions from k-d trees are relatively expensive, but not deleting points creates its own problem: each point can be searched hundreds of times, once by every neighbor close to it.
So my question is: is there a better way to do this, especially one that scales linearly with the dataset?
Thanks for any help you may be able to provide.
EDIT
I have tried using a simple sorted list as dash-tom-bang suggested, but the result was even slower than what I was using before. I'm not sure if it was the implementation, or whether it is simply too slow to iterate through every point and check the Euclidean distance (even when using only the squared distance).
Are there any other ideas? I'm honestly stumped right now.
3 Answers
I propose the following algorithm:
1. Compute the 3D Delaunay triangulation of your data points.
2. Remove all the edges that are longer than your threshold distance; O(N) when combined with step 3.
3. Find connected components in the resulting graph, which is O(N) in size; this is done in O(N α(N)) (sketched in code below).
The bottleneck is step 1, which can be done in O(N^2) or even O(N log N) according to this page: http://www.ncgia.ucsb.edu/conf/SANTA_FE_CD-ROM/sf_papers/lattuada_roberto/paper.html. However, it's definitely not a 100-line algorithm.
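Steps 2 and 3 are short once step 1 hands you an edge list; here is a minimal union-find sketch under that assumption (the Edge struct is a stand-in for whatever your Delaunay library outputs):

#include <vector>
using std::vector;

//hypothetical stand-in for an edge produced by step 1
struct Edge { int a, b; double sqLength; };

//union-find with path compression; processing all edges is O(N α(N))
struct DisjointSet {
    vector<int> parent;
    DisjointSet(int n) : parent(n) {
        for (int i = 0; i < n; i++) parent[i] = i;
    }
    int find(int x) { return parent[x] == x ? x : parent[x] = find(parent[x]); }
    void unite(int x, int y) { parent[find(x)] = find(y); }
};

//steps 2 and 3: skip long edges, merge everything else, label by root
vector<int> connectedComponents(int numPoints, const vector<Edge>& delaunayEdges,
                                double maxDistance) {
    DisjointSet ds(numPoints);
    double sqMax = maxDistance * maxDistance;
    for (const Edge& e : delaunayEdges)
        if (e.sqLength <= sqMax)  //step 2: drop edges longer than the threshold
            ds.unite(e.a, e.b);   //step 3: merge the components they connect
    vector<int> label(numPoints);
    for (int i = 0; i < numPoints; i++) label[i] = ds.find(i);
    return label;
}

Each root index in the returned vector serves as a segment label, which maps directly onto segNumber in the question's code.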
When I did something along these lines, I chose an "origin" outside of the dataset somewhere and sorted all of the points by their distance to that origin. Then I had a much smaller set of points to choose from at each step, and I only had to go through the "onion skin" region around the point being considered. You would check neighboring points until the distance to the closest point is less than the width of the range you're checking.
While that worked well for me, a similar version can be achieved by sorting all of your points along one axis (which represents the "origin" being infinitely far away) and then checking points until your "search width" exceeds the distance to the closest point found so far.
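Here is a rough sketch of that single-axis variant, assuming a bare Point3d with public x/y/z members (my reading of the answer, not code from it):

#include <algorithm>
#include <vector>
using std::vector;

struct Point3d { double x, y, z; };

static double sqDist(const Point3d& a, const Point3d& b) {
    double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx*dx + dy*dy + dz*dz;
}

//for every point, collect all neighbors within `radius` by scanning the
//x-sorted order; once the x-gap alone exceeds the radius, no later point
//can be in range, so the inner scan stops early
vector<vector<int>> neighborsByAxisSweep(const vector<Point3d>& pts, double radius) {
    vector<int> order(pts.size());
    for (size_t i = 0; i < pts.size(); i++) order[i] = (int)i;
    std::sort(order.begin(), order.end(),
              [&](int a, int b) { return pts[a].x < pts[b].x; });

    double sqR = radius * radius;
    vector<vector<int>> nbrs(pts.size());
    for (size_t i = 0; i < order.size(); i++) {
        for (size_t j = i + 1; j < order.size(); j++) {
            if (pts[order[j]].x - pts[order[i]].x > radius) break;  //outside the "skin"
            if (sqDist(pts[order[i]], pts[order[j]]) <= sqR) {
                nbrs[order[i]].push_back(order[j]);
                nbrs[order[j]].push_back(order[i]);
            }
        }
    }
    return nbrs;
}

The break on the x-gap is what implements the "search width" cutoff the answer describes.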
Points should be better organized. To search more efficiently, instead of a vector<Point3d> you need some sort of hash map where a hash collision implies that two points are close to each other (so you use hash collisions to your advantage). You can, for instance, divide the space into cubes of size equal to SEGMENT_MAX_DISTANCE, and use a hash function that returns a triplet of ints instead of just an int, where each part of the triplet is calculated as point.<corresponding_dimension> / SEGMENT_MAX_DISTANCE.
Now for each point in this new set you search only for points in the same cube and in the adjacent cubes of space. This greatly reduces the search space.
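A minimal sketch of such a grid, assuming 64-bit packed keys and std::unordered_map (both choices are mine; any map from integer triplets to point lists works):

#include <cmath>
#include <cstdint>
#include <unordered_map>
#include <vector>
using std::vector;

struct Point3d { double x, y, z; };

//pack the three cube coordinates into one 64-bit key (21 bits each; distinct
//cells can collide only if coordinates differ by a multiple of 2^21 cells)
static int64_t packKey(int64_t cx, int64_t cy, int64_t cz) {
    return ((cx & 0x1FFFFF) << 42) | ((cy & 0x1FFFFF) << 21) | (cz & 0x1FFFFF);
}

struct SpatialHash {
    double cellSize;  //set this to SEGMENT_MAX_DISTANCE
    std::unordered_map<int64_t, vector<int>> cells;

    //building the grid is a single O(N) pass
    SpatialHash(const vector<Point3d>& pts, double cellSize) : cellSize(cellSize) {
        for (size_t i = 0; i < pts.size(); i++) {
            int64_t cx = (int64_t)std::floor(pts[i].x / cellSize);
            int64_t cy = (int64_t)std::floor(pts[i].y / cellSize);
            int64_t cz = (int64_t)std::floor(pts[i].z / cellSize);
            cells[packKey(cx, cy, cz)].push_back((int)i);
        }
    }

    //a query touches only the point's own cube and the 26 adjacent ones
    vector<int> queryRadius(const vector<Point3d>& pts, int idx) const {
        const Point3d& p = pts[idx];
        int64_t cx = (int64_t)std::floor(p.x / cellSize);
        int64_t cy = (int64_t)std::floor(p.y / cellSize);
        int64_t cz = (int64_t)std::floor(p.z / cellSize);
        double sqR = cellSize * cellSize;
        vector<int> out;
        for (int dx = -1; dx <= 1; dx++)
        for (int dy = -1; dy <= 1; dy++)
        for (int dz = -1; dz <= 1; dz++) {
            auto it = cells.find(packKey(cx + dx, cy + dy, cz + dz));
            if (it == cells.end()) continue;
            for (int j : it->second) {
                double ddx = pts[j].x - p.x, ddy = pts[j].y - p.y, ddz = pts[j].z - p.z;
                if (j != idx && ddx*ddx + ddy*ddy + ddz*ddz <= sqR)
                    out.push_back(j);
            }
        }
        return out;
    }
};

Building the grid is O(N) and each query looks at a constant 27 cells, so the whole flood fill becomes roughly linear for evenly distributed points.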