2D 中所有 k 个最近邻,C++
我需要为数据集的每个点找到其所有最近的邻居。该数据集包含大约。 1000 万个 2D 点。数据接近网格,但不形成精确的网格...
此选项排除(在我看来)KD 树的使用,其中基本假设是没有点具有相同的 x 坐标和 y 坐标。
我需要一个快速算法 O(n) 或更好的算法(但实现起来不太困难:-)))来解决这个问题...由于 boost 没有标准化,我不想使用它...
感谢您的回答或代码示例...
I need to find for each point of the data set all its nearest neighbors. The data set contains approx. 10 million 2D points. The data are close to the grid, but do not form a precise grid...
This option excludes (in my opinion) the use of KD Trees, where the basic assumption is no points have same x coordinate and y coordinate.
I need a fast algorithm O(n) or better (but not too difficult for implementation :-)) ) to solve this problem ... Due to the fact that boost is not standardized, I do not want to use it ...
Thanks for your answers or code samples...
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
我会执行以下操作:
在点之上创建一个更大的网格。
线性地遍历这些点,对于每个点,找出它属于哪个大“单元”(并将这些点添加到与该单元关联的列表中)。
(这可以在每个点的恒定时间内完成,只需对点的坐标进行整数除法即可。)
现在再次线性地遍历这些点。要找到 10 个最近的邻居,您只需查看相邻的较大单元格中的点。
由于您的点分布相当均匀,因此您可以按照与每个(大)单元格中的点数成比例的时间执行此操作。
这是描述情况的(丑陋的)图片:
单元格必须足够大以容纳(中心)相邻单元格包含最近的 10 个点,但足够小以加快计算速度。您可以将其视为“哈希函数”,您可以在同一个存储桶中找到最近的点。
(请注意,严格来说,它不是 O(n),但通过调整较大单元格的大小,您应该足够接近。:-)
I would do the following:
Create a larger grid on top of the points.
Go through the points linearly, and for each one of them, figure out which large "cell" it belongs to (and add the points to a list associated with that cell).
(This can be done in constant time for each point, just do an integer division of the coordinates of the points.)
Now go through the points linearly again. To find the 10 nearest neighbors you only need to look at the points in the adjacent, larger, cells.
Since your points are fairly evenly scattered, you can do this in time proportional to the number of points in each (large) cell.
Here is an (ugly) pic describing the situation:
The cells must be large enough for (the center) and the adjacent cells to contain the closest 10 points, but small enough to speed up the computation. You could see it as a "hash-function" where you'll find the closest points in the same bucket.
(Note that strictly speaking it's not O(n) but by tweaking the size of the larger cells, you should get close enough. :-)
我使用了一个名为 ANN (近似最近邻)的库,取得了巨大成功。它确实使用了 Kd 树方法,尽管有不止一种算法可供尝试。我用它来定位三角表面上的点。你可能会有一些运气。它很小,只需添加其源代码即可轻松包含在我的库中。
祝这个有趣的任务好运!
I have used a library called ANN (Approximate Nearest Neighbour) with great success. It does use a Kd-tree approach, although there was more than one algorithm to try. I used it for point location on a triangulated surface. You might have some luck with it. It is minimal and was easy to include in my library just by dropping in its source.
Good luck with this interesting task!