给定两个(大)点集,我如何有效地找到彼此最接近的点对?
我需要解决一个计算问题,该问题归结为搜索两个集合之间最接近的点对。问题是这样的:
给定欧几里德空间中的一组点 A 和一组点 B,找到所有对 (a,b),使得 b 是 B 中与 a 最近的点,a 是 A 中最近的点至 b.
集合 A 和 B 的大小大致相等,我们将这个大小称为 N。对于我的特定问题,N 大约为 250,000。
蛮力解决方案是将每个点与其他每个点进行比较,其时间复杂度为二次。有没有更有效的算法来做到这一点?
I need to solve a computational problem that boils down to searching for reciprocally-nearest pairs of points between two sets. The problem goes something like this:
Given a set of points A and a set of points B in euclidean space, find all pairs (a,b) such that b is the closest point in B to a and a is the closest point in A to b.
The sets A and B are of approximately equal size, and we will call this size N. For my particular problem N is approximately 250,000.
The brute force solution is to compare every point against every other point, which has quadratic time complexity. Is there any more efficient algorithm for doing this?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
当我必须进行最近邻搜索时,我发现一个非常有用的数据结构是k-d-tree。 Wikipedia 有很好的概述,这是一个很好的深入讨论如果您正在实现自己的算法(尽管很可能已经存在一个库 - 您没有提及您正在使用哪种语言)。 kd 树最重要的一点是它允许在 O(log N) 时间内执行最近邻搜索。
这样,您可以在 O(N log N) 时间内生成两个列表:A 的成员及其在 B 中的最近邻居以及 B 的成员及其在 A 中的最近邻居。然后,您可以比较列表以查看哪些对匹配。如果天真地这样做,那就是 O(N^2),尽管您可能可以想出一种更快的方法。
[编辑]你让我思考;这是我的第二个想法:
根据我的计算,这是 O(N log N)。
A data structure I found very useful when I had to do nearest neighbour searches was a k-d-tree. Wikipedia has a nice overview and this is an excellent in-depth discussion of the algorithm if you're implementing your own (although a library may well exist already - you don't mention which language you're using). The most important thing about a kd-tree is that it allows nearest-neghbour searches to be performed in O(log N) time.
In that way, you could produce two lists of - the members of A and their nearest neighbour in B and the members of B and their nearest neighbour in A - in O(N log N) time. Then, you could compare the lists to see which pairs match. Done naively, that's O(N^2), though you might be able to think of a way to do it faster.
[edit] You've got me thinking; here's my second thought:
By my reckoning, that's O(N log N).
抱歉,我选择了一个相当旧的线程,但我只是想添加一个我在算法设计课教科书中找到的解决方案:
有一种分而治之(想想合并排序)的方法来解决这个问题,应该是
O(n logn)
,我只看到它用于查找一组点内的最短距离,但它应该很容易适应要求每个配对由来自不同集合的点组成。d
)p
) 并检查之间所有点的距离p_x
和p_x + d
,如果其中任何一个距离短于d
(即要返回的d
),否则返回d
。Sorry for picking up a rather old thread but I just wanted to add a solution I've found in my textbook for an Algorithm Design class:
There is a divide-and-conquer (think merge-sort) approach to solve this problem that should be
O(n logn)
, I've only seen it for finding the shortest distance within one set of points but it should be easily adapted to require each pairing to consist of points from different sets.d
)p
) in the left half and check the distance for all points betweenp_x
andp_x + d
, if any of these distances are shorter thand
that is thed
to return, otherwise returnd
.老线程,但我看到有一个最近的评论。
我相信对于 n 维点集,可以通过查找到集合差值原点的近点来找到两个集合之间的近点。您可以查找贝尔实验室的 Phillip Wolfe 撰写的论文,他在其中阐述了该算法。你可以这样想:在集合 A 中取一个随机点,在集合 B 中找到最近的点,然后在集合 B 中找到距离该点最近的点,依此类推。 http://link.springer.com/article/10.1007%2FBF01580381
Old thread, but I see there is a pretty recent comment.
I believe for an n dimensional set of points the near point between two sets can be found by finding the near point to the origin of the set difference. You can seek out the paper by Phillip Wolfe of Bell Labs where he lays out the algorithm. You can think of it by taking a random point in set A, finding the closest point in set B, then finding the closest point to the point in set B and so on. http://link.springer.com/article/10.1007%2FBF01580381