A fast way to compute the minimal distance between two sets of k-dimensional vectors
I have two sets of k-dimensional vectors, where k is around 500 and the number of vectors is usually smaller. I want to compute the minimal distance between the two sets, where the distance function is arbitrarily defined.
A naive approach would be this:
(loop for a in set1
      minimizing (loop for b in set2
                       minimizing (distance a b)))
However, this requires O(n²) evaluations of distance. Is there a faster way of doing this?
I don't think you can do better than O(n^2) when the distance is arbitrary (you have to examine each of the possible distances!). For a given distance function we might be able to exploit its properties, but no general algorithm can handle every distance function in better than O(n^2) time, i.e. in o(n^2) (note the little-o).
If your data is dynamic and you have to keep obtaining the closest pair of points at different times, then for an arbitrary distance function the following papers by Eppstein will probably help (they provide special update operations that make finding the closest pair of points quick):
http://www.ics.uci.edu/~eppstein/projects/pairs/Papers/Epp-SODA-98.pdf [O(n log^2 n) update time]
http://academic.research.microsoft.com/Paper/1847461.aspx
You will be able to adapt the one-set algorithms above into a two-set algorithm (for instance, by defining the distance between points of the same set to be infinity).
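The infinity trick can be sketched as follows. This is only an illustration under assumptions: `closest_pair_one_set` is a brute-force stand-in for whatever one-set algorithm you actually use, and the names `cross_set_distance`, `tagged`, and the labels 1 and 2 are made up for the example.

```python
import math
from itertools import combinations

def closest_pair_one_set(points, dist):
    """Brute-force closest pair within one collection.
    Hypothetical stand-in for any one-set closest-pair algorithm."""
    return min(combinations(points, 2), key=lambda pair: dist(pair[0], pair[1]))

def cross_set_distance(dist):
    """Wrap `dist` so that two points carrying the same set label
    are treated as infinitely far apart."""
    def wrapped(p, q):
        (label_p, vec_p), (label_q, vec_q) = p, q
        if label_p == label_q:
            return math.inf
        return dist(vec_p, vec_q)
    return wrapped

# Tag every vector with the set it came from, then run the one-set search.
set1 = [(0.0, 0.0), (0.1, 0.0)]
set2 = [(3.0, 4.0), (1.0, 0.0)]
tagged = [(1, v) for v in set1] + [(2, v) for v in set2]
a, b = closest_pair_one_set(tagged, cross_set_distance(math.dist))
```

Here the two points of `set1` are closer to each other than to anything in `set2`, but the wrapper rules that pair out, so the result is one point from each set.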
For Euclidean-type (L^p) distances, there are known O(n log n) time algorithms that work with a given set of points (i.e. you don't need any special update algorithms):
Of course, those L^p algorithms are for one set, but you might be able to adapt them for two sets.
If you tell us your distance function, it might be easier for us to help you.
Hope it helps. Good luck!
If the components of your vectors are scalars, I would guess that for your case of a moderate k = 500 the O(n²) approach is probably as fast as you can get. You can simplify the calculation by minimizing distance² instead of distance (for Euclidean distance this skips the square root without changing which pair is closest). Also, if distance(a, b) = distance(b, a), make sure you only evaluate each pair once: within a single set of n vectors that is n(n-1)/2 pairs, not n².
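A minimal sketch of the distance² idea, assuming the distance is Euclidean (the question leaves it open); the function names are made up for the example. The square root is taken once at the end rather than once per pair.

```python
import math

def squared_euclidean(u, v):
    """Sum of squared component differences; no square root."""
    return sum((x - y) ** 2 for x, y in zip(u, v))

def min_squared_distance(set1, set2):
    """Smallest squared distance over all (a, b) with a in set1, b in set2."""
    return min(squared_euclidean(a, b) for a in set1 for b in set2)

set1 = [(0.0, 0.0), (5.0, 5.0)]
set2 = [(3.0, 4.0), (9.0, 9.0)]
d2 = min_squared_distance(set1, set2)   # 5.0, from (5, 5) and (3, 4)
distance = math.sqrt(d2)                # one square root in total
```

This is still O(n²) distance evaluations; it only makes each evaluation slightly cheaper.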
If the components are m-dimensional vectors A and B instead, you could store the components of A in an R-tree or a kd-tree and then find the closest pair by iterating over all components of B and looking up each one's closest partner in A. With logarithmic-time lookups this would be about O(n log n) overall. Don't forget that big-O describes n -> infinity, so the trees might come with a pretty expensive constant term (i.e. this approach might only make sense for a large number of vectors or if A is always the same).
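The tree-based lookup might look roughly like this. It is a sketch under assumptions: a hand-rolled kd-tree, squared Euclidean distance, and made-up function names. In low dimensions the pruning test skips most of the tree; with around 500 dimensions it will rarely prune, so measure it against the plain double loop before relying on it.

```python
import math

def build_kdtree(points, depth=0):
    """Build a kd-tree as nested tuples (point, axis, left, right)."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (points[mid], axis,
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))

def nearest(node, target, best=(math.inf, None)):
    """Return (squared distance, point) for the tree point closest to target."""
    if node is None:
        return best
    point, axis, left, right = node
    d2 = sum((x - y) ** 2 for x, y in zip(point, target))
    if d2 < best[0]:
        best = (d2, point)
    diff = target[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, target, best)
    # Visit the far side only if the splitting plane is closer than the best hit.
    if diff * diff < best[0]:
        best = nearest(far, target, best)
    return best

def closest_pair_between(set_a, set_b):
    """Index set_a once, then query it with every vector of set_b."""
    tree = build_kdtree(list(set_a))
    best = (math.inf, None, None)
    for b in set_b:
        d2, a = nearest(tree, b)
        if d2 < best[0]:
            best = (d2, a, b)
    return best

set_a = [(0.0, 0.0), (5.0, 5.0), (2.0, 1.0)]
set_b = [(3.0, 4.0), (9.0, 9.0), (2.5, 1.0)]
d2, a, b = closest_pair_between(set_a, set_b)
```

If A stays fixed across many queries, the tree is built once and reused, which is the case where the construction cost pays off.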
Put each of the two sets of coordinates into its own spatial index, e.g. a kd-tree.
You then compute the intersection of these two indices.