给定欧氏距离范围划分邻居点
给定两个点 P,Q 和一个 delta,我定义了等价关系 ~=,其中如果 EuclideanDistance(P,Q)
<= delta,则 P ~= Q。现在,给定一组 S 的 n 个点,在示例中 S = (A, B, C, D, E, F) 且 n = 6(事实点实际上线段的端点可以忽略不计),是否有一种算法在平均情况下比 O(n^2) 复杂度更好地找到集合的分区(子集的代表元素并不重要)?
到目前为止,试图找到这个问题的理论定义并不成功:k 均值聚类、最近邻搜索和其他对我来说似乎是不同的问题。该图显示了我在应用程序中需要执行的操作。
有什么提示吗?谢谢
编辑:而实际问题(簇附近的点给出了某种不变量)在一般情况下,应该可以比 O(n^2) 更好地解决,我的问题定义中有一个严重的缺陷: =~ 不是等价关系,因为它不是一个简单的事实尊重传递性。我认为这是这个问题不容易解决并且需要先进技术的主要原因。我很快就会发布我的实际解决方案:当近点都满足定义的 =~ 时应该有效。当极点分开的点不遵守这种关系但它们与聚集点的重心有关时,可能会失败。它适用于我的输入数据空间,但可能不适用于您的输入数据空间。有谁知道这个问题的完整正式处理(以及解决方案)?
Given two points P,Q and a delta, I defined the equivalence relation ~=, where P ~= Q if EuclideanDistance(P,Q)
<= delta. Now, given a set S of n points, in the example S = (A, B, C, D, E, F) and n = 6 (the fact points are actually endpoints of segments is negligible), is there an algorithm that has complexity better than O(n^2) in the average case to find a partition of the set (the representative element of the subsets is unimportant)?
Attempts to find theoretical definitions of this problem were unsuccessful so far: k-means clustering, nearest neighbor search and others seems to me different problems. The picture shows what I need to do in my application.
Any hint? Thanks
EDIT: while the actual problem (cluster near points given some kind of invariant) should be solvable in better better than O(n^2) in the average case, there's a serious flaw in my problem definition: =~ is not a equivalence relation because of the simple fact it doesn't respect the transitive property. I think this is the main reason this problem is not easy to solve and need advanced techiques. Will post very soon my actual solution: should work when near points all satisfy the =~ as defined. Can fail when poles apart points doesn't respect the relation but they are in relation with the center of gravity of clustered points. It works well with my input data space, may not with yours. Do anyone know a full formal tratment of this problem (with solution)?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
重述该问题的一种方法如下:给定一组
n
2D 点,对于每个点p
找到包含在直径 < 的圆中的点集code>delta 以p
为中心。简单的线性搜索给出了您提到的
O(n^2)
算法。在我看来,这是在最坏的情况下可以做的最好的事情。当集合中的所有点都包含在直径 <=
delta
的圆内时,每个n
查询都必须返回O(n)
点,总体复杂度为O(n^2)
。然而,人们应该能够在更合理的数据集上做得更好。
看看这个(特别是关于空间分区的部分)和KD 树。在合理的情况下,后者应该为您提供一个子
O(n^2)
算法。可能有一种不同的方式来看待问题,这种方式会带来更好的复杂性;我脑子里想不出任何东西。
One way to restate the problem is as follows: given a set of
n
2D points, for each pointp
find the set of points that are contained with the circle of diameterdelta
centred atp
.A naive linear search gives the
O(n^2)
algorithm you allude to.It seems to me that this is the best one can do in the worst case. When all points in the set are contained within a circle of diameter <=
delta
, each ofn
queries would have to returnO(n)
points, giving anO(n^2)
overall complexity.However, one should be able to do better on more reasonable datasets.
Take a look at this (esp. the section on space partitioning) and KD-trees. The latter should give you a sub-
O(n^2)
algorithm in reasonable cases.There might be a different way of looking at the problem, one that would give better complexity; I can't think of anything off the top of my head.
绝对是 Quadtree 的问题。
您还可以尝试对每个坐标进行排序并使用这两个列表(排序是
n*log(n)
,并且您可以仅检查满足dx <= delta & 的点。另外,您可以将它们放入具有两层指针的排序列表中:一层用于解析 OX,另一层用于解析 OY。
Definitely a problem for Quadtree.
You could also try sorting on each coordonate and playing with these two lists (sorting is
n*log(n)
, and you can check only the points that satisfiesdx <= delta && dy <= delta
. Also, you could put them in a sorted list with two levels of pointers: one for parsing on OX and another for OY.对于每个点,计算距原点的距离 D(n),这是一个 O(n) 操作。
使用 O(n^2) 算法查找 D(ab) < 的匹配项delta,跳过 D(a)-D(b) >三角洲。
由于跳过的数字(希望很大),平均而言,结果必须优于 O(n^2)。
For each point, calculate the distance D(n) from the origin, this is an O(n) operation.
Use a O(n^2) algorithm to find matches where D(a-b) < delta, skipping D(a)-D(b) > delta.
The result, on average, must be better than O(n^2) due to the (hopefully large) number skipped.
这是一个 C# KdTree 实现,应该解决“查找 delta 内点 P 的所有邻居”问题。它大量使用函数式编程技术(是的,我喜欢 Python)。它已经过测试,但我对
_TreeFindNearest()
的理解仍然存在疑问。解决问题的代码(或伪代码)“在平均情况下,在给定 ~= 关系的情况下,划分一组 n 点,其效果优于 O(n^2)”,该问题发布在另一个答案中。This is a C# KdTree implementation that should solve the "Find all neighbors of a point P within a delta". It makes heavy use of functional programming techniques (yes, I love Python). It's tested but I still have doubts doubts in understanding
_TreeFindNearest()
. The code (or pseudo code) to solve the problem "Partition a set of n points given a ~= relation in better than O(n^2) in the average case" is posted in another answer.以下 C# 方法与 KdTree 类、Join()(枚举作为参数传递的所有集合)和 Shuffled()(返回传递集合的打乱版本)方法一起解决了我的问题。当
referenceVectors
与vectorsToRelocate
相同的向量时,可能会出现一些有缺陷的情况(请阅读问题中的编辑),正如我在问题中所做的那样。The following C# method, together with KdTree class, Join() (enumerate all collections passed as argument) and Shuffled() (returns a shuffled version of the passed collection) methods solve the problem of my question. There may be some flawed cases (read EDITs in the question) when
referenceVectors
are the same vectors asvectorsToRelocate
, as I do in my problem.