Approximating a distance matrix
I have a set of N objects, and I'd like to compute an NxN distance matrix. Sometimes my set of N objects is very large, and I'd like to compute an approximation to the NxN distance matrix by computing only a subset of the pairwise distances.
Can anyone point me in the direction of something that calculates approximations to a full distance matrix? I have some ideas in mind, but I'd like to avoid re-inventing the wheel.
Edit: An example of the type of algorithm would take advantage of the fact that if there is a very small distance between object A and object B, and there is a very small distance between object B and object C, there has to be a somewhat short distance between objects A and C.
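To make the edit above concrete: if the distance is a true metric, the triangle inequality bounds d(A, C) from both sides once d(A, B) and d(B, C) are known. A minimal sketch, assuming a metric distance; the function name `triangle_bounds` is made up for illustration:

```python
def triangle_bounds(d_ab, d_bc):
    """Bounds on d(A, C) implied by the triangle inequality.

    Assumes the distance is a metric, so that
    |d(A,B) - d(B,C)| <= d(A,C) <= d(A,B) + d(B,C).
    """
    lower = abs(d_ab - d_bc)
    upper = d_ab + d_bc
    return lower, upper


# Two small known distances force a small unknown one.
print(triangle_bounds(1.0, 1.5))  # (0.5, 2.5)
```

If the gap between the two bounds is already small enough for your purposes, you can skip computing d(A, C) altogether.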
Comments (4)
I had this same question and ended up writing Python code for it:
https://github.com/jpeterbaker/lazyDistance
README.md explains how the triangle inequality can be used to update upper and lower bounds for each distance.
Just run the Python file as a script for an example in 2-dimensional space. The plotted lines are the only distances that were actually calculated.
In my version, the time savings don't come from having a large number of objects. As I've written it, it's an O(n^4) algorithm, so it's actually worse than just calculating all distances if the number of objects is large. But my method will save time when you have a modest number of objects and the distance function is very expensive to calculate. It assumes that doing several O(n^2) operations is faster than a single distance measurement.
If n is large, you could look for cheaper methods to decide which distance to calculate next (ones that don't involve arithmetic with the n^2 entries of the distance bounds matrices). You also may not need to update all 2*n^2 bounds as often as this code does.
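For readers who want to see the bound-updating idea without opening the repository, here is a rough sketch. It is not the code from lazyDistance; the function names (`init_bounds`, `record_distance`, `tighten`) and the update order are assumptions made for illustration. It keeps a lower and an upper bound matrix and tightens both with the triangle inequality until nothing changes:

```python
import numpy as np


def init_bounds(n):
    """Start with no knowledge: 0 <= d(i, j) <= inf, and d(i, i) = 0."""
    lower = np.zeros((n, n))
    upper = np.full((n, n), np.inf)
    np.fill_diagonal(upper, 0.0)
    return lower, upper


def record_distance(lower, upper, i, j, d):
    """Store an actually measured distance as both bounds."""
    lower[i, j] = lower[j, i] = d
    upper[i, j] = upper[j, i] = d


def tighten(lower, upper):
    """Apply the triangle inequality through every pivot k until no bound moves."""
    n = lower.shape[0]
    changed = True
    while changed:
        changed = False
        for k in range(n):
            # d(i, j) <= U(i, k) + U(k, j)
            new_upper = np.minimum(upper, upper[:, k:k + 1] + upper[k:k + 1, :])
            # d(i, j) >= L(i, k) - U(k, j)
            new_lower = np.maximum(lower, lower[:, k:k + 1] - new_upper[k:k + 1, :])
            # d(i, j) >= L(k, j) - U(i, k)
            new_lower = np.maximum(new_lower, lower[k:k + 1, :] - new_upper[:, k:k + 1])
            if not (np.array_equal(new_upper, upper)
                    and np.array_equal(new_lower, lower)):
                changed = True
            upper[...] = new_upper
            lower[...] = new_lower


# Three objects, two measured distances; the third pair is only bounded.
lower, upper = init_bounds(3)
record_distance(lower, upper, 0, 1, 1.0)
record_distance(lower, upper, 1, 2, 0.5)
tighten(lower, upper)
print(lower[0, 2], upper[0, 2])
```

Each pass over k costs O(n^2) array work, which is the trade-off described above: it only pays off when one real distance evaluation is more expensive than this bookkeeping.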
Honestly, I think it depends on how close you want your approximation to be and how big your subset is. If you just want an overall feel for what the matrix will look like, you can do simple linear interpolation on a random subset (including the maximal and minimal nodes) and get pretty accurate (tm) results.
I think the real trick here is figuring out the heuristic (linear, quadratic, etc. interpolation) and the subset size. You could also compute the distance matrices of various subsets and then interpolate between those matrices with some method (linear, spherical linear, cubic).
Depending on your initial sample, it's pretty much heuristic trial and error until you go "oh, that's good enough for what I need".
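One way to try the "random subset" idea is sketched below. Note that this swaps in a landmark (pivot) scheme rather than interpolation as such: it measures the distance from every object to m randomly chosen landmarks, then uses the triangle inequality to bound every other pair and takes the midpoint as the estimate. The function name `landmark_estimate`, the midpoint rule, and the Euclidean test data are all assumptions for illustration, and the distance is assumed to be a metric.

```python
import numpy as np


def landmark_estimate(points, dist, m, seed=0):
    """Estimate all pairwise distances from n*m real distance evaluations."""
    rng = np.random.default_rng(seed)
    n = len(points)
    landmarks = rng.choice(n, size=m, replace=False)

    # D[i, l] = true distance from object i to landmark l.
    D = np.array([[dist(points[i], points[l]) for l in landmarks]
                  for i in range(n)])

    # Broadcast to shape (n, n, m), then reduce over the landmark axis.
    lower = np.abs(D[:, None, :] - D[None, :, :]).max(axis=2)
    upper = (D[:, None, :] + D[None, :, :]).min(axis=2)
    estimate = (lower + upper) / 2.0
    return lower, upper, estimate


# Example with 2-D points and Euclidean distance.
rng = np.random.default_rng(1)
pts = rng.random((50, 2))
euclid = lambda a, b: float(np.linalg.norm(a - b))
lower, upper, est = landmark_estimate(pts, euclid, m=8)
```

The subset size m is the knob to experiment with: more landmarks mean more real distance evaluations but tighter bounds. The (n, n, m) intermediate array also grows quickly, so a very large n would need this done in chunks.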
Are your "objects" on a network? If the objects are in a network, you can use this or this, which yield the all-pairs shortest paths. If not, you're pretty much stuck with calculating all n x n distances, I think.
The solution you need is similar to what we commonly see in graph problems: you can use an all-pairs shortest path algorithm to find the distances. You can also look at Johnson's algorithm.
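As a rough illustration of the graph view: treat each measured distance as a weighted edge, and the shortest path between two objects is then an upper bound on their true distance (assuming a metric). The sketch below uses Floyd-Warshall instead of Johnson's algorithm purely because it is shorter to write; the function name and the edge-dictionary input format are made up for the example.

```python
import math


def all_pairs_shortest_paths(n, edges):
    """Floyd-Warshall on an undirected graph.

    edges maps (i, j) -> measured distance between objects i and j.
    Returns an n x n list of lists; unreachable pairs stay at math.inf.
    """
    dist = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0.0
    for (i, j), w in edges.items():
        dist[i][j] = min(dist[i][j], w)
        dist[j][i] = min(dist[j][i], w)

    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = dist[i][k] + dist[k][j]
                if via_k < dist[i][j]:
                    dist[i][j] = via_k
    return dist


# Only three of the six pairs are measured; the rest are filled in by paths.
measured = {(0, 1): 1.0, (1, 2): 0.5, (2, 3): 2.0}
approx = all_pairs_shortest_paths(4, measured)
print(approx[0][3])  # 3.5
```

This is O(n^3) in the number of objects, so like the other approaches it helps mainly when evaluating a real distance is the expensive part.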