Minimum Euclidean distance between points in two different NumPy arrays, not within
I have two arrays of x-y coordinates, and I would like to find the minimum Euclidean distance between each point in one array and all the points in the other array. The arrays are not necessarily the same size. For example:

    import numpy

    xy1 = numpy.array(
        [[ 243, 3173],
         [ 525, 2997]])
    xy2 = numpy.array(
        [[ 682, 2644],
         [ 277, 2651],
         [ 396, 2640]])

My current method loops through each coordinate xy in xy1 and calculates the distances between that coordinate and every coordinate in xy2.

    mindist = numpy.zeros(len(xy1))
    minid = numpy.zeros(len(xy1), dtype=int)  # integer dtype, since these are indices
    for i, xy in enumerate(xy1):
        # distances from this xy1 point to every point in xy2
        dists = numpy.sqrt(numpy.sum((xy - xy2)**2, axis=1))
        mindist[i], minid[i] = dists.min(), dists.argmin()

Is there a way to eliminate the for loop and somehow do element-by-element calculations between the two arrays? I envision generating a distance matrix for which I could find the minimum element in each row or column.

Another way to look at the problem: say I concatenate xy1 (length m) and xy2 (length p) into xy (length n), and I store the lengths of the original arrays. Theoretically, I should then be able to generate an n x n distance matrix from those coordinates, from which I can grab an m x p submatrix. Is there a way to efficiently generate this submatrix?
Comments (9)
(Months later)
scipy.spatial.distance.cdist(X, Y) gives all pairs of distances, for X and Y in 2 dimensions, 3 dimensions, and so on.
It also supports 22 different norms, detailed in the SciPy documentation for cdist.
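A minimal sketch of how cdist could be applied to the arrays from the question. The names dists, mindist and minid follow the question's loop; reducing the matrix with min and argmin along axis 1 is an assumption about how the result would be used, not part of the original comment.

```python
import numpy as np
from scipy.spatial.distance import cdist

xy1 = np.array([[243, 3173],
                [525, 2997]])
xy2 = np.array([[682, 2644],
                [277, 2651],
                [396, 2640]])

# Full m x p matrix: dists[i, j] is the Euclidean distance from xy1[i] to xy2[j].
dists = cdist(xy1, xy2)

# Reduce along each row to find the nearest xy2 point for every xy1 point.
mindist = dists.min(axis=1)
minid = dists.argmin(axis=1)
```

The whole m x p matrix is held in memory, so this suits small to medium inputs.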
To compute the m by p matrix of distances, this should work:
The .outer calls make two such matrices (of scalar differences along the two axes), and the .hypot call turns those into a same-shape matrix (of scalar Euclidean distances).
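The code for this comment is not present in this copy. A sketch consistent with the description (two .outer calls followed by one .hypot call) might look like the following; the function name distances and the variable names dx and dy are assumptions made for illustration.

```python
import numpy as np

def distances(xy1, xy2):
    """Return the (m, p) matrix of Euclidean distances between 2-D points."""
    # Scalar differences along each axis, one (m, p) matrix per coordinate.
    dx = np.subtract.outer(xy1[:, 0], xy2[:, 0])
    dy = np.subtract.outer(xy1[:, 1], xy2[:, 1])
    # Combine the two difference matrices into Euclidean distances.
    return np.hypot(dx, dy)

xy1 = np.array([[243, 3173], [525, 2997]])
xy2 = np.array([[682, 2644], [277, 2651], [396, 2640]])

d = distances(xy1, xy2)
mindist = d.min(axis=1)
minid = d.argmin(axis=1)
```

This version is limited to two coordinates per point, because numpy.hypot takes exactly two inputs.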
The accepted answer does not fully address the question, which asks for the minimum distance between the two sets of points, not the distance between every pair of points in the two sets.
Although a straightforward solution to the original question indeed consists of computing the distance between every pair and then finding the minimum, this is not necessary if one is only interested in the minimum distances. A much faster solution exists for the latter problem.
All the proposed solutions have a running time that scales as m*p = len(xy1)*len(xy2). This is OK for small datasets, but an optimal solution can be written that scales as m*log(p), producing huge savings for large xy2 datasets.
This optimal execution time scaling can be achieved using scipy.spatial.KDTree as follows, where mindist is the minimum distance between each point in xy1 and the set of points in xy2.
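The code that accompanied this comment is not present in this copy. A minimal sketch of the KDTree approach, assuming the arrays from the question and the names mindist and minid used there, could be:

```python
import numpy as np
from scipy.spatial import KDTree

xy1 = np.array([[243, 3173], [525, 2997]])
xy2 = np.array([[682, 2644], [277, 2651], [396, 2640]])

# Build the tree once over the set that is searched (xy2).
tree = KDTree(xy2)

# For every point in xy1, query the single nearest neighbour in xy2.
# query returns the distances and the indices into xy2.
mindist, minid = tree.query(xy1)
```

Building the tree has its own cost, so the benefit shows up mainly when xy2 is large or when many queries are run against the same tree.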
For what you're trying to do:
Edit: Instead of calling sqrt, doing squares, etc., you can use numpy.hypot:
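Both code snippets from this comment are missing in this copy. The sketch below shows one way the two variants could be written, assuming broadcasting with numpy.newaxis is what was intended; the names dx, dy, dists_sqrt and dists are illustrative.

```python
import numpy as np

xy1 = np.array([[243, 3173], [525, 2997]])
xy2 = np.array([[682, 2644], [277, 2651], [396, 2640]])

# A column vector (m, 1) minus a row vector (p,) broadcasts to an (m, p) matrix.
dx = xy1[:, 0, np.newaxis] - xy2[:, 0]
dy = xy1[:, 1, np.newaxis] - xy2[:, 1]

# Variant 1: explicit squares and square root.
dists_sqrt = np.sqrt(dx**2 + dy**2)

# Variant 2: numpy.hypot computes the same quantity in one call.
dists = np.hypot(dx, dy)

mindist = dists.min(axis=1)
minid = dists.argmin(axis=1)
```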
I think the following function also works.
Explanation
Suppose each row of X and Y holds the coordinates of one point from the two sets. Let their sizes be m x p and n x p respectively. The result will be a numpy array of size m x n whose (i, j)-th entry is the distance between the i-th row of X and the j-th row of Y.
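The function itself is missing from this copy, so its implementation is unknown. One function that matches the explanation above (rows are points, output is m x n) can be sketched with the expansion |x - y|^2 = |x|^2 + |y|^2 - 2 x.y; the name pairwise_distances and the choice of this expansion are assumptions, not the commenter's code.

```python
import numpy as np

def pairwise_distances(X, Y):
    """Return the (m, n) matrix of distances between rows of X and rows of Y."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    sq_x = np.sum(X**2, axis=1)[:, np.newaxis]   # shape (m, 1)
    sq_y = np.sum(Y**2, axis=1)[np.newaxis, :]   # shape (1, n)
    # Squared distances via the dot-product expansion.
    sq_dists = sq_x + sq_y - 2.0 * (X @ Y.T)
    # Rounding can leave tiny negative values; clip them before the root.
    return np.sqrt(np.maximum(sq_dists, 0.0))

xy1 = np.array([[243, 3173], [525, 2997]])
xy2 = np.array([[682, 2644], [277, 2651], [396, 2640]])
D = pairwise_distances(xy1, xy2)
```

This form works for any number of coordinates per point, but it can lose precision when points lie far from the origin and close to each other.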
I highly recommend using a built-in method for calculating squares and roots, since these are optimized and much safer against overflow.
@alex's answer is the safest in terms of overflow and should also be very fast. Also, for single points you can use math.hypot, which supports more than two dimensions since Python 3.8.
Safety concerns: overflow, underflow, and speed.
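To illustrate the overflow concern, here is a small sketch comparing a hand-written square root of squares with numpy.hypot. The input value 1e200 is chosen only to trigger the overflow, and the variable names are illustrative. The three-argument math.hypot call needs Python 3.8 or later.

```python
import math
import numpy as np

x = np.array([1e200])
y = np.array([1e200])

# Squaring 1e200 exceeds the float64 range, so the naive formula yields inf.
with np.errstate(over="ignore"):
    naive = np.sqrt(x**2 + y**2)

# numpy.hypot avoids the intermediate overflow and stays finite.
safe = np.hypot(x, y)

# math.hypot handles a single point with more than two coordinates.
d3 = math.hypot(1.0, 2.0, 2.0)
```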
I think that the most straightforward and efficient solution is to do it like this:
Although many answers here are great, there is another way that has not been mentioned: using numpy's vectorization / broadcasting properties to compute the distance between each pair of points in two arrays of different lengths (and, if wanted, the closest matches). I am posting it here because mastering broadcasting is very handy, and it also solves this problem elegantly while remaining very efficient.
Assuming you have two arrays like so:
You can't do the operation a-b: numpy complains with "operands could not be broadcast together with shapes (6,2) (4,2)". The trick to allow broadcasting is to manually add a dimension for numpy to broadcast along. By leaving the dimension 2 in both reshaped arrays, numpy knows that it must perform the operation over this dimension.
The distance_matrix has shape (6,4): for each point in a, the distances to all points in b are computed. Then, if you want the "minimum Euclidean distance between each point in one array with all the points in the other array", you would do:
This returns the index of the point in b that is closest to each point of a.
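The code snippets from this comment are missing in this copy. The steps it describes can be sketched as follows, assuming random sample data with shapes (6,2) and (4,2) and using reshape to add the broadcast dimensions; the names diff and closest, and the sample data, are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((6, 2))   # six 2-D points (sample data)
b = rng.random((4, 2))   # four 2-D points (sample data)

# a - b would raise ValueError: shapes (6,2) and (4,2) do not broadcast.
# Adding a length-1 axis to each array lets numpy pair every row of a
# with every row of b, giving a (6, 4, 2) array of coordinate differences.
diff = a.reshape(6, 1, 2) - b.reshape(1, 4, 2)

# Collapse the coordinate axis (size 2) to obtain the (6, 4) distance matrix.
distance_matrix = np.sqrt(np.sum(diff**2, axis=2))

# Index of the closest point in b for each point in a.
closest = np.argmin(distance_matrix, axis=1)
```

The intermediate diff array holds m*p*2 values, so memory use grows with the product of the two array lengths.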