在numpy中的最快方法以获得阵列中n对的产品的距离

发布于 2025-01-28 11:21:52 字数 426 浏览 2 评论 0原文

例如,我有n点数:

A = [2, 3]
B = [3, 4]
C = [3, 3]
.
.
.

它们在类似的数组中:

arr = np.array([[2, 3], [3, 4], [3, 3]])

我需要在bfs中的所有成对距离(广度bfs(广度首次搜索)跟踪哪个距离是:a-> b,a-> c,b-> c。对于上述示例数据,结果将为[1.41、1.0、1.0]

编辑:我必须用numpy或核心库来完成它。

I have N number of points, for example:

A = [2, 3]
B = [3, 4]
C = [3, 3]
.
.
.

And they're in an array like so:

arr = np.array([[2, 3], [3, 4], [3, 3]])

I need as output all pairwise distances in BFS (Breadth First Search) order to track which distance is which, like: A->B, A->C, B->C. For the above example data, the result would be [1.41, 1.0, 1.0].

EDIT: I have to accomplish it with numpy or core libraries.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

我一直都在从未离去 2025-02-04 11:21:53

作为另一种方法,但类似于 ddejohn 答案,我们可以使用np。 triu_indices仅返回矩阵中的上三角形索引,这可能更具内存效率:

np.linalg.norm(arr - arr[:, None], axis=-1)[np.triu_indices(arr.shape[0], 1)]

这不需要其他模块,例如扁平和索引。它的性能类似于上述大数据的答案(例如,您可以通过arr = np.random.rand(10000,2)在COLAB上进行检查,这两个两者都将在4.6 s接近;它可能会在较大的数据中击败np.triuFlatten)。

我已经通过以下内存使用者测试了一次内存使用量,但是如果在内存使用方面很重要,则必须再次检查它(我不确定):

​试图将计算仅限于上三角形,该计算在测试阵列上加快了2至3次的代码。 数组大小的增长,该循环与先前方法之间的性能差

ind = np.arange(arr.shape[0] - 1)
sub_ind = ind + 1
result = np.zeros(sub_ind.sum())
j = 0
for i in range(ind.shape[0]):
    result[j:j+ind[-1-i]+1] = np.linalg.norm(arr[ind[i]] - arr[sub_ind[i]:], axis=-1)
    j += ind[-1-i]+1

随着 方式,减少内存消耗至少〜x4。 方法使得可以在更大的阵列上工作。

因此,这种 nofollow noreferrer“>基准:

# arr = np.random.rand(100, 2)
100 loops, best of 5: 459  µs per loop     (ddejohns --> np.triu & np.flatten) 
100 loops, best of 5: 528  µs per loop     (mine --> np.triu_indices) 
100 loops, best of 5: 1.42 ms per loop     (This method)
--------------------------------------
# arr = np.random.rand(1000, 2)
10  loops, best of 5: 49.9 ms per loop
10  loops, best of 5: 49.7 ms per loop
10  loops, best of 5: 30.4 ms per loop      (~x1.7)  The fastest
--------------------------------------
# arr = np.random.rand(10000, 2)
2   loops, best of 5: 4.56 s  per loop
2   loops, best of 5: 4.6  s  per loop
2   loops, best of 5: 1.85 s  per loop      (~x2.5)  The fastest

As an alternative method, but similar to ddejohn answer, we can use np.triu_indices which return just the upper triangular indices in the matrix, which may be more memory-efficient:

np.linalg.norm(arr - arr[:, None], axis=-1)[np.triu_indices(arr.shape[0], 1)]

This doesn't need additional modules like flattening and indexing. Its performance is similar to the aforementioned answer for large data (e.g. you can check it by arr = np.random.rand(10000, 2) on colab, which will be done near 4.6 s for both; It may beats the np.triu and flatten in larger data).

I have tested the memory usage one time by memory-profiler as follows, but it must be checked again if it be important in terms of memory usage (I'm not sure):

enter image description here

Update:

I have tried to limit the calculations just to the upper triangle, that speed the code up 2 to 3 times on the tested arrays. As array size grows, the performance difference between this loop and the previous methods by np.triu_indices or np.triu grows and be more obvious:

ind = np.arange(arr.shape[0] - 1)
sub_ind = ind + 1
result = np.zeros(sub_ind.sum())
j = 0
for i in range(ind.shape[0]):
    result[j:j+ind[-1-i]+1] = np.linalg.norm(arr[ind[i]] - arr[sub_ind[i]:], axis=-1)
    j += ind[-1-i]+1

Also, through this way, the memory consumption is reduced at least ~x4. So, this method made it possible to work on larger arrays and more quickly.

Benchmarks:

# arr = np.random.rand(100, 2)
100 loops, best of 5: 459  µs per loop     (ddejohns --> np.triu & np.flatten) 
100 loops, best of 5: 528  µs per loop     (mine --> np.triu_indices) 
100 loops, best of 5: 1.42 ms per loop     (This method)
--------------------------------------
# arr = np.random.rand(1000, 2)
10  loops, best of 5: 49.9 ms per loop
10  loops, best of 5: 49.7 ms per loop
10  loops, best of 5: 30.4 ms per loop      (~x1.7)  The fastest
--------------------------------------
# arr = np.random.rand(10000, 2)
2   loops, best of 5: 4.56 s  per loop
2   loops, best of 5: 4.6  s  per loop
2   loops, best of 5: 1.85 s  per loop      (~x2.5)  The fastest
战皆罪 2025-02-04 11:21:52

如果您可以使用它,Scipy具有一个功能:

In [2]: from scipy.spatial.distance import pdist

In [3]: pdist(arr)
Out[3]: array([1.41421356, 1.        , 1.        ])

If you can use it, SciPy has a function for this:

In [2]: from scipy.spatial.distance import pdist

In [3]: pdist(arr)
Out[3]: array([1.41421356, 1.        , 1.        ])
小苏打饼 2025-02-04 11:21:52

这是一个仅使用数字的解决方案(公平警告:它需要大量内存,不同于pdist)...

dists = np.triu(np.linalg.norm(arr - arr[:, None], axis=-1)).flatten()
dists = dists[dists != 0]

演示:

In [4]: arr = np.array([[2, 3], [3, 4], [3, 3], [5, 2], [4, 5]])

In [5]: pdist(arr)
Out[5]:
array([1.41421356, 1.        , 3.16227766, 2.82842712, 1.        ,
       2.82842712, 1.41421356, 2.23606798, 2.23606798, 3.16227766])

In [6]: dists = np.triu(np.linalg.norm(arr - arr[:, None], axis=-1)).flatten()

In [7]: dists = dists[dists != 0]

In [8]: dists
Out[8]:
array([1.41421356, 1.        , 3.16227766, 2.82842712, 1.        ,
       2.82842712, 1.41421356, 2.23606798, 2.23606798, 3.16227766])

timings(上面的解决方案包装在称为triu ):

In [9]: %timeit pdist(arr)
7.27 µs ± 738 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [10]: %timeit triu(arr)
25.5 µs ± 4.58 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Here's a numpy-only solution (fair warning: it requires a lot of memory, unlike pdist)...

dists = np.triu(np.linalg.norm(arr - arr[:, None], axis=-1)).flatten()
dists = dists[dists != 0]

Demo:

In [4]: arr = np.array([[2, 3], [3, 4], [3, 3], [5, 2], [4, 5]])

In [5]: pdist(arr)
Out[5]:
array([1.41421356, 1.        , 3.16227766, 2.82842712, 1.        ,
       2.82842712, 1.41421356, 2.23606798, 2.23606798, 3.16227766])

In [6]: dists = np.triu(np.linalg.norm(arr - arr[:, None], axis=-1)).flatten()

In [7]: dists = dists[dists != 0]

In [8]: dists
Out[8]:
array([1.41421356, 1.        , 3.16227766, 2.82842712, 1.        ,
       2.82842712, 1.41421356, 2.23606798, 2.23606798, 3.16227766])

Timings (with the solution above wrapped in a function called triu):

In [9]: %timeit pdist(arr)
7.27 µs ± 738 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [10]: %timeit triu(arr)
25.5 µs ± 4.58 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文