NumPy: how to quickly normalize many vectors?

Posted 2024-09-02 07:29:42

How can a list of vectors be elegantly normalized in NumPy?

Here is an example that does not work:

from numpy import *

vectors = array([arange(10), arange(10)])  # All x's, then all y's
norms = apply_along_axis(linalg.norm, 0, vectors)

# Now, what I was expecting would work:
print(vectors.T / norms)  # vectors.T has 10 elements, as does norms, but this does not work

The last operation yields "shape mismatch: objects cannot be broadcast to a single shape".

How can the normalization of the 2D vectors in vectors be done elegantly with NumPy?

Edit: Why does the above not work while adding a dimension to norms does work (as per my answer below)?

Comments (6)

(り薆情海 2024-09-09 07:29:42

Computing the magnitude

I came across this question and became curious about your method for normalizing. I use a different method to compute the magnitudes. Note: I also typically compute norms across the last index (rows in this case, not columns).

magnitudes = np.sqrt((vectors ** 2).sum(-1))[..., np.newaxis]

Typically, however, I just normalize like so:

vectors /= np.sqrt((vectors ** 2).sum(-1))[..., np.newaxis]

A time comparison

I ran a test to compare the times, and found that my method is faster by quite a bit, but Freddie Witherden's suggestion is even faster.

import numpy as np    
vectors = np.random.rand(100, 25)

# OP's
%timeit np.apply_along_axis(np.linalg.norm, 1, vectors)
# Output: 100 loops, best of 3: 2.39 ms per loop

# Mine
%timeit np.sqrt((vectors ** 2).sum(-1))[..., np.newaxis]
# Output: 10000 loops, best of 3: 13.8 us per loop

# Freddie's (from comment below)
%timeit np.sqrt(np.einsum('...i,...i', vectors, vectors))
# Output: 10000 loops, best of 3: 6.45 us per loop

Beware though, as this StackOverflow answer notes, there are some safety checks not happening with einsum, so you should be sure that the dtype of vectors is sufficient to store the square of the magnitudes accurately enough.
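
To make that caveat concrete, here is a minimal sketch (the float16 values are chosen purely to force the overflow):

import numpy as np

v = np.full(3, 300.0, dtype=np.float16)  # 300**2 = 90000 exceeds float16's max (~65504)
print(np.sqrt(np.einsum('i,i', v, v)))   # inf: the squares overflow inside einsum
print(np.sqrt(np.einsum('i,i', v.astype(np.float64), v.astype(np.float64))))  # ~519.6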

燕归巢 2024-09-09 07:29:42

Well, unless I missed something, this does work:

vectors / norms

The problem with your suggestion is the broadcasting rules.

vectors  # shape 2, 10
norms  # shape 10

The shapes do not have the same length! So the rule is to first extend the smaller shape with a 1 on the left:

norms  # shape 1,10

You can do that manually by calling:

vectors / norms.reshape(1,-1)  # same as vectors/norms

If you wanted to compute vectors.T/norms, you would have to do the reshaping manually, as follows:

vectors.T / norms.reshape(-1,1)  # this works
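
For completeness, a runnable sketch of both options with the shapes annotated (starting arange at 1 here, since the first column of the OP's data has norm 0 and would divide by zero):

import numpy as np

vectors = np.array([np.arange(1, 11), np.arange(1, 11)])  # shape (2, 10)
norms = np.apply_along_axis(np.linalg.norm, 0, vectors)   # shape (10,)

print((vectors / norms).shape)                    # (2, 10): norms broadcasts as (1, 10)
print((vectors.T / norms.reshape(-1, 1)).shape)   # (10, 2): norms reshaped into a column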

書生途 2024-09-09 07:29:42

Alright: NumPy's array-shape broadcasting adds dimensions to the left of a shape, not to its right. NumPy can, however, be instructed to add a dimension to the right of the norms array:

print(vectors.T / norms[:, newaxis])

does work!
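
For readers on newer NumPy (1.8 and later), linalg.norm also accepts an axis argument, which avoids apply_along_axis entirely; a minimal sketch:

import numpy as np

vectors = np.array([np.arange(1, 11), np.arange(1, 11)])  # start at 1 to avoid a zero-norm column
norms = np.linalg.norm(vectors, axis=0)                   # per-column norms, shape (10,)
unit = vectors.T / norms[:, np.newaxis]                   # every row of unit has length 1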

吐个泡泡 2024-09-09 07:29:42

There is already a function for this in scikit-learn:

import sklearn.preprocessing as preprocessing
norm = preprocessing.normalize(m, norm='l2')  # m: the array of vectors, one per row

More info at:

http://scikit-learn.org/stable/modules/preprocessing.html
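
A quick self-contained check (a minimal sketch; m here is just random data) that each row comes back with unit L2 norm:

import numpy as np
from sklearn.preprocessing import normalize

m = np.random.rand(5, 3)                  # hypothetical input: 5 row vectors in 3-D
unit_rows = normalize(m, norm='l2')       # rescales each row to unit L2 norm
print(np.linalg.norm(unit_rows, axis=1))  # all ~1.0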

提笔落墨 2024-09-09 07:29:42

My preferred way to normalize vectors is by using numpy's inner1d to calculate their magnitudes. Here's what's been suggested so far, compared to inner1d:

import numpy as np
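# NOTE: umath_tests is a private NumPy module; it has since been deprecated and removed in recent releases.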
from numpy.core.umath_tests import inner1d
COUNT = 10**6 # 1 million points

points = np.random.random_sample((COUNT,3,))
A      = np.sqrt(np.einsum('...i,...i', points, points))
B      = np.apply_along_axis(np.linalg.norm, 1, points)   
C      = np.sqrt((points ** 2).sum(-1))
D      = np.sqrt((points*points).sum(axis=1))
E      = np.sqrt(inner1d(points,points))

print([np.allclose(E, x) for x in [A, B, C, D]])  # [True, True, True, True]

Testing performance with cProfile:

import cProfile
cProfile.run("np.sqrt(np.einsum('...i,...i', points, points))**0.5") # 3 function calls in 0.013 seconds
cProfile.run('np.apply_along_axis(np.linalg.norm, 1, points)')       # 9000018 function calls in 10.977 seconds
cProfile.run('np.sqrt((points ** 2).sum(-1))')                       # 5 function calls in 0.028 seconds
cProfile.run('np.sqrt((points*points).sum(axis=1))')                 # 5 function calls in 0.027 seconds
cProfile.run('np.sqrt(inner1d(points,points))')                      # 2 function calls in 0.009 seconds

inner1d computed the magnitudes a hair faster than einsum. So using inner1d to normalize:

n = points/np.sqrt(inner1d(points,points))[:,None]
cProfile.run('points/np.sqrt(inner1d(points,points))[:,None]') # 2 function calls in 0.026 seconds

Testing against scikit-learn:

import sklearn.preprocessing as preprocessing
n_ = preprocessing.normalize(points, norm='l2')
cProfile.run("preprocessing.normalize(points, norm='l2')") # 47 function calls in 0.047 seconds
np.allclose(n,n_) # True

Conclusion: using inner1d seems to be the best option.
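
A caveat for readers on current NumPy: since numpy.core.umath_tests has been removed, the einsum form above is the supported way to get the same row-wise inner product; a minimal sketch of the equivalent normalization:

import numpy as np

points = np.random.random_sample((10, 3))

# einsum('...i,...i', a, a) computes the same row-wise inner product inner1d did.
magnitudes = np.sqrt(np.einsum('...i,...i', points, points))
unit = points / magnitudes[:, None]
print(np.allclose(np.linalg.norm(unit, axis=1), 1.0))  # True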

通知家属抬走 2024-09-09 07:29:42

For the two-dimensional case, using np.hypot(vectors[:,0],vectors[:,1]) looks to be faster than Freddie Witherden's np.sqrt(np.einsum('...i,...i', vectors, vectors)) for calculating the magnitudes. (Referencing answer by Geoff)

import numpy as np

# Generate array of 2D vectors.
vectors = np.random.random((1000,2))

# Using Freddie's
%timeit np.sqrt(np.einsum('...i,...i', vectors, vectors))
# Output: 11.1 µs ± 173 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

# Using numpy.hypot()
%timeit np.hypot(vectors[:,0], vectors[:,1])
# Output: 6.81 µs ± 112 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

To get the normalised vectors, do:

vectors /= np.hypot(vectors[:, 0], vectors[:, 1])[:, np.newaxis]  # the extra axis lets the (1000,) magnitudes broadcast over the (1000, 2) rows
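
As a quick sanity check of the in-place version (a minimal sketch):

import numpy as np

vectors = np.random.random((1000, 2))
vectors /= np.hypot(vectors[:, 0], vectors[:, 1])[:, np.newaxis]
print(np.allclose(np.hypot(vectors[:, 0], vectors[:, 1]), 1.0))  # True: rows now have unit length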