Using Numpy Vectorize on Functions that Return Vectors
numpy.vectorize takes a function f:a->b and turns it into g:a[]->b[].
This works fine when a and b are scalars, but I can't think of a reason why it wouldn't work with b as an ndarray or list, i.e. f:a->b[] and g:a[]->b[][].
For example:
import numpy as np
def f(x):
    return x * np.array([1,1,1,1,1], dtype=np.float32)
g = np.vectorize(f, otypes=[np.ndarray])
a = np.arange(4)
print(g(a))
This yields:
array([[ 0. 0. 0. 0. 0.],
[ 1. 1. 1. 1. 1.],
[ 2. 2. 2. 2. 2.],
[ 3. 3. 3. 3. 3.]], dtype=object)
Ok, so that gives the right values, but the wrong dtype. And even worse:
g(a).shape
yields:
(4,)
So this array is pretty much useless. I know I can convert it by doing:
np.array(list(map(list, g(a))), dtype=np.float32)
to give me what I want:
array([[ 0., 0., 0., 0., 0.],
[ 1., 1., 1., 1., 1.],
[ 2., 2., 2., 2., 2.],
[ 3., 3., 3., 3., 3.]], dtype=float32)
but that is neither efficient nor Pythonic. Can anyone find a cleaner way to do this?
Comments (6)
np.vectorize is just a convenience function. It doesn't actually make code run any faster. If it isn't convenient to use np.vectorize, simply write your own function that works as you wish.
The purpose of np.vectorize is to transform functions which are not numpy-aware (e.g. take floats as input and return floats as output) into functions that can operate on (and return) numpy arrays.
Your function f is already numpy-aware: it uses a numpy array in its definition and returns a numpy array. So np.vectorize is not a good fit for your use case.
The solution therefore is just to roll your own function f that works the way you desire.
A new parameter, signature, added in NumPy 1.12.0, does exactly what you want. With g = np.vectorize(f, signature='()->(n)'), g(np.arange(4)).shape will give (4L, 5L).
Here the signature of f is specified. The (n) is the shape of the return value, and the () is the shape of the parameter, which is a scalar. The parameters can be arrays too. For more complex signatures, see the Generalized Universal Function API.
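A minimal sketch of the signature approach described in this answer. The function f is the one from the question; row_sum_times_ones and its '(m)->(n)' signature are hypothetical additions, included only to illustrate that a parameter can be an array as well.

```python
import numpy as np

def f(x):
    # Function from the question: scalar in, length-5 vector out.
    return x * np.array([1, 1, 1, 1, 1], dtype=np.float32)

# '()->(n)': one scalar input, one 1-D output of some length n.
g = np.vectorize(f, signature='()->(n)')
result = g(np.arange(4))
print(result.shape)

# Hypothetical example of an array-valued parameter: each length-m row
# is mapped to a length-5 vector.
def row_sum_times_ones(v):
    return v.sum() * np.ones(5, dtype=np.float32)

h = np.vectorize(row_sum_times_ones, signature='(m)->(n)')
h_out = h(np.arange(6).reshape(3, 2))
print(h_out.shape)
```

The result is a regular 2-D numeric array rather than an object array, so its shape has two entries instead of one.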
Calling .tolist() on the result and creating a new ndarray from it should fix the problem, and it works regardless of the shape of your input. map only works for one-dimensional inputs, so this solves the problem more completely and more cleanly (I believe). Hope this helps.
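The .tolist() idea might look like the sketch below. This is an illustration under assumptions, not the answerer's original code: otypes=[object] is used here to obtain an object-dtype result like the one shown in the question, and the 2-D input b is a hypothetical extra case.

```python
import numpy as np

def f(x):
    return x * np.array([1, 1, 1, 1, 1], dtype=np.float32)

# Object-dtype output: every element of the result is itself an ndarray.
g = np.vectorize(f, otypes=[object])

a = np.arange(4)
nested = g(a)                      # shape (4,), dtype=object
fixed = np.array(nested.tolist(), dtype=np.float32)
print(fixed.shape)

# Unlike map(list, ...), this also handles inputs with more dimensions.
b = np.arange(6).reshape(2, 3)
fixed_2d = np.array(g(b).tolist(), dtype=np.float32)
print(fixed_2d.shape)
```

Note that this still builds an intermediate Python list, so it fixes the shape and dtype but not the speed.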
You want to vectorize the function f from your question.
Assuming that you want to get a single np.float32 array as the result, you have to specify this as the otype. In your question, however, you specified otypes=[np.ndarray], which means you want every element to be an np.ndarray. Thus, you correctly get a result of dtype=object. The correct call specifies np.float32 as the output type instead.
For such a simple function, however, it is better to leverage numpy's ufuncs; np.vectorize just loops over it. So in your case, just rewrite your function in terms of ufuncs. This is faster and produces less obscure errors (note, however, that the result's dtype will depend on x: if you pass a complex or quad-precision number, the result will be one too).
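A sketch of both suggestions in this answer, under stated assumptions: combining otypes=[np.float32] with signature='()->(n)' is one way to make the vectorized call return a float32 array, and np.multiply.outer is one ufunc-based rewrite. The names ones and f_outer are introduced here for illustration only.

```python
import numpy as np

ones = np.ones(5, dtype=np.float32)

def f(x):
    return x * ones

# Assumption: request float32 output and declare the vector-valued return.
g = np.vectorize(f, signature='()->(n)', otypes=[np.float32])
a = np.arange(4)
out = g(a)
print(out.shape, out.dtype)

# Ufunc-based rewrite (hypothetical name f_outer): no Python-level loop.
def f_outer(x):
    # The result dtype follows NumPy's promotion rules for x and ones,
    # so it depends on the dtype of x.
    return np.multiply.outer(x, ones)

out_outer = f_outer(a)
print(out_outer.shape)
```

Passing a float32 array as x keeps the ufunc version in float32; other input dtypes are promoted according to NumPy's usual rules.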
I've written a function, amap, that seems to fit your needs.
You may also wrap it with a lambda or partial for convenience.
Note that the docstring of vectorize says: "The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop."
Thus we would expect amap to have performance similar to vectorize. I didn't check it; any performance tests are welcome.
If performance is really important, you should consider something else, e.g. direct array calculation with reshape and broadcast to avoid looping in pure Python (both vectorize and amap are the latter case).
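The answerer's amap implementation is not preserved here. The sketch below is a hypothetical reconstruction of what such a helper could look like, assuming it applies a function element-wise in a plain Python loop and stacks the array-valued results; it should not be read as the original code.

```python
import numpy as np
from functools import partial

def amap(func, *args):
    """Hypothetical helper: apply func element-wise and stack the results."""
    arrays = np.broadcast_arrays(*args)
    shape = arrays[0].shape
    # Pure-Python loop over the flattened, broadcast inputs.
    results = [func(*vals) for vals in zip(*(arr.ravel() for arr in arrays))]
    out = np.array(results)
    # Restore the input shape and append the shape of each individual result.
    return out.reshape(shape + out.shape[1:])

def f(x):
    return x * np.array([1, 1, 1, 1, 1], dtype=np.float32)

res = amap(f, np.arange(4))
print(res.shape)

# Wrapping with partial for convenience, as the answer suggests.
f_mapped = partial(amap, f)
res_2d = f_mapped(np.arange(6).reshape(2, 3))
print(res_2d.shape)
```

Because the loop runs in Python, this is a convenience in the same sense as vectorize, not an optimization.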
The best way to solve this would be to use a 2-D NumPy array (in this case a column array) as the input to the original function, which will then generate a 2-D output with the results I believe you were expecting.
This is a much simpler and less error-prone way to complete the operation. Rather than trying to transform the function with numpy.vectorize, this method relies on NumPy's natural ability to broadcast arrays. The trick is to make sure the two shapes are broadcast-compatible: compared from the trailing dimension, each pair of dimensions must either be equal or one of them must be 1.
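A minimal sketch of the column-array approach, assuming the function f from the question is left unchanged. Creating a as float32 is an assumption made here so that the result stays float32.

```python
import numpy as np

def f(x):
    return x * np.array([1, 1, 1, 1, 1], dtype=np.float32)

a = np.arange(4, dtype=np.float32)

# Turn the 1-D input into a column: shape (4,) becomes (4, 1).
column = a[:, np.newaxis]

# Broadcasting (4, 1) against (5,) yields a (4, 5) result.
result = f(column)
print(result.shape, result.dtype)
```

No wrapper is needed here: the multiplication inside f broadcasts the column against the length-5 vector directly.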