As an outcome of some data wrangling, I get an array of floats, some of which are NaN. I know that the valid values are integer IDs of some objects, so I want them as integers. However, NaN cannot be cast to int, so I attempt to cast only the finite values to integers. Also, if for some strange reason an upstream step yields a value with a non-zero fractional part, I want to treat it as invalid and map it to NaN as well.
The baseline logic is implemented as a list comprehension, and it works as expected:
>>> import numpy as np
>>> [int(val) if float(val).is_integer() else np.nan for val in np.array([1, 2, 3, 4])]
[1, 2, 3, 4]
>>>
>>> [int(val) if float(val).is_integer() else np.nan for val in np.array([1, 2, 3, 3.55])]
[1, 2, 3, nan]
>>>
>>> [int(val) if float(val).is_integer() else np.nan for val in np.array([1, np.nan, 3, 3.55])]
[1, nan, 3, nan]
Encapsulating the value-casting logic in a lambda also works, as long as the calculation is done via a list comprehension:
>>> func = lambda val: int(val) if float(val).is_integer() else np.nan
>>>
>>> [func(i) for i in np.array([1, 2, 3, 4])]
[1, 2, 3, 4]
>>>
>>> [func(i) for i in np.array([1, 2, 3, 3.55])]
[1, 2, 3, nan]
>>>
>>> [func(i) for i in np.array([1, np.nan, 3, 3.55])]
[1, nan, 3, nan]
Nevertheless, it appears that this logic cannot be vectorized for some reason:
>>> func = lambda val: int(val) if float(val).is_integer() else np.nan
>>>
>>> np.vectorize(func)(np.array([1, 2, 3, 4]))
array([1, 2, 3, 4])
>>>
>>> np.vectorize(func)(np.array([1, 2, 3, 3.55]))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/dmg/.local/share/virtualenvs/ganot-SqvSo3bL/lib/python3.8/site-packages/numpy/lib/function_base.py", line 2328, in __call__
    return self._vectorize_call(func=func, args=vargs)
  File "/home/dmg/.local/share/virtualenvs/ganot-SqvSo3bL/lib/python3.8/site-packages/numpy/lib/function_base.py", line 2414, in _vectorize_call
    res = asanyarray(outputs, dtype=otypes[0])
ValueError: cannot convert float NaN to integer
>>>
>>> np.vectorize(func)(np.array([1, 2, 3, np.nan]))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/dmg/.local/share/virtualenvs/ganot-SqvSo3bL/lib/python3.8/site-packages/numpy/lib/function_base.py", line 2328, in __call__
    return self._vectorize_call(func=func, args=vargs)
  File "/home/dmg/.local/share/virtualenvs/ganot-SqvSo3bL/lib/python3.8/site-packages/numpy/lib/function_base.py", line 2414, in _vectorize_call
    res = asanyarray(outputs, dtype=otypes[0])
ValueError: cannot convert float NaN to integer
I have tried to find an explanation for why this might be the case, but have been unable to find anything. My first thought was that the content of the lambda is evaluated on val from left to right, so that int(val) is attempted before float(val).is_integer() is even checked. This matches the fact that
>>> np.vectorize(lambda val: int(val))(np.array([1, 2, 3, np.nan]))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/dmg/.local/share/virtualenvs/ganot-SqvSo3bL/lib/python3.8/site-packages/numpy/lib/function_base.py", line 2328, in __call__
    return self._vectorize_call(func=func, args=vargs)
  File "/home/dmg/.local/share/virtualenvs/ganot-SqvSo3bL/lib/python3.8/site-packages/numpy/lib/function_base.py", line 2411, in _vectorize_call
    outputs = ufunc(*inputs)
  File "<stdin>", line 1, in <lambda>
ValueError: cannot convert float NaN to integer
also crashes. However, this would not explain why int(3.55) would crash. Indeed, int(val) on its own behaves as expected:
>>> np.vectorize(lambda val: int(val))(np.array([1, 2, 3, 3.55]))
array([1, 2, 3, 3])
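The left-to-right hypothesis can also be probed outside NumPy. The following is only a minimal sketch: traced_int and calls are hypothetical names introduced here to record when the cast actually runs, and math.nan stands in for np.nan.

```python
import math

calls = []  # records every value that actually reaches the int() cast


def traced_int(val):
    """Hypothetical wrapper around int() that logs its argument."""
    calls.append(val)
    return int(val)


func = lambda val: traced_int(val) if float(val).is_integer() else math.nan

func(math.nan)  # would raise ValueError if the cast ran before the check
func(3.55)
func(4.0)

print(calls)  # [4.0]
```

Only the whole-number input reaches the cast, which suggests that the condition is evaluated first and that evaluation order inside the lambda is not what breaks the vectorized call.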
Moreover, defining the logic as a def function (where presumably float(val).is_integer() would be evaluated first) also leads to an exception:
>>> def func2(val):
...     if float(val).is_integer():
...         return int(val)
...     return np.nan
...
>>>
>>> np.vectorize(func2)(np.array([1, 2, 3, 3.55]))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/dmg/.local/share/virtualenvs/ganot-SqvSo3bL/lib/python3.8/site-packages/numpy/lib/function_base.py", line 2328, in __call__
    return self._vectorize_call(func=func, args=vargs)
  File "/home/dmg/.local/share/virtualenvs/ganot-SqvSo3bL/lib/python3.8/site-packages/numpy/lib/function_base.py", line 2414, in _vectorize_call
    res = asanyarray(outputs, dtype=otypes[0])
ValueError: cannot convert float NaN to integer
>>>
>>> np.vectorize(func2)(np.array([1, 2, 3, np.nan]))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/dmg/.local/share/virtualenvs/ganot-SqvSo3bL/lib/python3.8/site-packages/numpy/lib/function_base.py", line 2328, in __call__
    return self._vectorize_call(func=func, args=vargs)
  File "/home/dmg/.local/share/virtualenvs/ganot-SqvSo3bL/lib/python3.8/site-packages/numpy/lib/function_base.py", line 2414, in _vectorize_call
    res = asanyarray(outputs, dtype=otypes[0])
ValueError: cannot convert float NaN to integer
Could someone please explain to me why this int casting with fallback cannot be vectorized?
Thanks in advance.
Comments (1)
It seems that this morning, after some coffee, I was more inspired to get to an answer. I found one in this post here on SO:
Error in using np.NaN is vectorize functions
The problem lies in the inconsistent return type of the lambda function being vectorized.
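The inconsistency can be seen directly. The sketch below reuses func from the question; the reordered input array is an illustration added here, and it assumes np.vectorize is called without any extra arguments.

```python
import numpy as np

func = lambda val: int(val) if float(val).is_integer() else np.nan

# The same function returns two different Python types.
print(type(func(1.0)))   # <class 'int'>
print(type(func(3.55)))  # <class 'float'>, because np.nan is a float

# With a non-integer value first, the output type is taken to be float,
# so the same kind of input no longer raises.
reordered = np.vectorize(func)(np.array([3.55, 1, 2, 3]))
print(reordered.dtype)   # float64
```

In other words, whether the call fails appears to depend on which value happens to come first in the array, not on the values themselves.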
numpy.vectorize provides a parameter otypes for explicitly indicating the output types the vectorized function should be compatible with. If it is not given, the otypes are determined automatically by running func with the first input. In my examples above, this returned an int, so the vectorized function was set up to deal only with int returns. When, for later inputs, the func logic led to a float return type, the return-type incompatibility produced the exception. Therefore, the problem can be solved by specifying otypes explicitly.
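A minimal sketch of such a fix, assuming a float output array is acceptable. The name to_int_or_nan is hypothetical and simply wraps the same logic as func; otypes=[float] is one possible choice, not necessarily the one from the linked post.

```python
import numpy as np


def to_int_or_nan(val):
    """Return val as an int if it is a whole number, otherwise NaN."""
    return int(val) if float(val).is_integer() else np.nan


# Declaring the output type up front skips the inference from the first element.
vfunc = np.vectorize(to_int_or_nan, otypes=[float])

result = vfunc(np.array([1, np.nan, 3, 3.55]))
print(result)        # whole numbers are kept, everything else becomes NaN
print(result.dtype)  # float64
```

Note that the valid IDs come back as floats (1.0, 3.0), since NaN has no integer representation. If Python ints are needed next to the NaNs, otypes=[object] is an alternative, at the cost of an object array.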