Python type casting cannot be vectorized

Posted on 2025-02-13 11:53:23

As an outcome of some data wrangling, I get an array of floats, some of which are NaNs. I know the valid values are integer IDs of some objects, so I want to have them as integers. However, NaNs cannot be cast to int, so I attempt to cast only the finite values to integers. Also, if for some strange reason I get a value from upstream that has a non-zero fractional part, I want to regard it as invalid and turn it into NaN as well.

The benchmark logic is implemented as a list comprehension, and it works as expected:

>>> import numpy as np
>>> [int(val) if float(val).is_integer() else np.nan for val in np.array([1, 2, 3, 4])]
[1, 2, 3, 4]
>>> 
>>> [int(val) if float(val).is_integer() else np.nan for val in np.array([1, 2, 3, 3.55])]
[1, 2, 3, nan]
>>> 
>>> [int(val) if float(val).is_integer() else np.nan for val in np.array([1, np.nan, 3, 3.55])]
[1, nan, 3, nan]
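The benchmark works because float.is_integer() returns False for NaN (and for infinities), so the conditional expression never reaches int(val) for those values. A quick check of that method on a few sample values (the sample list is illustrative):

```python
import math

# float.is_integer() is True only for finite values with no fractional part.
samples = [3.0, 3.55, math.nan, math.inf]
checks = {repr(x): float(x).is_integer() for x in samples}
print(checks)  # {'3.0': True, '3.55': False, 'nan': False, 'inf': False}
```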


Encapsulating the value-casting logic in a lambda also works, as long as the calculation is done via a list comprehension:

>>> func = lambda val: int(val) if float(val).is_integer() else np.nan
>>> 
>>> [func(i) for i in np.array([1, 2, 3, 4])]
[1, 2, 3, 4]
>>> 
>>> [func(i) for i in np.array([1, 2, 3, 3.55])]
[1, 2, 3, nan]
>>> 
>>> [func(i) for i in np.array([1, np.nan, 3, 3.55])]
[1, nan, 3, nan]

However, for some reason this logic cannot be vectorized with np.vectorize:

>>> func = lambda val: int(val) if float(val).is_integer() else np.nan
>>> 
>>> np.vectorize(func)(np.array([1, 2, 3, 4]))
array([1, 2, 3, 4])
>>> 
>>> np.vectorize(func)(np.array([1, 2, 3, 3.55]))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/dmg/.local/share/virtualenvs/ganot-SqvSo3bL/lib/python3.8/site-packages/numpy/lib/function_base.py", line 2328, in __call__
    return self._vectorize_call(func=func, args=vargs)
  File "/home/dmg/.local/share/virtualenvs/ganot-SqvSo3bL/lib/python3.8/site-packages/numpy/lib/function_base.py", line 2414, in _vectorize_call
    res = asanyarray(outputs, dtype=otypes[0])
ValueError: cannot convert float NaN to integer
>>> 
>>> np.vectorize(func)(np.array([1, 2, 3, np.nan]))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/dmg/.local/share/virtualenvs/ganot-SqvSo3bL/lib/python3.8/site-packages/numpy/lib/function_base.py", line 2328, in __call__
    return self._vectorize_call(func=func, args=vargs)
  File "/home/dmg/.local/share/virtualenvs/ganot-SqvSo3bL/lib/python3.8/site-packages/numpy/lib/function_base.py", line 2414, in _vectorize_call
    res = asanyarray(outputs, dtype=otypes[0])
ValueError: cannot convert float NaN to integer

I have tried to find an explanation for why this might be the case, but have been unable to find anything. My first thought was that the content of the lambda is executed on val from left to right and, thus, int(val) is attempted before even checking float(val).is_integer(). This matches the fact that

>>> np.vectorize(lambda val: int(val))(np.array([1, 2, 3, np.nan]))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/dmg/.local/share/virtualenvs/ganot-SqvSo3bL/lib/python3.8/site-packages/numpy/lib/function_base.py", line 2328, in __call__
    return self._vectorize_call(func=func, args=vargs)
  File "/home/dmg/.local/share/virtualenvs/ganot-SqvSo3bL/lib/python3.8/site-packages/numpy/lib/function_base.py", line 2411, in _vectorize_call
    outputs = ufunc(*inputs)
  File "<stdin>", line 1, in <lambda>
ValueError: cannot convert float NaN to integer

also crashes. However, this would not explain why the input 3.55 causes a crash, since int(3.55) itself works. Indeed, int(val) on its own behaves as expected:

>>> np.vectorize(lambda val: int(val))(np.array([1, 2, 3, 3.55]))
array([1, 2, 3, 3])
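The left-to-right hypothesis can be checked directly: in a Python conditional expression X if C else Y, the condition C is evaluated first and only the selected branch runs. A small tracing sketch (check, cast and calls are illustrative names, not from the question):

```python
import math

calls = []  # records which parts of the expression actually run

def check(val):
    """Condition part: log the call, then test for an integral value."""
    calls.append("check")
    return float(val).is_integer()

def cast(val):
    """True branch: log the call, then cast to int."""
    calls.append("cast")
    return int(val)

def func(val):
    # Same shape as the question's lambda: <true branch> if <condition> else <false branch>
    return cast(val) if check(val) else math.nan

func(3.0)
print(calls)  # ['check', 'cast']

calls.clear()
func(math.nan)
print(calls)  # ['check'], so int() is never attempted on NaN
```

Since the condition is checked before int(val), this suggests the crash inside np.vectorize does not come from the lambda's evaluation order.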

Moreover, defining the logic as a def function (where float(val).is_integer() is presumably evaluated first) also leads to an exception:

>>> def func2(val):
...     if float(val).is_integer():
...         return int(val)
...     return np.nan
... 
>>> 
>>> np.vectorize(func2)(np.array([1, 2, 3, 3.55]))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/dmg/.local/share/virtualenvs/ganot-SqvSo3bL/lib/python3.8/site-packages/numpy/lib/function_base.py", line 2328, in __call__
    return self._vectorize_call(func=func, args=vargs)
  File "/home/dmg/.local/share/virtualenvs/ganot-SqvSo3bL/lib/python3.8/site-packages/numpy/lib/function_base.py", line 2414, in _vectorize_call
    res = asanyarray(outputs, dtype=otypes[0])
ValueError: cannot convert float NaN to integer
>>> 
>>> np.vectorize(func2)(np.array([1, 2, 3, np.nan]))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/dmg/.local/share/virtualenvs/ganot-SqvSo3bL/lib/python3.8/site-packages/numpy/lib/function_base.py", line 2328, in __call__
    return self._vectorize_call(func=func, args=vargs)
  File "/home/dmg/.local/share/virtualenvs/ganot-SqvSo3bL/lib/python3.8/site-packages/numpy/lib/function_base.py", line 2414, in _vectorize_call
    res = asanyarray(outputs, dtype=otypes[0])
ValueError: cannot convert float NaN to integer

Could someone please explain to me why this int casting with fallback cannot be vectorized?

Thanks in advance.

Answer by 月依秋水, 2025-02-20 11:53:23

It seems that this morning, after some coffee, I was more inspired to find an answer. I found one in this post here on SO:

Error in using np.NaN is vectorize functions

The problem lies in the inconsistent return type of the lambda function being vectorized: it returns an int for integral values but a float (NaN) otherwise.

numpy.vectorize provides an otypes parameter for explicitly specifying the output types of the vectorized function. If it is not given, the output type is determined automatically by calling func on the first element of the input. In my examples above, that call returned an int, so the vectorized function was set up to produce integer output only. When later inputs made func return a float (NaN), the collected results could no longer be converted to that integer type (the asanyarray(outputs, dtype=otypes[0]) line in the traceback), which produced the exception.
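The mechanism can be reproduced step by step outside np.vectorize. This is a simplified model that only mimics the steps visible in the traceback, not NumPy's actual implementation; outputs, inferred and conversion_failed are illustrative names:

```python
import numpy as np

# The question's function: int for integral values, NaN (a float) otherwise.
func = lambda val: int(val) if float(val).is_integer() else np.nan

values = np.array([1, 2, 3, 3.55])

# Step 1: each element-wise call succeeds on its own; collect results as objects.
outputs = np.array([func(v) for v in values], dtype=object)  # 1, 2, 3, nan

# Step 2: with no otypes, the output dtype comes from the first element's result.
inferred = np.asarray(func(values[0])).dtype  # integer dtype, because func(1.0) returns 1

# Step 3: converting every result to that dtype fails on the NaN.
try:
    np.asanyarray(outputs, dtype=inferred)
    conversion_failed = False
except ValueError:
    conversion_failed = True

# Reordering so that the first result is NaN (a float) makes inference pick a float dtype.
reordered = np.vectorize(func)(np.array([3.55, 1, 2, 3]))
print(reordered.dtype)  # float64
```

The reordering experiment supports the explanation: the same function succeeds or fails depending only on what the first element returns.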

Therefore, the problem can be solved by, e.g., passing otypes=[object]:

>>> np.vectorize(func, otypes=[object])(np.array([1, 2, 3, 3.55]))
array([1, 2, 3, nan], dtype=object)
>>> 
>>> np.vectorize(func, otypes=[object])(np.array([1, 2, 3, np.nan]))
array([1, 2, 3, nan], dtype=object)
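Two other options, sketched under assumptions and not taken from the linked post. If float output is acceptable, otypes=[float] also avoids the inference problem, although the IDs then come back as floats. If the IDs should stay in a true integer dtype, one loop-free alternative is to compute a validity mask with NumPy and keep the invalid slots separately, for example in a masked array. to_int_ids is a hypothetical helper name:

```python
import numpy as np

func = lambda val: int(val) if float(val).is_integer() else np.nan
values = np.array([1, np.nan, 3, 3.55])

# Option 1: declare a float output; NaN fits, but the IDs become floats.
as_float = np.vectorize(func, otypes=[float])(values)
print(as_float.tolist())  # [1.0, nan, 3.0, nan]

# Option 2 (hypothetical helper): integer IDs plus a validity mask, no Python loop.
def to_int_ids(arr):
    """Return (ids, valid); ids is int64 and holds a placeholder 0 wherever valid is False."""
    arr = np.asarray(arr, dtype=float)
    # Same criterion as float(val).is_integer(): finite and no fractional part.
    valid = np.isfinite(arr) & (arr == np.floor(arr))
    # Replace invalid entries before casting so NaN is never converted to int.
    ids = np.where(valid, arr, 0).astype(np.int64)
    return ids, valid

ids, valid = to_int_ids(values)
print(ids.tolist())    # [1, 0, 3, 0]
print(valid.tolist())  # [True, False, True, False]

# A masked array hides the placeholder slots.
masked = np.ma.masked_array(ids, mask=~valid)
print(masked.tolist())  # [1, None, 3, None]
```

The mask-based version keeps an integer dtype, at the cost of carrying the mask alongside the IDs instead of storing NaN in the same array.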