How to check for NaN values

Posted on 2024-07-24 07:04:34 · 57 words · 7 views · 0 comments

float('nan') represents NaN (not a number). But how do I check for it?


烟雨扶苏 2024-07-31 07:04:34

Use math.isnan:

>>> import math
>>> x = float('nan')
>>> math.isnan(x)
True
世界和平 2024-07-31 07:04:34

The usual way to test for a NaN is to see if it's equal to itself:

def isNaN(num):
    return num != num
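
A small illustration of why the self-comparison is needed: a direct equality test against float('nan') never succeeds, because NaN compares unequal to everything, itself included. This is only a sketch; the name `is_nan` is chosen here for illustration.

```python
def is_nan(num):
    # NaN is the only float value that is not equal to itself
    return num != num

nan = float("nan")

print(nan == float("nan"))  # False: an equality test can never detect NaN
print(nan == nan)           # False, even for the very same object
print(is_nan(nan))          # True
print(is_nan(1.5))          # False
```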
骑趴 2024-07-31 07:04:34

numpy.isnan(number) tells you if it's NaN or not.

记忆消瘦 2024-07-31 07:04:34

Here are three ways to test whether a variable is NaN.

import pandas as pd
import numpy as np
import math

# For a single scalar, all three functions return a single boolean
x1 = float("nan")

print(f"It's pd.isna: {pd.isna(x1)}")
print(f"It's np.isnan: {np.isnan(x1)}")
print(f"It's math.isnan: {math.isnan(x1)}")

Output:

It's pd.isna: True
It's np.isnan: True
It's math.isnan: True
╰つ倒转 2024-07-31 07:04:34

Editor's note: The timings below are flawed; for example, they do not factor out name lookup time. See the comments.


It seems that checking if it's equal to itself (x != x) is the fastest.

import pandas as pd 
import numpy as np 
import math 

x = float('nan')

%timeit x != x
44.8 ns ± 0.152 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

%timeit math.isnan(x)
94.2 ns ± 0.955 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

%timeit pd.isna(x)
281 ns ± 5.48 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

%timeit np.isnan(x)
1.38 µs ± 15.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
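
One way to reduce the effect of name lookup, sketched with the standard-library timeit module: bind math.isnan to a plain name in the namespace passed to timeit, so the timed statement no longer pays for the attribute lookup on `math`. The names `namespace`, `t_self` and `t_isnan` are made up for this sketch, and the numbers depend on your machine and Python version, so no results are claimed here.

```python
import math
import timeit

x = float("nan")

# Both statements see only these two names; isnan is already resolved
namespace = {"x": x, "isnan": math.isnan}

t_self = timeit.timeit("x != x", globals=namespace, number=1_000_000)
t_isnan = timeit.timeit("isnan(x)", globals=namespace, number=1_000_000)

print(f"x != x   : {t_self:.3f} s per 1,000,000 runs")
print(f"isnan(x) : {t_isnan:.3f} s per 1,000,000 runs")
```

The function call itself still has overhead that the bare comparison does not, so this narrows the measurement rather than making the two strictly comparable.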
此生挚爱伱 2024-07-31 07:04:34

Here is an answer that works with:

  • NaN implementations respecting IEEE 754 standard
    • ie: python's NaN: float('nan'), numpy.nan...
  • any other objects: string or whatever (does not raise exceptions if encountered)

A NaN implemented according to the standard is the only value for which the inequality comparison with itself should return True:

def is_nan(x):
    return (x != x)

And some examples:

import numpy as np
values = [float('nan'), np.nan, 55, "string", lambda x : x]
for value in values:
    print(f"{repr(value):<8} : {is_nan(value)}")

Output:

nan      : True
nan      : True
55       : False
'string' : False
<function <lambda> at 0x000000000927BF28> : False
旧时浪漫 2024-07-31 07:04:34

I actually just ran into this, but for me it was checking for nan, -inf, or inf. I just used

if float('-inf') < float(num) < float('inf'):

This is true for numbers, false for nan and both infinities, and will raise an exception for things like non-numeric strings or other types (which is probably a good thing). Also this does not require importing any libraries like math or numpy (numpy is so damn big it doubles the size of any compiled application).
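
If importing the standard-library math module is acceptable, math.isfinite (available since Python 3.2) expresses the same check. A minimal sketch; the helper name `is_finite_number` is made up for illustration:

```python
import math

def is_finite_number(num):
    # float() raises ValueError or TypeError for non-numeric input,
    # just like the chained comparison above
    return math.isfinite(float(num))

print(is_finite_number(3))              # True
print(is_finite_number("2.5"))          # True (float() parses numeric strings)
print(is_finite_number(float("nan")))   # False
print(is_finite_number(float("-inf")))  # False
```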

习惯成性 2024-07-31 07:04:34

math.isnan()

Or compare the number to itself: NaN is always != NaN, whereas any ordinary number compares equal to itself.

烈酒灼喉 2024-07-31 07:04:34

I came to this post because I've had some issues with this function:

math.isnan()

The problem appears when you run this code:

a = "hello"
math.isnan(a)

It raises a TypeError.
My solution is to add a type check first:

import math

def is_nan(x):
    return isinstance(x, float) and math.isnan(x)
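
Note that isinstance(x, float) also rejects plain ints, which math.isnan would accept. If that matters, one possible variant (an assumption about what you want, not part of the answer above) checks against numbers.Real from the standard library:

```python
import math
from numbers import Real

def is_nan(x):
    # Real covers int, float, bool and Fraction; strings and None are rejected
    return isinstance(x, Real) and math.isnan(x)

print(is_nan(float("nan")))  # True
print(is_nan(42))            # False
print(is_nan("hello"))       # False, no exception
print(is_nan(None))          # False, no exception
```

One caveat: math.isnan converts its argument to a float first, so an int too large for a double still raises OverflowError.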
聚集的泪 2024-07-31 07:04:34

Another method, if you're stuck on Python < 2.6, don't have numpy, and don't have IEEE 754 support:

def isNaN(x):
    return str(x) == str(1e400*0)
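
Why this works, as far as I can tell: 1e400 is too large for a double, so the literal evaluates to infinity, and infinity times zero is NaN. A quick check on a modern Python (behaviour on very old platforms without IEEE 754 may differ):

```python
import math

big = 1e400        # overflows to inf on IEEE 754 doubles
product = big * 0  # inf * 0 is NaN

print(big)                  # inf
print(product)              # nan
print(math.isnan(product))  # True
print(str(product) == "nan")  # True: the string isNaN() compares against
```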
撩发小公举 2024-07-31 07:04:34

With Python < 2.6 I ended up with:

def isNaN(x):
    return str(float(x)).lower() == 'nan'

This works for me with python 2.5.1 on a Solaris 5.9 box and with python 2.6.5 on Ubuntu 10

假装爱人 2024-07-31 07:04:34

A comparison of pd.isna, math.isnan and np.isnan, and their flexibility in dealing with different types of objects.

The table below shows whether each type of object can be checked with the given method:


+------------+-----+---------+------+--------+------+
|   Method   | NaN | numeric | None | string | list |
+------------+-----+---------+------+--------+------+
| pd.isna    | yes | yes     | yes  | yes    | yes  |
| math.isnan | yes | yes     | no   | no     | no   |
| np.isnan   | yes | yes     | no   | no     | yes  | <-- # will error on mixed type list
+------------+-----+---------+------+--------+------+

pd.isna

The most flexible method to check for different types of missing values.


None of the other answers cover the flexibility of pd.isna. While math.isnan and np.isnan return True for NaN values, they cannot check other kinds of objects such as None or strings: both raise an error, so checking a list with mixed types is cumbersome. pd.isna, by contrast, is flexible and returns the correct boolean for each type:

In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: missing_values = [3, None, np.nan, pd.NA, pd.NaT, '10']  # np.NaN was removed in NumPy 2.0; np.nan works everywhere

In [4]: pd.isna(missing_values)
Out[4]: array([False,  True,  True,  True,  True, False])
飘逸的'云 2024-07-31 07:04:34

I am receiving data from a web service that sends NaN as a string 'Nan'. But there could be other sorts of strings in my data as well, so a simple float(value) could throw an exception. I used the following variant of the accepted answer:

import math

def isnan(value):
    try:
        return math.isnan(float(value))
    except (TypeError, ValueError):
        # float() rejects non-numeric strings and unsupported types
        return False

Requirements:

isnan('hello') == False
isnan('NaN') == True
isnan(100) == False
isnan(float('nan')) == True
吾家有女初长成 2024-07-31 07:04:34

All the methods to tell if the variable is NaN or None:

None type

In [1]: import math

In [2]: a = None
In [3]: not a
Out[3]: True

In [4]: len(a or ()) == 0
Out[4]: True

In [5]: a == None
Out[5]: True

In [6]: a is None
Out[6]: True

In [7]: a != a
Out[7]: False

In [9]: math.isnan(a)
Traceback (most recent call last):
  File "<ipython-input-9-6d4d8c26d370>", line 1, in <module>
    math.isnan(a)
TypeError: a float is required

In [10]: len(a) == 0
Traceback (most recent call last):
  File "<ipython-input-10-65b72372873e>", line 1, in <module>
    len(a) == 0
TypeError: object of type 'NoneType' has no len()

NaN type

In [11]: b = float('nan')
In [12]: b
Out[12]: nan

In [13]: not b
Out[13]: False

In [14]: b != b
Out[14]: True

In [15]: math.isnan(b)
Out[15]: True
眼眸 2024-07-31 07:04:34

In Python 3.6, calling math.isnan(x) or np.isnan(x) on a string value x raises an error.
So I can't check whether a given value is NaN unless I know beforehand that it's a number.
The following seems to solve this issue:

# isinstance is needed here: type(x) != 'str' compares a type with a string and is always True
if str(x) == 'nan' and not isinstance(x, str):
    print('NaN')
else:
    print('non NaN')
沉睡月亮 2024-07-31 07:04:34

How to remove NaN (float) item(s) from a list of mixed data types

If you have mixed types in an iterable, here is a solution that does not use numpy:

from math import isnan

Z = ['a','b', float('NaN'), 'd', float('1.1024')]

[x for x in Z if not (
                      type(x) == float # let's drop all float values…
                      and isnan(x) # … but only if they are nan
                      )]
['a', 'b', 'd', 1.1024]

Short-circuit evaluation means that isnan will not be called on values that are not of type 'float', as False and (…) quickly evaluates to False without having to evaluate the right-hand side.
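
To see the short-circuit behaviour described above, here is a small sketch that wraps isnan in a tracing helper. The names `traced_isnan`, `calls` and `kept` are hypothetical and exist only for this demonstration:

```python
from math import isnan

calls = []

def traced_isnan(x):
    # Record every value that actually reaches isnan
    calls.append(x)
    return isnan(x)

Z = ['a', 'b', float('NaN'), 'd', float('1.1024')]
kept = [x for x in Z if not (type(x) == float and traced_isnan(x))]

print(kept)        # ['a', 'b', 'd', 1.1024]
print(len(calls))  # 2: only the two floats were passed to isnan
```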

月棠 2024-07-31 07:04:34

For NaN of type float:

>>> import pandas as pd
>>> value = float('nan')
>>> type(value)
<class 'float'>
>>> pd.isnull(value)
True
>>>
>>> value = 'nan'
>>> type(value)
<class 'str'>
>>> pd.isnull(value)
False
顾忌 2024-07-31 07:04:34

If you want to check for values that are not NaN, then negate whatever is used to flag NaNs; pandas has its own dedicated function for flagging non-NaN values.

import math
import numpy as np
import pandas as pd

lst = [1, 2, float('nan')]

m1 = [e == e for e in lst]              # [True, True, False]

m2 = [not math.isnan(e) for e in lst]   # [True, True, False]

m3 = ~np.isnan(lst)                     # array([ True,  True, False])

m4 = pd.notna(lst)                      # array([ True,  True, False])

This is especially useful if you want to filter values that are not NaN. For ndarray/Series objects, == is vectorized, so it can be used as well.

s = pd.Series(lst)
arr = np.array(lst)

x = s[s.notna()]
y = s[s==s]                             # `==` is vectorized
z = arr[~np.isnan(arr)]                 # array([1., 2.])

assert (x == y).all() and (x == z).all()
遮了一弯 2024-07-31 07:04:34

To filter out empty strings (''), None, and NaN values in the 'num_specimen_seen' column, we can use the pd.notna() function from pandas together with an explicit check for ''.

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'num_specimen_seen': [10, 2, 1, '', 34, 'aw', np.nan, 5, '43', np.nan, 'ed', None, '']
})

for idx, row in df.iterrows():
    if pd.notna(row['num_specimen_seen']) and row['num_specimen_seen'] != '':
        print(idx, row['num_specimen_seen'])

This code skips NaN, None, and empty strings in the 'num_specimen_seen' column when iterating over the DataFrame.

千笙结 2024-07-31 07:04:34

For strings in pandas, use pd.isnull:

if not pd.isnull(atext):
  for word in nltk.word_tokenize(atext):

The full function, used for feature extraction with NLTK:

def act_features(atext):
    features = {}
    # Skip missing (NaN/None) text entirely
    if not pd.isnull(atext):
        for word in nltk.word_tokenize(atext):
            if word not in default_stopwords:
                features['cont({})'.format(word.lower())] = True
    return features