Removing pairs of elements that are NaN (or another value) from numpy arrays in Python

Posted on 2024-08-29 18:32:21

I have an array with two columns in numpy. For example:

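from numpy import array, isnan, nan, transpose

a = array([[1, 5, nan, 6],
           [10, 6, 6, nan]])
a = transpose(a)

I want to efficiently iterate through the two columns, a[:, 0] and a[:, 1], and remove any pairs that meet a certain condition, in this case pairs where either value is NaN. The obvious way I can think of is:

new_a = []
for val1, val2 in a:
  # keep the pair only if neither value is NaN
  # (nan == nan is always False, so the test has to use isnan)
  if not (isnan(val1) or isnan(val2)):
    new_a.append([val1, val2])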

But that seems clunky. What's the pythonic numpy way of doing this?

Thanks.

Comments (4)

葬花如无物 2024-09-05 18:32:21

If you want to keep only the rows that contain no NaN, this is the expression you need:

>>> import numpy as np
>>> a[~np.isnan(a).any(1)]
array([[  1.,  10.],
       [  5.,   6.]])

If you want the rows that do not have a specific number among their elements, e.g. 5:

>>> a[~(a == 5).any(1)]
array([[  1.,  10.],
       [ NaN,   6.],
       [  6.,  NaN]])

The latter is clearly equivalent to

>>> a[(a != 5).all(1)]
array([[  1.,  10.],
       [ NaN,   6.],
       [  6.,  NaN]])
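
The equivalence is an instance of De Morgan's law: "no element of the row equals 5" says the same thing as "every element of the row differs from 5". A minimal sketch that checks this on the example array (the names mask_any and mask_all are only for illustration and do not appear in the answer):

```python
import numpy as np

a = np.array([[1, 5, np.nan, 6],
              [10, 6, 6, np.nan]]).transpose()

# Rows where no element equals 5 ...
mask_any = ~(a == 5).any(axis=1)
# ... are exactly the rows where every element differs from 5.
mask_all = (a != 5).all(axis=1)

print(mask_any)
print(np.array_equal(mask_any, mask_all))
```

Note that NaN != 5 evaluates to True, which is why the rows containing NaN survive this particular filter.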

Explanation:
Let's first create your example input

>>> import numpy as np
>>> a = np.array([[1, 5, np.nan, 6],
...               [10, 6, 6, np.nan]]).transpose()
>>> a
array([[  1.,  10.],
       [  5.,   6.],
       [ NaN,   6.],
       [  6.,  NaN]])

This determines which elements are NaN:

>>> np.isnan(a)
array([[False, False],
       [False, False],
       [ True, False],
       [False,  True]], dtype=bool)

This identifies which rows contain at least one True element:

>>> np.isnan(a).any(1)
array([False, False,  True,  True], dtype=bool)

Since we don't want these, we negate the last expression:

>>> ~np.isnan(a).any(1)
array([ True,  True, False, False], dtype=bool)

And finally we use the boolean array to select the rows we want:

>>> a[~np.isnan(a).any(1)]
array([[  1.,  10.],
       [  5.,   6.]])
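
The steps above can be collected into a small helper. This is only a sketch: the function name drop_rows_with_nan is made up for illustration, and it assumes a 2-D floating-point array.

```python
import numpy as np

def drop_rows_with_nan(arr):
    """Return the rows of a 2-D float array that contain no NaN."""
    has_nan = np.isnan(arr).any(axis=1)   # True for rows with at least one NaN
    return arr[~has_nan]                  # boolean indexing keeps the other rows

a = np.array([[1, 5, np.nan, 6],
              [10, 6, 6, np.nan]]).transpose()
clean = drop_rows_with_nan(a)
print(clean)
```

Writing axis=1 instead of the positional 1 does the same thing and makes it explicit that the reduction runs across each row.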

π浅易 2024-09-05 18:32:21

You could convert the array into a masked array and use the compress_rows function (http://docs.scipy.org/doc/numpy/reference/routines.ma.html#to-a-ndarray):

import numpy as np
a = np.array([[1, 5, np.nan, 6],
              [10, 6, 6, np.nan]])
a = np.transpose(a)
print(a)
# [[  1.  10.]
#  [  5.   6.]
#  [ NaN   6.]
#  [  6.  NaN]]
b = np.ma.compress_rows(np.ma.fix_invalid(a))
print(b)
# [[  1.  10.]
#  [  5.   6.]]
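
The masked-array route also seems to extend to the "or another value" part of the question. As a sketch, assuming the value to remove is 5 (as in the other answer), np.ma.masked_equal can build the mask in place of fix_invalid:

```python
import numpy as np

a = np.array([[1, 5, np.nan, 6],
              [10, 6, 6, np.nan]]).transpose()

# Mask every element equal to 5, then drop the rows that contain a masked element.
masked = np.ma.masked_equal(a, 5)
b = np.ma.compress_rows(masked)
print(b)
```

Only the row containing 5 is dropped here; the rows with NaN stay, because NaN does not compare equal to 5.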

颜漓半夏 2024-09-05 18:32:21

Not to detract from ig0774's answer, which is perfectly valid and Pythonic and is in fact the normal way of doing these things in plain Python, but: numpy supports boolean indexing, which can also do the job. Since NaN never compares equal to itself, a==a is False exactly where a is NaN:

new_a = a[(a==a).all(1)]

I'm not sure offhand which way would be more efficient (or faster to execute).

If you wanted to use a different condition to select the rows, this would have to be changed, and precisely how depends on the condition. If it's something that can be evaluated for each array element independently, you could just replace the a==a with the appropriate test, for example to eliminate all rows with numbers larger than 100 you could do

new_a = a[(a<=100).all(1)]

But if you're trying to do something fancy that involves all the elements in a row (like eliminating all rows that sum to more than 100), it might be more complicated. If that's the case, I can try to edit in a more specific answer if you want to share your exact condition.
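
For the row-sum case mentioned above, one possible sketch is to reduce each row to a single number first and then index with the resulting one-dimensional mask. The array data below is hypothetical and not taken from the question:

```python
import numpy as np

# Hypothetical data for illustration: three rows, no NaN.
data = np.array([[1.0, 10.0],
                 [60.0, 50.0],
                 [99.0, 1.0]])

row_sums = data.sum(axis=1)      # one sum per row
kept = data[row_sums <= 100]     # keep rows whose sum does not exceed 100
print(kept)
```

If a row contains NaN, its sum is NaN and the comparison row_sums <= 100 is False, so that row would be dropped as well.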

宣告ˉ结束 2024-09-05 18:32:21

I think a list comprehension should do this. E.g.,

import math
new_a = [(val1, val2) for (val1, val2) in a if not (math.isnan(val1) or math.isnan(val2))]
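
A list comprehension yields a Python list of tuples rather than an array. A minimal sketch, assuming the goal is to drop every pair that contains a NaN and to get a 2-D array back (the name pairs is only for illustration):

```python
import math
import numpy as np

a = np.array([[1, 5, np.nan, 6],
              [10, 6, 6, np.nan]]).transpose()

# Plain-Python filter: keep a pair only if neither value is NaN.
pairs = [(val1, val2) for (val1, val2) in a
         if not (math.isnan(val1) or math.isnan(val2))]

# Convert the list of tuples back to a 2-D array.
new_a = np.array(pairs)
print(new_a)
```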