How do I remove all zero elements from a NumPy array?

Posted 2024-11-05 21:59:16

I have a rank-1 numpy.array from which I want to make a boxplot. However, I want to exclude all values equal to zero. Currently, I solve this by looping over the array and copying each value to a new array if it is not equal to zero. However, since the array consists of 86,000,000 values and I have to do this multiple times, this takes a lot of patience.

Is there a more intelligent way to do this?

Answers (7)

零崎曲识 2024-11-12 21:59:16

For a NumPy array a, you can use

a[a != 0]

to extract the values not equal to zero.
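A minimal sketch of why this beats the approach in the question: `drop_zeros_loop` below is a hypothetical reconstruction of the element-by-element copy the question describes (the asker's actual code is not shown), and both function names are illustrative. Both return the same values; the boolean-mask version does the comparison and the copy inside NumPy instead of in a Python loop.

```python
import numpy as np

def drop_zeros_loop(arr):
    # Hypothetical reconstruction of the question's approach:
    # copy each non-zero value into a new list, one element at a time.
    out = []
    for value in arr:
        if value != 0:
            out.append(value)
    return np.array(out, dtype=arr.dtype)

def drop_zeros_vectorized(arr):
    # Same result in one vectorized step: build a boolean mask
    # (arr != 0) and let NumPy copy the selected elements.
    return arr[arr != 0]

rng = np.random.default_rng(0)
data = rng.random(1_000)
data[data < 0.1] = 0.0  # introduce some zeros

same = np.array_equal(drop_zeros_loop(data), drop_zeros_vectorized(data))
print(same)
```

The vectorized version is the one worth calling repeatedly on an 86-million-element array; the loop is only there for comparison.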

倥絔 2024-11-12 21:59:16

This is a case where you want to use masked arrays: a masked array keeps the shape of your array, and many NumPy and Matplotlib functions recognize the mask automatically.

import numpy as np
import matplotlib.pyplot as plt

X = np.random.randn(1000, 5)
X[np.abs(X) < .1] = 0  # some zeros
X = np.ma.masked_equal(X, 0)
plt.boxplot(X)  # masked values are not plotted

# other functionalities of masked arrays
X.compressed()  # get a normal 1-D array with the masked values removed
X.mask  # get a boolean array of the mask
X.mean()  # automatically discards masked values

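A small sketch, assuming a toy 1-D input instead of random data so the results can be checked by hand: the mask hides the zeros without changing the shape, and reductions such as mean() skip them.

```python
import numpy as np

# Small 1-D example so the results are easy to verify.
data = np.array([0.0, 1.0, 2.0, 0.0, 3.0])

masked = np.ma.masked_equal(data, 0)  # zeros are masked, not removed

# The original shape is preserved.
print(masked.shape)         # (5,)
# Plain ndarray containing only the unmasked values.
print(masked.compressed())  # [1. 2. 3.]
# Boolean mask: True where the value was zero.
print(masked.mask)          # [ True False False  True False]
# Mean over the unmasked values only: (1 + 2 + 3) / 3.
print(masked.mean())        # 2.0
# The plain mean still counts the zeros: 6 / 5.
print(data.mean())          # 1.2
```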
浮萍、无处依 2024-11-12 21:59:16

I decided to compare the runtime of the different approaches mentioned here. I've used my library simple_benchmark for this.

The boolean indexing with array[array != 0] seems to be the fastest (and shortest) solution.

For small arrays, the MaskedArray approach is very slow compared with the other approaches, although for large arrays it is as fast as the boolean indexing approach. For moderately sized arrays there is not much difference between them.

Here is the code I've used:

from simple_benchmark import BenchmarkBuilder

import numpy as np

bench = BenchmarkBuilder()

@bench.add_function()
def boolean_indexing(arr):
    return arr[arr != 0]

@bench.add_function()
def integer_indexing_nonzero(arr):
    return arr[np.nonzero(arr)]

@bench.add_function()
def integer_indexing_where(arr):
    return arr[np.where(arr != 0)]

@bench.add_function()
def masked_array(arr):
    # note: returns a masked array of the same length; zeros are masked, not removed
    return np.ma.masked_equal(arr, 0)

@bench.add_arguments('array size')
def argument_provider():
    for exp in range(3, 25):
        size = 2**exp
        arr = np.random.random(size)
        arr[arr < 0.1] = 0  # add some zeros
        yield size, arr

r = bench.run()
r.plot()
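The benchmarked functions do not all return the same kind of object: masked_array returns a full-length MaskedArray in which the zeros are only masked, and they are dropped only when .compressed() is called, which the benchmark does not time. A quick consistency check, sketched with the same function bodies and a smaller fixed-seed input (seed and size are arbitrary choices):

```python
import numpy as np

# Same four functions as in the benchmark (copied so this runs on its own).
def boolean_indexing(arr):
    return arr[arr != 0]

def integer_indexing_nonzero(arr):
    return arr[np.nonzero(arr)]

def integer_indexing_where(arr):
    return arr[np.where(arr != 0)]

def masked_array(arr):
    return np.ma.masked_equal(arr, 0)

rng = np.random.default_rng(42)
arr = rng.random(10_000)
arr[arr < 0.1] = 0  # add some zeros, as in argument_provider()

expected = boolean_indexing(arr)
print(np.array_equal(integer_indexing_nonzero(arr), expected))
print(np.array_equal(integer_indexing_where(arr), expected))

# The masked array keeps every element; only .compressed() drops the zeros.
ma = masked_array(arr)
print(ma.shape == arr.shape, np.array_equal(ma.compressed(), expected))
```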
走野 2024-11-12 21:59:16

You can index with a Boolean array. For a NumPy array A:

res = A[A != 0]

You can use Boolean array indexing as above, bool type conversion, np.nonzero, or np.where. Here's some performance benchmarking:

# Python 3.7, NumPy 1.14.3

import numpy as np
np.random.seed(0)

A = np.random.randint(0, 5, 10**8)

%timeit A[A != 0]          # 768 ms
%timeit A[A.astype(bool)]  # 781 ms
%timeit A[np.nonzero(A)]   # 1.49 s
%timeit A[np.where(A)]     # 1.58 s
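Outside IPython, a rough equivalent of these %timeit runs can be sketched with the standard-library timeit module. The array is deliberately smaller here (10**6 elements) so it finishes quickly, the number/repeat settings are arbitrary, and the timings will depend on your machine and NumPy version. The sketch also checks that all four expressions select the same elements.

```python
import timeit

import numpy as np

np.random.seed(0)
# Smaller than the 10**8 used above so it runs quickly; scale up to reproduce.
A = np.random.randint(0, 5, 10**6)

candidates = {
    "A[A != 0]": lambda: A[A != 0],
    "A[A.astype(bool)]": lambda: A[A.astype(bool)],
    "A[np.nonzero(A)]": lambda: A[np.nonzero(A)],
    "A[np.where(A)]": lambda: A[np.where(A)],
}

# All four expressions select the same elements.
reference = A[A != 0]
for expr, fn in candidates.items():
    assert np.array_equal(fn(), reference), expr

# Best-of-5 time per call, similar in spirit to %timeit.
timings = {}
for expr, fn in candidates.items():
    timings[expr] = min(timeit.repeat(fn, number=3, repeat=5)) / 3
    print(f"{expr:<20} {timings[expr] * 1e3:.1f} ms")
```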
箜明 2024-11-12 21:59:16

I suggest simply using NaN in cases like this, where you want to ignore some values but still keep the statistics as meaningful as possible. So:

In []: X = randn(1000, 5)
In []: X[abs(X) < .1] = nan
In []: isnan(X).sum(0)
Out[]: array([82, 84, 71, 81, 73])
In []: boxplot(X)
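The session above relies on pylab-style names. Below is a plain-NumPy sketch of the same idea, assuming a seeded generator instead of randn. NaN-aware reductions such as np.nanmean and np.nanpercentile skip the NaNs. For functions that are not NaN-aware, it is safer to drop the NaNs per column explicitly; the `columns` list is an illustrative name.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 5))
X[np.abs(X) < 0.1] = np.nan  # mark the "ignored" values as NaN

print(np.isnan(X).sum(axis=0))  # number of NaNs per column

# NaN-aware reductions skip the NaNs instead of propagating them.
col_means = np.nanmean(X, axis=0)
col_medians = np.nanpercentile(X, 50, axis=0)

# Explicit per-column filtering: one 1-D array per column without NaNs,
# e.g. for plt.boxplot(columns) with matplotlib.pyplot imported as plt.
columns = [col[~np.isnan(col)] for col in X.T]
```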

enter image description here

奈何桥上唱咆哮 2024-11-12 21:59:16

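np.argwhere returns the indices of the non-zero elements (for a 1-D input, as an array of shape (N, 1)), not the values themselves. Index with those positions to get an array that excludes all '0' values in one line:

array[np.argwhere(array).ravel()]

Example:

import numpy as np
array = np.array([0, 1, 0, 3, 4, 5, 0])
array2 = array[np.argwhere(array).ravel()]
print(array2)

[1 3 4 5]

In this particular example, the indices 1, 3, 4 and 5 happen to equal the non-zero values, which hides the difference between indices and values.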
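A sketch with values that differ from their positions, which makes the indices-versus-values distinction visible. np.flatnonzero is shown as an alternative that returns the 1-D indices directly.

```python
import numpy as np

arr = np.array([0, 7, 0, 2, 9, 0])

positions = np.argwhere(arr)     # shape (3, 1): one row per non-zero element
print(positions.ravel())         # [1 3 4]  (indices, not values)

values = arr[positions.ravel()]  # index with the flattened positions
same = arr[np.flatnonzero(arr)]  # flatnonzero returns the 1-D indices directly
print(values, same)              # [7 2 9] [7 2 9]
```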
柠檬色的秋千 2024-11-12 21:59:16

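[i for i in Array if i != 0.0] if the numbers are floats,
or [i for i in Array if i != 0] if the numbers are ints. (Since 0 == 0.0 in Python, either comparison works for both; note that the result is a Python list built by a Python-level loop.)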
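A minimal sketch, assuming a small float array named Array as in the answer. It shows that the comprehension yields a list with the same values as boolean indexing, which can be turned back into an array if later code (for example a plotting call) expects one.

```python
import numpy as np

Array = np.array([0.0, 1.5, 0.0, -2.0, 3.0])

# List comprehension from the answer: iterates in Python, returns a list.
as_list = [i for i in Array if i != 0]

# Convert back to a NumPy array if needed.
as_array = np.array(as_list)

# Same values as the vectorized boolean-indexing answer.
print(np.array_equal(as_array, Array[Array != 0]))
```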
