How do I remove all zero elements from a NumPy array?

Posted on 2024-11-05 21:59:16

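I have a rank-1 numpy.array that I want to make a boxplot of. However, I want to exclude all values in the array that are equal to zero. Currently I solve this by looping over the array and copying each value to a new array if it is not equal to zero. Since the array consists of 86 000 000 values and I have to do this multiple times, this takes a lot of patience.

Is there a more intelligent way to do this?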


零崎曲识 2024-11-12 21:59:16

For a NumPy array a, you can use

a[a != 0]

to extract the values not equal to zero.
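A minimal sketch of how this works, assuming a small hypothetical array `a` (the sample values are not from the question): the comparison `a != 0` builds a boolean mask of the same shape as `a`, and indexing with that mask returns a new 1-D array holding only the elements where the mask is True.

```python
import numpy as np

# Hypothetical sample data; the array in the question has about 86 million values.
a = np.array([0.0, 1.5, 0.0, -2.0, 3.0, 0.0])

mask = a != 0        # boolean array, True where the element is nonzero
nonzero = a[mask]    # copy containing only the nonzero elements

print(mask)
print(nonzero)
```

The original array is left untouched, so the same mask can be reused if several arrays share the same positions of interest.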


倥絔 2024-11-12 21:59:16

This is a case where you want to use masked arrays: a masked array keeps the shape of your array and is recognized by NumPy's reductions and by many matplotlib functions.

import numpy as np
import matplotlib.pyplot as plt

X = np.random.randn(1000, 5)   # randn needs integer dimensions, so 1000 rather than 1e3
X[np.abs(X) < .1] = 0          # some zeros
X = np.ma.masked_equal(X, 0)
plt.boxplot(X)                 # intended: masked values are not plotted (check this on your matplotlib version)

# other functionalities of masked arrays
X.compressed()  # get a normal 1-D array with masked values removed
X.mask          # get a boolean array of the mask
X.mean()        # it automatically discards masked values
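If a plotting function turns out not to honour the mask, one possible workaround is to hand it one compressed 1-D array per column. The following is only a sketch under that assumption; the names `Xm`, `columns`, and `col_means` and the seeded generator are introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)        # seeded only to make the sketch repeatable
X = rng.standard_normal((1000, 5))
X[np.abs(X) < 0.1] = 0                # introduce some zeros
Xm = np.ma.masked_equal(X, 0)         # mask every entry equal to zero

# One 1-D array per column with the masked (zero) entries removed.
# Columns can end up with different lengths, hence a list and not a 2-D array.
columns = [Xm[:, j].compressed() for j in range(Xm.shape[1])]

# Masked reductions skip the masked entries.
col_means = Xm.mean(axis=0)
print([len(c) for c in columns])
```

A list such as `columns` can be passed to `plt.boxplot`, which accepts a sequence of 1-D arrays and draws one box per entry.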


浮萍、无处依 2024-11-12 21:59:16

I decided to compare the runtime of the different approaches mentioned here, using my library simple_benchmark.

Boolean indexing with array[array != 0] seems to be the fastest (and shortest) solution.

For smaller arrays the MaskedArray approach is very slow compared to the other approaches, but for larger arrays it appears to be about as fast as boolean indexing. For moderately sized arrays there is not much difference between the approaches.

Here is the code I used:

from simple_benchmark import BenchmarkBuilder

import numpy as np

bench = BenchmarkBuilder()

@bench.add_function()
def boolean_indexing(arr):
    return arr[arr != 0]

@bench.add_function()
def integer_indexing_nonzero(arr):
    return arr[np.nonzero(arr)]

@bench.add_function()
def integer_indexing_where(arr):
    return arr[np.where(arr != 0)]

@bench.add_function()
def masked_array(arr):
    return np.ma.masked_equal(arr, 0)

@bench.add_arguments('array size')
def argument_provider():
    for exp in range(3, 25):
        size = 2**exp
        arr = np.random.random(size)
        arr[arr < 0.1] = 0  # add some zeros
        yield size, arr

r = bench.run()
r.plot()
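For readers without simple_benchmark installed, a rough comparison can be sketched with the standard-library `timeit` module. This is not the benchmark above, only an approximation of it: the single array size, the repeat counts, and the `candidates` dictionary are assumptions made for this sketch, and the masked-array variant is left out because it returns a masked array and not a filtered one.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
arr = rng.random(2**16)      # hypothetical size, chosen to keep the run short
arr[arr < 0.1] = 0           # add some zeros

candidates = {
    "boolean_indexing": lambda: arr[arr != 0],
    "integer_indexing_nonzero": lambda: arr[np.nonzero(arr)],
    "integer_indexing_where": lambda: arr[np.where(arr != 0)],
}

for name, func in candidates.items():
    # best of 5 repeats with 20 calls each, reported per call
    best = min(timeit.repeat(func, number=20, repeat=5)) / 20
    print(f"{name}: {best * 1e6:.1f} µs per call")
```

Timings from a single array size say little about scaling, so the numbers are only meaningful relative to each other on the same machine.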


走野 2024-11-12 21:59:16

You can index with a Boolean array. For a NumPy array A:

res = A[A != 0]

Besides Boolean array indexing as above, you can use bool type conversion, np.nonzero, or np.where. Here is some performance benchmarking (run in IPython, where %timeit is available):

# Python 3.7, NumPy 1.14.3

import numpy as np

np.random.seed(0)

A = np.random.randint(0, 5, 10**8)

%timeit A[A != 0]          # 768 ms
%timeit A[A.astype(bool)]  # 781 ms
%timeit A[np.nonzero(A)]   # 1.49 s
%timeit A[np.where(A)]     # 1.58 s
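The four expressions are interchangeable in terms of the result, which can be checked on a small array. The sketch below assumes a short hypothetical integer array in place of the 10**8-element one; `results` and `same` are names introduced here.

```python
import numpy as np

# Small hypothetical stand-in for the large benchmark array.
A = np.array([0, 3, 0, 1, 4, 0, 2])

results = [
    A[A != 0],           # Boolean mask from a comparison
    A[A.astype(bool)],   # Boolean mask from a type conversion
    A[np.nonzero(A)],    # integer indices of the nonzero entries
    A[np.where(A)],      # np.where with one argument behaves like np.nonzero
]

same = all(np.array_equal(results[0], r) for r in results[1:])
print(results[0], same)
```

One plausible reason the last two variants are slower is that they first materialize an array of integer indices and then index with it, but that is an interpretation and not something the timings alone establish.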


箜明 2024-11-12 21:59:16

I would suggest simply using NaN for cases like this, where you want to ignore some values but still keep the statistics as meaningful as possible. So (the bare names assume a pylab-style session, for example after from pylab import *):

In []: X = randn(1000, 5)
In []: X[abs(X) < .1] = nan
In []: isnan(X).sum(0)
Out[]: array([82, 84, 71, 81, 73])
In []: boxplot(X)
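Once the unwanted values are NaN, NumPy's NaN-aware reductions can compute statistics that skip them. The following is a sketch with explicit imports; the seeded generator and the names `nan_counts`, `col_means`, `col_medians`, and `columns` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)         # seeded only to make the sketch repeatable
X = rng.standard_normal((1000, 5))
X[np.abs(X) < 0.1] = np.nan            # mark the values to ignore

nan_counts = np.isnan(X).sum(axis=0)   # ignored values per column
col_means = np.nanmean(X, axis=0)      # mean that skips NaN
col_medians = np.nanmedian(X, axis=0)  # median that skips NaN

# Per-column data with NaNs dropped, for functions that do not skip NaN themselves.
columns = [col[~np.isnan(col)] for col in X.T]
print(nan_counts, col_means.round(3))
```

This only works for floating-point data, since integer arrays cannot hold NaN.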


奈何桥上唱咆哮 2024-11-12 21:59:16

A simple line of code gives you the positions of all nonzero values, which you can then use to index the array:

np.argwhere(*array*)

Note that np.argwhere returns indices, not values. In the example the two happen to coincide, because each nonzero value sits at the index equal to itself.

Example:

import numpy as np
array = np.array([0, 1, 0, 3, 4, 5, 0])
idx = np.argwhere(array).ravel()  # indices of the nonzero entries
print(array[idx])                 # values at those indices

[1 3 4 5]
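Note that `np.argwhere` returns the positions of the nonzero entries, not the entries themselves. The sketch below uses a hypothetical array whose values differ from their positions to make the distinction visible; `idx` and `values` are names introduced here.

```python
import numpy as np

arr = np.array([0, 7, 0, 9, 8])

idx = np.argwhere(arr)         # one row per nonzero entry, shape (3, 1) here
values = arr[idx.ravel()]      # flatten the indices, then index to get the values

print(idx.tolist())            # [[1], [3], [4]]
print(values.tolist())         # [7, 9, 8]
```

For 1-D input, `np.flatnonzero(arr)` yields the same flat indices without the extra `ravel()` call.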


柠檬色的秋千 2024-11-12 21:59:16

[i for i in Array if i != 0.0] if the numbers are floats,
or [i for i in Array if i != 0] if the numbers are ints.
