How to get the standard deviation across multiple 2D arrays, cell by cell?

Posted on 2025-01-20 15:46:35

I have 16 2D arrays, each of shape [16000, 16000], so each array has 256,000,000 cells. I want a std_array that holds the standard deviation of each cell across the 16 arrays. I tried the following but failed, and I have two questions.

Here's my attempt, simplified to 3×3 arrays:

import numpy as np

a = np.array([[1,2,3],[1,2,3],[1,2,3]])
b = np.array([[2,3,4],[2,3,4],[2,3,4]])
c = np.array([[3,4,5],[3,4,5],[3,4,5]])

stack = np.vstack((a,b,c))
var = np.std(stack, axis = 0)

However, the np.std function only returns 3 values, but I want 9. What should I do?

[0.81649658 0.81649658 0.81649658]

In addition, when I apply std to the stacked full-size arrays, I get this error. Does it simply mean that my arrays are too large to process?

MemoryError: Unable to allocate array with shape (256000, 16000) and data type float32

Comments (1)

最偏执的依靠 2025-01-27 15:46:35

In your example, np.vstack((a,b,c)) simply stacks the rows of all three arrays into a single 2D array of shape (9, 3):

array([[1, 2, 3],
       [1, 2, 3],
       [1, 2, 3],
       [2, 3, 4],
       [2, 3, 4],
       [2, 3, 4],
       [3, 4, 5],
       [3, 4, 5],
       [3, 4, 5]])

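Computing the standard deviation of this array along axis 0 or axis 1 does not give what you want: axis 0 mixes all 9 rows together and returns one value per column (3 values), while axis 1 returns one value per row (9 values), each computed within a single array rather than across the arrays.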

Instead, you can add a new leading dimension to each array so that they are stacked along a new axis:

stack = np.vstack([a[None], b[None], c[None]])  # shape (3, 3, 3), same as np.stack([a, b, c])
stack.std(axis=0)  # reduce across the arrays: one value per cell

In this case stack is:

array([[[1, 2, 3],   <-- array `a`
        [1, 2, 3],
        [1, 2, 3]],

       [[2, 3, 4],   <-- array `b`
        [2, 3, 4],
        [2, 3, 4]],

       [[3, 4, 5],   <-- array `c`
        [3, 4, 5],
        [3, 4, 5]]])

The result is a 2D array of shape (3, 3) in which each cell holds the standard deviation of the 3 values found at that position in the 3 arrays.
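As a quick check of the stacking idea on the toy arrays from the question, here is a minimal sketch. It uses np.stack, which inserts the new leading axis itself, and reduces along axis 0, the axis that indexes the arrays.

```python
import numpy as np

a = np.array([[1, 2, 3], [1, 2, 3], [1, 2, 3]])
b = np.array([[2, 3, 4], [2, 3, 4], [2, 3, 4]])
c = np.array([[3, 4, 5], [3, 4, 5], [3, 4, 5]])

# np.stack adds a new leading axis: shape (3 arrays, 3 rows, 3 columns).
stack = np.stack([a, b, c])

# Reducing along axis 0 combines the same cell across the three arrays.
std_array = stack.std(axis=0)

print(stack.shape)      # (3, 3, 3)
print(std_array.shape)  # (3, 3)
print(std_array)        # every cell is sqrt(2/3), about 0.8165
```

Note that np.std computes the population standard deviation by default (ddof=0); pass ddof=1 if the sample standard deviation is wanted.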

The catch is that building a huge stacked array only to reduce it afterwards is not memory efficient. You can instead iterate over the rows and stack much smaller arrays:

result = np.empty(a.shape, dtype=np.float64)
for i in range(a.shape[0]):
    stacked_line = np.vstack([a[i, None], b[i, None], c[i, None]])
    result[i,:] = stacked_line.std(axis=0)
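For scale: stacking 16 float32 arrays of shape (16000, 16000) needs 16 × 256,000,000 × 4 bytes, about 16.4 GB, which matches the (256000, 16000) allocation in the error message. Going row by row avoids that but makes 16000 small NumPy calls. A possible middle ground is to process blocks of rows. The sketch below is only an illustration: `blockwise_std` and `block_rows` are hypothetical names, and the block size of 500 rows (about 0.5 GB per stacked float32 block at full size) is an assumption to tune to the available memory.

```python
import numpy as np

def blockwise_std(arrays, block_rows=500):
    """Cell-wise standard deviation of equally shaped 2D arrays, block by block."""
    rows, cols = arrays[0].shape
    result = np.empty((rows, cols), dtype=np.float64)
    for start in range(0, rows, block_rows):
        stop = min(start + block_rows, rows)
        # Stack only the current block of rows: shape (n_arrays, stop - start, cols).
        block = np.stack([arr[start:stop] for arr in arrays])
        # dtype=np.float64 makes the reduction accumulate in double precision.
        result[start:stop] = block.std(axis=0, dtype=np.float64)
    return result

# Small demonstration with random data standing in for the real arrays.
rng = np.random.default_rng(0)
arrays = [rng.random((7, 5), dtype=np.float32) for _ in range(4)]
std_array = blockwise_std(arrays, block_rows=3)
print(std_array.shape)  # (7, 5)
```

The float64 result itself still takes about 2 GB for a 16000 × 16000 grid; passing dtype=np.float32 for `result` would halve that at the cost of precision.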

For higher performance, you can use Numba, which avoids creating the many large temporary arrays that NumPy requires and that are expensive to allocate and fill.
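One way the Numba suggestion could look is sketched below; it is not the answerer's code. The names `accumulate` and `streaming_std` are hypothetical, and the approach (running sums of x and x², then Var = E[x²] − E[x]²) is an assumption chosen because it needs only two float64 accumulators (about 2 GB each at full size) instead of a stacked array. This formula can lose precision when the mean is large compared with the spread. The import fallback lets the sketch run, slowly, when Numba is not installed.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:
    # Fallback so the sketch still runs (slowly) without Numba installed.
    def njit(*args, **kwargs):
        def decorator(func):
            return func
        return decorator
    prange = range


@njit(parallel=True)
def accumulate(total, total_sq, arr):
    """Add arr and arr squared into the running sums without temporary arrays."""
    rows, cols = arr.shape
    for i in prange(rows):
        for j in range(cols):
            v = float(arr[i, j])  # promote to double precision before squaring
            total[i, j] += v
            total_sq[i, j] += v * v


def streaming_std(arrays):
    """Cell-wise population standard deviation using two float64 accumulators."""
    shape = arrays[0].shape
    total = np.zeros(shape, dtype=np.float64)
    total_sq = np.zeros(shape, dtype=np.float64)
    for arr in arrays:
        accumulate(total, total_sq, arr)
    n = len(arrays)
    mean = total / n
    # Clip tiny negative values caused by rounding before taking the root.
    variance = np.maximum(total_sq / n - mean * mean, 0.0)
    return np.sqrt(variance)


# Toy arrays from the question, as float32 like the real data.
a = np.array([[1, 2, 3]] * 3, dtype=np.float32)
b = a + 1
c = a + 2
std_array = streaming_std([a, b, c])
print(std_array)  # every cell is about 0.8165
```

Because each input array is visited once and independently, the 16 arrays could also be loaded from disk one at a time rather than held in memory together.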
