Contiguous array warning in Python (numba)

Posted on 2025-01-10 22:48:22


I have the following snippet of code, where I've used numba to speed things up:

import numpy as np
from numpy import linalg  # needed for linalg.inv below
from numba import jit

Sigma = np.array([
                  [1, 1, 0.5, 0.25],
                  [1, 2.5, 1, 0.5],
                  [0.5, 1, 0.5, 0.25],
                  [0.25, 0.5, 0.25, 0.25]
])
Z = np.array([0.111, 0.00658])

@jit(nopython=True)
def mean(Sigma, Z):
  return np.dot(np.dot(Sigma[0:2, 2:4], linalg.inv(Sigma[2:4, 2:4])), Z)

print(mean(Sigma, Z))

However, numba is complaining:

NumbaPerformanceWarning: np.dot() is faster on contiguous arrays, called on (array(float64, 2d, A), array(float64, 2d, F))
  return np.dot(np.dot(Sigma[0:2, 2:4], linalg.inv(Sigma[2:4, 2:4])), Z)

If I'm not mistaken (after reading this), the contiguous structure of the numpy arrays is broken by slicing sub-matrices out of Sigma (i.e., "Sigma[0:2, 2:4]"). Is this correct? If so, is there any way to resolve this warning? I believe resolving it would help speed up my code, which is my main goal. Thanks.
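For reference, I suspect the warning could be silenced by copying the slices into C-contiguous buffers with np.ascontiguousarray before the dot products, though I'm not sure it is the fastest fix. A minimal sketch of what I have in mind (mean_contig is just an illustrative name):

import numpy as np
from numpy import linalg
from numba import jit

@jit(nopython=True)
def mean_contig(Sigma, Z):
    # Copy the non-contiguous slices into C-contiguous buffers.
    left = np.ascontiguousarray(Sigma[0:2, 2:4])
    sub = np.ascontiguousarray(Sigma[2:4, 2:4])
    # linalg.inv returns an F-ordered array, so copy it too.
    inv = np.ascontiguousarray(linalg.inv(sub))
    return np.dot(np.dot(left, inv), Z)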


1 Answer

メ斷腸人バ 2025-01-17 22:48:22


You get this warning because dot and inv are optimized for contiguous arrays. However, given the small input size, this is not a huge problem. Still, you can at least specify that the input arrays are contiguous by using the signature 'float64[:](float64[:,::1], float64[::1])' in the @jit(...) decorator. This also causes the function to be compiled eagerly.

The biggest performance issue in this function is the creation of a few temporary arrays and the call to linalg.inv, which is not designed to be fast for very small matrices. The inverse can instead be obtained from a simple closed-form expression based on the determinant.
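Concretely, for a 2x2 matrix this is the standard closed form that the code below implements:

$$
\begin{pmatrix} a & b \\ c & d \end{pmatrix}^{-1}
= \frac{1}{ad - bc}
\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}
$$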

Here is the resulting code:

import numpy as np
import numba as nb

@nb.njit('float64[:](float64[:,::1], float64[::1])')
def fast_mean(Sigma, Z):
    # Compute the inverse matrix
    mat_a = Sigma[2, 2]
    mat_b = Sigma[2, 3]
    mat_c = Sigma[3, 2]
    mat_d = Sigma[3, 3]
    invDet = 1.0 / (mat_a*mat_d - mat_b*mat_c)
    inv_a = invDet * mat_d
    inv_b = -invDet * mat_b
    inv_c = -invDet * mat_c
    inv_d = invDet * mat_a

    # Compute the matrix multiplication
    mat_a = Sigma[0, 2]
    mat_b = Sigma[0, 3]
    mat_c = Sigma[1, 2]
    mat_d = Sigma[1, 3]
    tmp_a = mat_a*inv_a + mat_b*inv_c
    tmp_b = mat_a*inv_b + mat_b*inv_d
    tmp_c = mat_c*inv_a + mat_d*inv_c
    tmp_d = mat_c*inv_b + mat_d*inv_d

    # Final dot product
    z0, z1 = Z
    result = np.empty(2, dtype=np.float64)
    result[0] = tmp_a*z0 + tmp_b*z1
    result[1] = tmp_c*z0 + tmp_d*z1
    return result

This is about 3 times faster on my machine. Note that >60% of the time is spent in the overhead of calling the Numba function and creating the temporary output array. Thus, it is probably wise to use Numba in the caller functions as well, so as to remove this overhead.

You can also pass the result array as a parameter to avoid creating it on every call, which is quite expensive, as pointed out by @max9111. This is only useful if you can preallocate the output buffer in the caller function (once, if possible). With this change, the function is nearly 6 times faster; a sketch of this variant is shown below.
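Here is a sketch of that variant (the name fast_mean_out and the out parameter are illustrative, not from the original code; it assumes the caller allocates the 2-element buffer once and reuses it across calls):

import numpy as np
import numba as nb

@nb.njit('void(float64[:,::1], float64[::1], float64[::1])')
def fast_mean_out(Sigma, Z, out):
    # Same computation as fast_mean, but writes into a
    # caller-provided buffer instead of allocating one.
    a, b = Sigma[2, 2], Sigma[2, 3]
    c, d = Sigma[3, 2], Sigma[3, 3]
    invDet = 1.0 / (a*d - b*c)
    inv_a, inv_b = invDet*d, -invDet*b
    inv_c, inv_d = -invDet*c, invDet*a

    a, b = Sigma[0, 2], Sigma[0, 3]
    c, d = Sigma[1, 2], Sigma[1, 3]
    tmp_a = a*inv_a + b*inv_c
    tmp_b = a*inv_b + b*inv_d
    tmp_c = c*inv_a + d*inv_c
    tmp_d = c*inv_b + d*inv_d

    z0, z1 = Z[0], Z[1]
    out[0] = tmp_a*z0 + tmp_b*z1
    out[1] = tmp_c*z0 + tmp_d*z1

# Usage (with Sigma and Z as defined in the question):
# out = np.empty(2, dtype=np.float64)  # preallocate once
# fast_mean_out(Sigma, Z, out)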
