Vectorized way to multiply over specific axes of a NumPy array (convolutional layer backprop)

Posted 2025-02-01 18:03:46

I was wondering how it would be possible to vectorize the following quadruply nested for-loop (this has to do with backprop in a convolutional layer).

import numpy as np

W = np.ones((2, 2, 3, 8)) # just a toy example
dW = np.zeros(W.shape)
dZ = np.ones((10, 4, 4, 8))*2
# get the shapes: m = samples/images; H_dim = height of image; W_dim = width of image; C = channels/filters
(m, H_dim, W_dim, C) = dZ.shape 
dA_prev = np.zeros((10, 4, 4, 3))
# add symmetric padding of 2 around height and width borders with 0-values; shape is now: (10, 8, 8, 3)
dA_prev = np.pad(dA_prev,((0,0),(2,2),(2,2),(0,0)), mode='constant', constant_values = (0,0)) 

# loop over images
for i in range(m):
    # loop over height
    for h in range(H_dim):
        # loop over width
        for w in range(W_dim):
            # loop over channels/filters
            for c in range(C):
                vert_start = 1 * h # 1 = stride, just as an example
                vert_end = vert_start + 2 # 2 = vertical filter size, just as an example
                horiz_start = 1 * w # 1 = stride
                horiz_end = horiz_start + 2 # 2 = horizontal filter size, just as an example
                dW[:,:,:,c] += dA_prev[i, vert_start:vert_end,horiz_start:horiz_end,:] * dZ[i, h, w, c]
                dA_prev[i, vert_start:vert_end, horiz_start:horiz_end, :] += W[:,:,:,c] * dZ[i, h, w, c] # dZ[i, h, w, c] is a scalar

Doing backprop on the bias is easy enough (db = np.sum(dZ, axis=(0,1,2), keepdims=True)), and the weights can be dealt with using stride tricks: reshape dZ and then take the dot product with the re-strided input (or use tensordot over the matching axes, or einsum).

from numpy.lib.stride_tricks import as_strided

def _striding(array, stride_size, filter_shapes, Wout=None, Hout=None):
    strides = (array.strides[0], array.strides[1] * stride_size, array.strides[2] * stride_size, array.strides[1], array.strides[2], array.strides[3])
    strided = as_strided(array, shape=(array.shape[0], Hout, Wout, filter_shapes[0], filter_shapes[1], array.shape[3]), strides=strides, writeable=False)
    return strided

Hout = (A_prev.shape[1] - 2) // 1 + 1
Wout = (A_prev.shape[2] - 2) // 1 + 1
x_flat = _striding(array=A_prev, stride_size=2, filter_shapes=(2,2),
                     Wout=Wout, Hout=Hout).reshape(-1, 2 * 2 * A_prev.shape[3])
dout_descendant_flat = dout_descendant.reshape(-1, n_C)
dW = x_flat.T @ dout_descendant_flat # shape (fh * fw * n_prev_C, C)
dW = dW.reshape(fh, fw, n_prev_C, C)
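
The tensordot route mentioned above does the same thing in one call, without the reshapes. A minimal sketch on made-up toy shapes (x_windows standing in for the un-reshaped 6-D view returned by _striding):

# toy stand-ins, not the real arrays: windows (m, Hout, Wout, fh, fw, Cin) and upstream gradient (m, Hout, Wout, Cout)
x_windows = np.ones((10, 3, 3, 2, 2, 3))
dout = np.ones((10, 3, 3, 8))
# contract over the batch and spatial output axes -> (fh, fw, Cin, Cout)
dW_td = np.tensordot(x_windows, dout, axes=([0, 1, 2], [0, 1, 2]))
print(dW_td.shape)  # (2, 2, 3, 8)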

This reshape-and-matmul route gives results identical to dW from the slow version. But doing something similar to get the derivative w.r.t. the input, which should also yield the same result, doesn't. Here's what I've done:

dZ_pad = np.pad(dZ,((0,0),(2,2),(2,2),(0,0)), mode='constant', constant_values = (0,0)) # padding to get the same shape as A_prev
dZ_pad_reshaped = _striding(array=dZ_pad, stride_size=1, filter_shapes=(2,2),
                     Wout=4, Hout=4) # the Hout and Wout dims are from the unpadded dims of A_prev
Wrot180 = np.rot90(W, 2, axes=(0,1)) # the filter height and width are in the first two axes, which we want to rotate
dA_prev = np.tensordot(dZ_pad_reshaped, Wrot180, axes=([3,4,5],[0,1,3]))

The shapes of dA_prev are right, but for some reason the results aren't identical to those from the slow version.

Comments (1)

心凉 2025-02-08 18:03:46

OK, it turns out the error had to do with several things:

  1. dZ needed to be dilated relative to the stride used in the forward propagation (a small sketch of this dilation follows right after this list)
  2. the window function used for windowing dZ (done after the dilation of dZ) needed to be called with stride 1 (no matter the stride choice in the forward propagation) and with the output height and width of the padded input (not the original, unpadded input -- this was the main mistake that took me days to debug)
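
To make point 1 concrete, here is a minimal sketch (separate from the class code further below) of what dilating dZ looks like for a forward stride of 2: zeros get inserted between the existing rows and columns along the spatial axes. The extra bottom/right padding that my _dilate method also applies is left out here.

    import numpy as np

    # toy dZ: one image, 3x3 spatial positions, one channel, as if produced by a stride-2 forward pass
    dZ = np.arange(1, 10).reshape(1, 3, 3, 1)
    stride = 2
    # insert (stride - 1) zero rows/columns between the existing ones
    dZ_dil = np.insert(dZ, range(1, dZ.shape[1]), 0, axis=1)
    dZ_dil = np.insert(dZ_dil, range(1, dZ_dil.shape[2]), 0, axis=2)
    print(dZ[0, :, :, 0])
    # [[1 2 3]
    #  [4 5 6]
    #  [7 8 9]]
    print(dZ_dil[0, :, :, 0])
    # [[1 0 2 0 3]
    #  [0 0 0 0 0]
    #  [4 0 5 0 6]
    #  [0 0 0 0 0]
    #  [7 0 8 0 9]]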

The relevant code is below, with comments explaining shapes and operations, as well as some further sources for reading. I've also included the forward propagation.
I should note that after days of debugging, writing various functions, reading etc., the variable names changed along the way, so for ease of reading, here are the variables as defined in my question and their equivalents in the code below:

A_prev is x
dZ is dout_descendant
Hout is the height of dout_descendant
Wout is the width of dout_descendant

(as one would expect, all references to self are to the class these functions are part of)
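
The methods below also assume an __init__ roughly along these lines; this is just a hypothetical sketch for readability (it only sets the attributes the methods reference, and assumes numpy is imported as np):

    def __init__(self, filter_shape, stride=1, pad_size=0, pad_val=0):
        # filter_shape: (fh, fw, Cin, Cout); weights and bias initialised here for illustration only
        self.fh, self.fw, _, self.Cout = filter_shape
        self.stride = stride
        self.pad_size = pad_size
        self.pad_val = pad_val
        self.w = np.random.randn(*filter_shape) * 0.01
        self.b = np.zeros((1, 1, 1, self.Cout))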

    def _pad(self, array, pad_size, pad_val):
        '''
        only symmetric padding is implemented
        ''' 
        return np.pad(array, ((0, 0), (pad_size, pad_size), (pad_size, pad_size), (0, 0)), 'constant', constant_values=(pad_val, pad_val))


    def _dilate(self, array, stride_size, pad_size, symmetric_filter_shape, output_image_size):
        # on dilation for backprop with stride>1, 
        # see: https://medium.com/@mayank.utexas/backpropagation-for-convolution-with-strides-8137e4fc2710
        # see also: https://leimao.github.io/blog/Transposed-Convolution-As-Convolution/
        pad_bottom_and_right = (output_image_size + 2 * pad_size - symmetric_filter_shape) % stride_size
        for m in range(stride_size - 1):
            array = np.insert(array, range(1, array.shape[1], m + 1), 0, axis=1)
            array = np.insert(array, range(1, array.shape[2], m + 1), 0, axis=2)

        for _ in range(pad_bottom_and_right):
            array = np.insert(array, array.shape[1], 0, axis=1)
            array = np.insert(array, array.shape[2], 0, axis=2)
        return array


    def _windows(self, array, stride_size, filter_shapes, out_height, out_width):
        '''
        inputs:
            array to create windows of
            stride_size: int
            filter_shapes: tuple(int): tuple of filter height and width
            out_height and out_width: int, respectively: output sizes for the windows
        returns:
            windows of array with shape (excl. dilation): 
                array.shape[0], out_height, out_width, filter_shapes[0], filter_shapes[1], array.shape[3]
        '''
        strides = (array.strides[0], array.strides[1] * stride_size, array.strides[2] * stride_size, array.strides[1], array.strides[2], array.strides[3])
        return np.lib.stride_tricks.as_strided(array, shape=(array.shape[0], out_height, out_width, filter_shapes[0], filter_shapes[1], array.shape[3]), strides=strides, writeable=False)


    def forward(self, x):
        '''
        expects inputs to be of shape: [batchsize, height, width, channel in]
        after init, filter_shapes are: [fh, fw, channel in, channel out] 
        '''
        self.input_shape = x.shape
        x_pad = self._pad(x, self.pad_size, self.pad_val)
        self.input_pad_shape = x_pad.shape
        # get the shapes
        batch_size, h, w, Cin = self.input_shape
        # calculate output sizes; only symmetric padding is possible
        self.Hout = (h + 2*self.pad_size - self.fh) // self.stride + 1
        self.Wout = (w + 2*self.pad_size - self.fw) // self.stride + 1
        x_windows = self._windows(array=x_pad, stride_size=self.stride, filter_shapes=(self.fh, self.fw),
                        out_width=self.Wout, out_height=self.Hout) # 6-D windows with shape (batch_size, Hout, Wout, fh, fw, Cin)
        self.out = np.tensordot(x_windows, self.w, axes=([3,4,5], [0,1,2])) + self.b
        self.inputs = x_windows
        ## alternative 1: einsum approach, slower than other alternatives
        # self.out = np.einsum('noufvc,fvck->nouk', x_windows, self.w) + self.b
        ## alternative 2: column approach with simple dot product
        # z = x_windows.reshape(-1, self.fh * self.fw * Cin) @ self.w.reshape(self.fh * self.fw * Cin, self.Cout) + self.b # 2D matrix with shape (batch_size * Hout * Wout, Cout)
        # self.out = z.reshape(batch_size, self.Hout, self.Wout, self.Cout)

    def backward(self,dout_descendant):
        '''
        dout_descendant has shape (batch_size, Hout, Wout, Cout)
        ''' 
        # get shapes
        batch_size, h, w, Cin = self.input_shape
        # we want to sum everything but the filters for b
        self.db = np.sum(dout_descendant, axis=(0,1,2), keepdims=True) # shape (1,1,1, Cout)
        # for dW we'll use the column approach with ordinary dot product for variety ;) tensordot does the same without all the reshaping
        dout_descendant_flat = dout_descendant.reshape(-1, self.Cout) # new shape (batch_size * Hout * Wout, Cout)
        x_flat = self.inputs.reshape(-1, self.fh * self.fw * Cin) # shape (batch_size * Hout * Wout, fh * fw * Cin)
        dw = x_flat.T @ dout_descendant_flat # shape (fh * fw * Cin, Cout)
        self.dw = dw.reshape(self.fh, self.fw, Cin, self.Cout)
        del dout_descendant_flat # free memory
        # for dinputs: we'll get padded and dilated windows of dout_descendant and perform the tensordot with 180 rotated W
        # for details, see https://medium.com/@mayank.utexas/backpropagation-for-convolution-with-strides-8137e4fc2710 ; also: https://pavisj.medium.com/convolutions-and-backpropagations-46026a8f5d2c ; also: https://youtu.be/Lakz2MoHy6o?t=835
        Wrot180 = np.rot90(self.w, 2, axes=(0,1)) # or also self.w[::-1, ::-1, :, :]
        # backprop for forward with stride > 1 is done on windowed dout that's padded and dilated with stride 1
        dout_descendant = self._dilate(dout_descendant, stride_size=self.stride, pad_size=self.pad_size, symmetric_filter_shape=self.fh, output_image_size=h)
        dout_descendant = self._pad(dout_descendant, pad_size=self.fw-1, pad_val=self.pad_val) # pad dout_descendant to dim: fh-1 (or fw-1); only symmetrical filters are supported
        dout_descendant = self._windows(array=dout_descendant, stride_size=1, filter_shapes=(self.fh, self.fw),
                            out_height=h + 2 * self.pad_size, out_width=w + 2 * self.pad_size) # shape: (batch_size, h + 2*pad_size, w + 2*pad_size, fh, fw, Cout)
        self.dout = np.tensordot(dout_descendant, Wrot180, axes=([3,4,5],[0,1,3]))
        self.dout = self.dout[:,self.pad_size:-self.pad_size, self.pad_size:-self.pad_size, :]
        ## einsum alternative, but slower:
        # self.dout = np.einsum('nhwfvk,fvck->nhwc', dout_descendant, Wrot180)

I've left this answer here because all the other sources on Stack Overflow or GitHub I could find that used NumPy stride tricks were implemented for convolutions of stride 1 (which doesn't require dilation of dZ), or they used very complex fancy indexing operations that were extremely hard to follow (e.g. https://sgugger.github.io/convolution-in-depth.html#convolution-in-depth or https://github.com/parasdahal/deepnet/blob/51a9e61c351138b7dc637f4b748a0e6ca2e15595/deepnet/im2col.py).
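
For completeness, here is a small, self-contained sanity check of the dA_prev recipe above (pad dZ, window it with stride 1, tensordot with W rotated 180 degrees) against a naive loop, for the simplest case of a stride-1, no-padding forward pass. All names and shapes here are made up just for this check and are not taken from the class above.

    import numpy as np
    from numpy.lib.stride_tricks import as_strided

    rng = np.random.default_rng(0)
    m, H, Wd, Cin, Cout, fh, fw = 2, 5, 5, 3, 4, 2, 2      # toy sizes
    x = rng.standard_normal((m, H, Wd, Cin))
    W = rng.standard_normal((fh, fw, Cin, Cout))
    Hout, Wout = H - fh + 1, Wd - fw + 1                   # stride 1, no padding
    dZ = rng.standard_normal((m, Hout, Wout, Cout))

    # naive accumulation, same pattern as the slow version in the question
    dA_loop = np.zeros_like(x)
    for i in range(m):
        for h in range(Hout):
            for w in range(Wout):
                for c in range(Cout):
                    dA_loop[i, h:h+fh, w:w+fw, :] += W[:, :, :, c] * dZ[i, h, w, c]

    # vectorized: pad dZ by (fh-1, fw-1), window it with stride 1, tensordot with W rotated 180 degrees
    dZ_pad = np.pad(dZ, ((0, 0), (fh - 1, fh - 1), (fw - 1, fw - 1), (0, 0)))
    s = dZ_pad.strides
    windows = as_strided(dZ_pad, shape=(m, H, Wd, fh, fw, Cout),
                         strides=(s[0], s[1], s[2], s[1], s[2], s[3]), writeable=False)
    dA_vec = np.tensordot(windows, np.rot90(W, 2, axes=(0, 1)), axes=([3, 4, 5], [0, 1, 3]))

    print(np.allclose(dA_loop, dA_vec))  # expected: True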
