What happens in a convolution when the stride is larger than the kernel?
I was recently experimenting with convolutions and transposed convolutions in PyTorch. I noticed that with the nn.ConvTranspose2d API (I haven't tried the normal convolution API yet) you can specify a stride that is larger than the kernel size, and the convolution still works.
What is happening in this case? I'm confused, because if the stride is larger than the kernel, that means some pixels in the input image will not be convolved. So what happens to them?
I have the following snippet, where I manually set the weights for an nn.ConvTranspose2d layer:
import numpy as np
import torch
from torch import nn

IN = 1
OUT = 1
KERNEL_SIZE = 2

proof_conv = nn.ConvTranspose2d(IN, OUT, kernel_size=KERNEL_SIZE, stride=4)
assert proof_conv.weight.shape == (IN, OUT, KERNEL_SIZE, KERNEL_SIZE)

FILTER = [
    [1., 2.],
    [0., 1.]
]
# Nest the filter to match the (in_channels, out_channels, kH, kW) weight shape.
weights = [
    [FILTER]
]
weights_as_tensor = torch.from_numpy(np.asarray(weights)).float()
assert weights_as_tensor.shape == proof_conv.weight.shape
proof_conv.weight = nn.Parameter(weights_as_tensor)

img = [[
    [1., 2.],
    [3., 4.]
]]
img_as_tensor = torch.from_numpy(np.asarray(img)).float()
out_img = proof_conv(img_as_tensor)
assert out_img.shape == (OUT, 6, 6)
The stride of 4 is larger than the KERNEL_SIZE of 2. Yet the transposed convolution still runs, and we get a 6x6 output. What is happening under the hood?
This post, Understanding the PyTorch implementation of Conv2DTranspose, is helpful but does not answer the edge case where the stride is greater than the kernel.
2 Answers
As you already guessed - when the stride is larger than the kernel size, there are input pixels that do not participate in the convolution operation.
It's up to you - the designer of the architecture - to decide whether this property is a bug or a feature. In some cases, I have taken advantage of this property to ignore portions of the input.
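For example (a sketch of my own, not from the original answer - the layer and sizes here are made up for illustration), a regular nn.Conv2d with kernel_size=1 and stride=2 reads only every other pixel in each dimension and silently ignores the rest:

import torch
from torch import nn

# 1x1 kernel with stride 2: only pixels at even (row, col) offsets are
# ever read; the other three quarters of the input are ignored.
pick = nn.Conv2d(1, 1, kernel_size=1, stride=2, bias=False)
with torch.no_grad():
    pick.weight.fill_(1.0)  # pass-through 1x1 kernel

x = torch.arange(16.0).reshape(1, 1, 4, 4)
print(pick(x))  # picks out 0., 2., 8., 10. - the even-indexed pixels of the 4x4 input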
Update:
I think you are being confused by the bias term in proof_conv. Try eliminating it (see the sketch below): you'll then get out_img to be four copies of the kernel, weighted by the input image and spaced 4 pixels apart according to stride=4. The rest of the output image is filled with zeros - representing pixels that do not contribute to the transposed convolution.
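A minimal sketch of that suggestion (continuing from the snippet in the question, and assuming the bias is dropped by passing bias=False; the expected values below are worked out by hand):

proof_conv = nn.ConvTranspose2d(IN, OUT, kernel_size=KERNEL_SIZE,
                                stride=4, bias=False)
proof_conv.weight = nn.Parameter(weights_as_tensor)
print(proof_conv(img_as_tensor))
# Four copies of the 2x2 kernel, each scaled by one input pixel and
# anchored 4 apart; everything in between stays zero:
# tensor([[[1., 2., 0., 0., 2., 4.],
#          [0., 1., 0., 0., 0., 2.],
#          [0., 0., 0., 0., 0., 0.],
#          [0., 0., 0., 0., 0., 0.],
#          [3., 6., 0., 0., 4., 8.],
#          [0., 3., 0., 0., 0., 4.]]])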
ConvTranspose follows the same "logic" as the regular conv, only in a "transposed" fashion. If you look at the formula for computing the output shape, you'll see that the behavior you get is consistent.
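For reference, PyTorch documents the output size of ConvTranspose2d as

H_out = (H_in - 1) * stride - 2 * padding + dilation * (kernel_size - 1) + output_padding + 1

so with H_in = 2, stride = 4, kernel_size = 2, and the remaining arguments at their defaults, H_out = (2 - 1) * 4 + (2 - 1) + 1 = 6, matching the 6x6 output in the question.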
My understanding is that ConvTranspose2d will always use all of the pixels in the input image, regardless of the stride and kernel_size. This is different from Conv2d. As you can see by looking at the actual values in out_img (as shown in @Shai's answer), each input value is used to generate one of the four sets of 2x2 values at the corners of the output image.
stride in ConvTranspose2d instead affects the output image size and spacing. You can see that, because stride=4 in this case, the four 2x2 results of the 2x2 input and 2x2 kernel are spaced 4 units apart. The intervening spaces are filled with zeros, as some of the output pixels receive no input when stride > kernel_size.
This is essentially the corollary of some input cells not being used in Conv2d when stride > kernel_size. I think maybe this is what you were trying to get at with your question.
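To see that corollary concretely, here is a small sketch (mine, not from the answer): with a regular nn.Conv2d whose stride (4) exceeds its kernel_size (2), zeroing out the rows and columns that the stride steps over leaves the output unchanged, because those pixels are never convolved:

import torch
from torch import nn

# Hypothetical probe layer with stride (4) > kernel_size (2), as in the question.
probe = nn.Conv2d(1, 1, kernel_size=2, stride=4, bias=False)

x = torch.randn(1, 1, 6, 6)
y = x.clone()
y[:, :, 2:4, :] = 0.0  # wipe the rows that stride 4 steps over
y[:, :, :, 2:4] = 0.0  # wipe the columns that stride 4 steps over

# Identical outputs: rows/columns 2-3 are never read, so changing them
# has no effect - the mirror image of ConvTranspose2d leaving zeros there.
assert torch.equal(probe(x), probe(y))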