What happens in a convolution when the stride is larger than the kernel?
I was recently experimenting with convolutions and transposed convolutions in PyTorch. I noticed that with the nn.ConvTranspose2d API (I haven't tried the normal convolution API yet) you can specify a stride that is larger than the kernel size, and the convolution still works.
What is happening in this case? I'm confused, because if the stride is larger than the kernel, that means some pixels in the input image will not be convolved. So what happens to them?
I have the following snippet, where I manually set the weights for an nn.ConvTranspose2d layer:
import numpy as np
import torch
from torch import nn

IN = 1
OUT = 1
KERNEL_SIZE = 2

proof_conv = nn.ConvTranspose2d(IN, OUT, kernel_size=KERNEL_SIZE, stride=4)
assert proof_conv.weight.shape == (IN, OUT, KERNEL_SIZE, KERNEL_SIZE)

FILTER = [
    [1., 2.],
    [0., 1.]
]
# Nest the filter to match the (in_channels, out_channels, kH, kW) weight shape.
weights = [
    [FILTER]
]
weights_as_tensor = torch.from_numpy(np.asarray(weights)).float()
assert weights_as_tensor.shape == proof_conv.weight.shape
proof_conv.weight = nn.Parameter(weights_as_tensor)

img = [[
    [1., 2.],
    [3., 4.]
]]
img_as_tensor = torch.from_numpy(np.asarray(img)).float()
out_img = proof_conv(img_as_tensor)
assert out_img.shape == (OUT, 6, 6)
The stride of 4 is larger than the KERNEL_SIZE of 2. Yet the transposed convolution still runs, and we get a 6x6 output. What is happening under the hood?
This post, Understanding the PyTorch implementation of Conv2DTranspose, is helpful but does not answer the edge case where the stride is greater than the kernel.
2 Answers
As you already guessed - when the stride is larger than the kernel size, there are input pixels that do not participate in the convolution operation.
It's up to you - the designer of the architecture - to decide whether this property is a bug or a feature. In some cases, I have taken advantage of this property to ignore portions of the input.
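For example (a sketch of my own, not from the original answer - the layer and sizes here are made up for illustration), a regular nn.Conv2d with kernel_size=1 and stride=2 reads only every other pixel in each dimension and silently ignores the rest:

import torch
from torch import nn

# 1x1 kernel with stride 2: only pixels at even (row, col) offsets are
# ever read; the other three quarters of the input are ignored.
pick = nn.Conv2d(1, 1, kernel_size=1, stride=2, bias=False)
with torch.no_grad():
    pick.weight.fill_(1.0)  # pass-through 1x1 kernel

x = torch.arange(16.0).reshape(1, 1, 4, 4)
print(pick(x))  # picks out 0., 2., 8., 10. - the even-indexed pixels of the 4x4 input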
Update:
I think you are being confused by the bias term in proof_conv. Try eliminating it (see the sketch below): you'll then get out_img to be four copies of the kernel, weighted by the input image and spaced 4 pixels apart according to stride=4. The rest of the output image is filled with zeros - representing pixels that do not contribute to the transposed convolution.
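A minimal sketch of that suggestion (continuing from the snippet in the question, and assuming the bias is dropped by passing bias=False; the expected values below are worked out by hand):

proof_conv = nn.ConvTranspose2d(IN, OUT, kernel_size=KERNEL_SIZE,
                                stride=4, bias=False)
proof_conv.weight = nn.Parameter(weights_as_tensor)
print(proof_conv(img_as_tensor))
# Four copies of the 2x2 kernel, each scaled by one input pixel and
# anchored 4 apart; everything in between stays zero:
# tensor([[[1., 2., 0., 0., 2., 4.],
#          [0., 1., 0., 0., 0., 2.],
#          [0., 0., 0., 0., 0., 0.],
#          [0., 0., 0., 0., 0., 0.],
#          [3., 6., 0., 0., 4., 8.],
#          [0., 3., 0., 0., 0., 4.]]])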
ConvTranspose follows the same "logic" as the regular conv, only in a "transposed" fashion. If you look at the formula for computing the output shape, you'll see that the behavior you get is consistent.
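For reference, PyTorch documents the output size of ConvTranspose2d as

H_out = (H_in - 1) * stride - 2 * padding + dilation * (kernel_size - 1) + output_padding + 1

so with H_in = 2, stride = 4, kernel_size = 2, and the remaining arguments at their defaults, H_out = (2 - 1) * 4 + (2 - 1) + 1 = 6, matching the 6x6 output in the question.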
My understanding is that ConvTranspose2d will always use all of the pixels in the input image, regardless of the stride and kernel_size. This is different from Conv2d. As you can see by looking at the actual values in out_img (as shown in @Shai's answer), each input value is used to generate one of the four sets of 2x2 values at the corners of the output image.
stride in ConvTranspose2d instead affects the output image size and spacing. You can see that, because stride=4 in this case, the four 2x2 results of the 2x2 input and 2x2 kernel are spaced 4 units apart. The intervening spaces are filled with zeros, as some of the output pixels receive no input when stride > kernel_size.
This is essentially the corollary of some input cells not being used in Conv2d when stride > kernel_size. I think maybe this is what you were trying to get at with your question.
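To see that corollary concretely, here is a small sketch (mine, not from the answer): with a regular nn.Conv2d whose stride (4) exceeds its kernel_size (2), zeroing out the rows and columns that the stride steps over leaves the output unchanged, because those pixels are never convolved:

import torch
from torch import nn

# Hypothetical probe layer with stride (4) > kernel_size (2), as in the question.
probe = nn.Conv2d(1, 1, kernel_size=2, stride=4, bias=False)

x = torch.randn(1, 1, 6, 6)
y = x.clone()
y[:, :, 2:4, :] = 0.0  # wipe the rows that stride 4 steps over
y[:, :, :, 2:4] = 0.0  # wipe the columns that stride 4 steps over

# Identical outputs: rows/columns 2-3 are never read, so changing them
# has no effect - the mirror image of ConvTranspose2d leaving zeros there.
assert torch.equal(probe(x), probe(y))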