A question about convolution (in CNNs)
I suddenly came up with a question about convolution and just wanted to check whether I'm missing something. The question is whether the two operations below are identical.
Case 1)
Suppose we have a feature map C^2 x H x W, and a K x K x C^2 Conv weight with stride S. (To be clear, C^2 is the channel dimension; I just wanted to make it a square number. K is the kernel size.)
Case 2)
Suppose we have a feature map 1 x CH x CW, and a CK x CK x 1 Conv weight with stride CS.
So, basically, Case 2 is a pixel-upshuffled version of Case 1 (both the feature map and the Conv weight). Since a convolution is just element-wise multiplication followed by a sum, the two operations seem identical to me.
# given a feature map and a conv weight, namely f_map, conv_weight

# case 1) ordinary conv on the C^2 x H x W feature map
convLayer = Conv(conv_weight)
result = convLayer(f_map, stride=1)

# case 2) pixel-shuffle both the feature map and the conv weight, then conv with stride C
f_map = pixelshuffle(f_map, scale=C)              # 1 x CH x CW
conv_weight = pixelshuffle(conv_weight, scale=C)  # CK x CK x 1
convLayer = Conv(conv_weight)
result = convLayer(f_map, stride=C)
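A quick way to sanity-check the pseudocode above is to run it with real tensors. The sketch below assumes PyTorch (F.pixel_shuffle / F.conv2d) and my own toy sizes (C = 2, K = 3, one output channel, no padding); it only illustrates the claimed equivalence, not any particular model.

import torch
import torch.nn.functional as F

C, K, H, W = 2, 3, 5, 5                      # channel dim is C^2 = 4
f_map  = torch.randn(1, C * C, H, W)         # case 1 feature map: C^2 x H x W
weight = torch.randn(1, C * C, K, K)         # one K x K x C^2 filter (single output channel)

# case 1) ordinary multi-channel conv, stride 1
out1 = F.conv2d(f_map, weight, stride=1)

# case 2) pixel-shuffle the feature map and the weight, then conv with stride C
f_big = F.pixel_shuffle(f_map, upscale_factor=C)    # 1 x CH x CW
w_big = F.pixel_shuffle(weight, upscale_factor=C)   # 1 x CK x CK
out2 = F.conv2d(f_big, w_big, stride=C)

print(torch.allclose(out1, out2, atol=1e-5))        # prints True: the two results match

Both calls produce a 3 x 3 output here, and the match relies on the stride-C windows in case 2 landing exactly on pixel-shuffle block boundaries.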
But this would mean that, for example, given a 256 x H x W feature map with a 3x3 Conv (as in many deep learning models), performing the convolution is effectively just applying one huge 48x48 Conv to a 1 x 16H x 16W feature map.
That doesn't match my basic intuition about CNNs: stacking many layers of the smallest 3x3 Convs so that the receptive field gradually becomes large, with each channel carrying different (possibly redundant) information.
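For reference, the shape bookkeeping behind that example (my own arithmetic, just plugging C = 16 and K = 3 into Case 2 above) is:

256 channels = 16^2, so C = 16
feature map:  256 x H x W   -> pixel-shuffle ->  1 x 16H x 16W
conv weight:  3 x 3 x 256   -> pixel-shuffle ->  48 x 48 x 1
stride:       1             ->                   16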
Comments (1)
You can, in a sense, think of it as "folding" spatial information into the channel dimension. This is the rationale behind ResNet's trade-off between spatial resolution and feature dimension: whenever ResNet downsamples by x2 in space, it increases the feature dimension by x2. However, since there are two spatial dimensions and both are downsampled by x2, the "volume" of the feature map is effectively halved.
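To make that arithmetic concrete, here is a minimal sketch (my own toy numbers, assuming PyTorch; a stride-2 conv that doubles the channel count, in the spirit of a ResNet downsampling stage):

import torch
import torch.nn as nn

x = torch.randn(1, 64, 56, 56)                                  # 64 x 56 x 56 feature map
down = nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1)   # x2 channels, x1/2 in each spatial dim
y = down(x)                                                     # 128 x 28 x 28

print(x.numel(), y.numel())   # 200704 vs 100352: the feature-map "volume" is halved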