Two models with the same architecture, but one gets a matrix multiplication error
I am attempting to load the weights from the ResNet18 backbone of an NNCLR model onto a linear classifier. The issue is that I get a matrix multiplication error. In the code below, model refers to the entire NNCLR model. Much of this is based on the Lightly documentation.
import numpy as np
import torch
import torchvision
from torch import nn

cuda0 = torch.device('cuda:0')

pt_backbone = model.backbone  # model is the pretrained NNCLR model
state_dict = {'resnet18_parameters': pt_backbone.state_dict()}
torch.save(state_dict, 'test_nnclr.h5')

resnet18_new = torchvision.models.resnet18()
backbone_new = nn.Sequential(*list(resnet18_new.children())[:-1])

ckpt = torch.load('test_nnclr.h5')
backbone_new.load_state_dict(ckpt['resnet18_parameters'])
backbone_new.add_module('fc', nn.Linear(512, 2, device=cuda0))
backbone_new(torch.tensor(np.random.uniform(-10, 10, (8, 3, 128, 128)), device=cuda0).float())
I get the following runtime error:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-51-ac885f772507> in <module>()
1 backbone_new.add_module('fc', nn.Linear(512, 2, device=cuda0))
----> 2 backbone_new(torch.tensor(np.random.uniform(-10, 10, (8, 3, 128, 128)), device=cuda0).float())
4 frames
/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py in linear(input, weight, bias)
1846 if has_torch_function_variadic(input, weight, bias):
1847 return handle_torch_function(linear, (input, weight, bias), input, weight, bias=bias)
-> 1848 return torch._C._nn.linear(input, weight, bias)
1849
1850
RuntimeError: mat1 and mat2 shapes cannot be multiplied (4096x1 and 512x2)
I understand that the 4096 comes from 512 x 8, where 8 is the batch size and 512 is the last dimension output before the linear layer. But I'm confused, because I don't see how I could account for the batch size like that in the new linear layer. I'm especially confused because of the following result:
resnet18_new(torch.tensor(np.random.uniform(-10, 10, (8, 3, 128, 128)), device=cuda0).float()).shape
Which seems to work perfectly, resulting in torch.Size([8, 2]). But the two models have the same architecture, so I don't understand how one has an error and the other doesn't. The difference between the two models is that backbone_new (which isn't actually a backbone, by the way) has different weights. How do I fix this error?
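One way to see where the 4096x1 in the error comes from is to inspect the raw output of the sliced backbone before any head is attached; a minimal sketch, assuming the same resnet18_new and cuda0 as above:

x = torch.tensor(np.random.uniform(-10, 10, (8, 3, 128, 128)), device=cuda0).float()
# The sliced backbone ends at AdaptiveAvgPool2d, which keeps the 1x1 spatial dims:
feats = nn.Sequential(*list(resnet18_new.children())[:-1]).to(cuda0)(x)
print(feats.shape)  # torch.Size([8, 512, 1, 1])
# nn.Linear treats only the last dimension (size 1) as features and folds the
# rest into rows: 8 * 512 * 1 = 4096, hence mat1 = 4096x1 against mat2 = 512x2.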
1 Answer
Consider the code for the forward method of class ResNet. It calls attributes like self.relu or self.layer2, but right before the fully-connected layer it calls torch.flatten(x, 1).
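For reference, this is the relevant tail of torchvision's ResNet._forward_impl (abridged; the exact source may differ slightly between torchvision versions):

def _forward_impl(self, x):
    # ... conv1 / bn1 / relu / maxpool / layer1 ... layer4 ...
    x = self.avgpool(x)
    x = torch.flatten(x, 1)  # a plain function call, not a child module
    x = self.fc(x)
    return x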
To understand why this matters, have a look at the code generated for the forward of resnet18_new: the last few lines in stdout show torch.flatten being applied right before self.fc. But the same procedure for backbone_new returns a different result: its last lines in stdout (remember the code is generated, so no surprise it's not pretty) go straight from getattr(self, "8") into the new fc layer.
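Assuming the generated code came from torch.fx (symbolic_trace emits exactly the getattr(self, "8") form for nn.Sequential children with numeric names), a sketch to reproduce the comparison:

import torch.fx

# Stock ResNet: the traced forward ends with torch.flatten feeding self.fc.
print(torch.fx.symbolic_trace(resnet18_new).code)

# Sliced Sequential: the traced forward feeds the pooled (N, 512, 1, 1) tensor
# from getattr(self, "8") directly into self.fc, with no flatten in between.
print(torch.fx.symbolic_trace(backbone_new).code)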
Here getattr(self, "8") corresponds to AdaptiveAvgPool2d(output_size=(1, 1)). Of course it'll crash -- the flatten was thrown away! And it's all because torch.flatten is called inside the original ResNet._forward_impl, so it is not a child module and list(resnet18_new.children()) silently drops it. So to fix it, when adding the new module, put an nn.Flatten in as well (its behavior is similar to torch.flatten(x, 1)).
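A minimal sketch of the fix, assuming backbone_new is rebuilt fresh from resnet18_new, with the same ckpt, cuda0, np, and nn as in the question:

backbone_new = nn.Sequential(*list(resnet18_new.children())[:-1])
backbone_new.load_state_dict(ckpt['resnet18_parameters'])
# Flatten (N, 512, 1, 1) -> (N, 512) before the classifier head,
# mirroring the torch.flatten(x, 1) call in ResNet._forward_impl:
backbone_new.add_module('flatten', nn.Flatten())
backbone_new.add_module('fc', nn.Linear(512, 2, device=cuda0))
backbone_new.to(cuda0)

out = backbone_new(torch.tensor(np.random.uniform(-10, 10, (8, 3, 128, 128)), device=cuda0).float())
print(out.shape)  # torch.Size([8, 2])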