Two models with the same architecture, but one has an arithmetic error

Posted 2025-01-21 13:50:54


I am attempting to load weights from the ResNet18 backbone of an NNCLR model onto a linear classifier. The issue is that I get a matrix multiplication error. In the code below, model refers to the entire NNCLR model. Much of this is based on the Lightly documentation.

import numpy as np
import torch
import torch.nn as nn
import torchvision

cuda0 = torch.device('cuda:0')

# model is the full NNCLR model defined earlier
pt_backbone = model.backbone

state_dict = {'resnet18_parameters': pt_backbone.state_dict()}
torch.save(state_dict, 'test_nnclr.h5')

# fresh ResNet18 with its final fully-connected layer dropped
resnet18_new = torchvision.models.resnet18()
backbone_new = nn.Sequential(*list(resnet18_new.children())[:-1])

# load the pretrained weights and attach a new 2-class head
ckpt = torch.load('test_nnclr.h5')
backbone_new.load_state_dict(ckpt['resnet18_parameters'])
backbone_new.add_module('fc', nn.Linear(512, 2, device=cuda0))
backbone_new(torch.tensor(np.random.uniform(-10, 10, (8, 3, 128, 128)), device=cuda0).float())

I get the following runtime error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-51-ac885f772507> in <module>()
      1 backbone_new.add_module('fc', nn.Linear(512, 2, device=cuda0))
----> 2 backbone_new(torch.tensor(np.random.uniform(-10, 10, (8, 3, 128, 128)), device=cuda0).float())

4 frames
/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py in linear(input, weight, bias)
   1846     if has_torch_function_variadic(input, weight, bias):
   1847         return handle_torch_function(linear, (input, weight, bias), input, weight, bias=bias)
-> 1848     return torch._C._nn.linear(input, weight, bias)
   1849 
   1850 

RuntimeError: mat1 and mat2 shapes cannot be multiplied (4096x1 and 512x2)

I understand that the 4096 comes from 512 x 8, where 8 is the batch size and 512 is the last dimension output before the linear layer. But I'm confused, because I don't see how I could account for the batch size like that in the new linear layer. I'm especially confused because of the following result:

resnet18_new(torch.tensor(np.random.uniform(-10, 10, (8, 3, 128, 128)), device=cuda0).float()).shape

This works, resulting in torch.Size([8, 2]). But the two models have the same architecture, so I don't understand how one has an error and the other doesn't. The only difference between the two models is that backbone_new (which isn't actually just a backbone, by the way) has different weights. How do I fix this error?
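The mismatch itself can be reproduced in isolation. A minimal sketch using only the shapes implied by the traceback (the avgpool output is (8, 512, 1, 1), and nn.Linear treats every dimension except the last as a batch dimension):

import torch
import torch.nn as nn

# avgpool output before any flatten: (batch, channels, 1, 1)
x = torch.randn(8, 512, 1, 1)
fc = nn.Linear(512, 2)

# nn.Linear folds all leading dimensions into the batch, so this is a
# (8 * 512 * 1) x 1 matrix against the transposed (512 x 2) weight
try:
    fc(x)
except RuntimeError as e:
    print(e)  # mat1 and mat2 shapes cannot be multiplied (4096x1 and 512x2)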


1 Answer

望喜 2025-01-28 13:50:54


Consider the code for the forward method of the ResNet class. It calls attributes like self.relu or self.layer2, but right before the fully-connected layer it calls torch.flatten(x, 1). To understand why this matters, have a look at the code generated for the forward of resnet18_new:

from torch.fx import symbolic_trace

symbolic_traced = symbolic_trace(resnet18_new)

print(symbolic_traced.code)

The last few lines in stdout are something like:

avgpool = self.avgpool(layer4_1_relu_1);  layer4_1_relu_1 = None
flatten = torch.flatten(avgpool, 1);  avgpool = None
fc = self.fc(flatten);  flatten = None
return fc
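The traced code matches torchvision's ResNet._forward_impl, whose tail (lightly paraphrased from the torchvision source, early layers omitted) is:

x = self.avgpool(x)
x = torch.flatten(x, 1)  # a plain function call, not a child module
x = self.fc(x)
return x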

But the same procedure for backbone_new returns a different result. After:

symbolic_traced_new = symbolic_trace(backbone_new)

print(symbolic_traced_new.code)

the last lines in stdout look like this (remember it's generated, so it's not pretty):

_8 = getattr(self, "8")(_7_1_relu_1);  _7_1_relu_1 = None
fc = self.fc(_8);  _8 = None
return fc

where getattr(self, "8") corresponds to AdaptiveAvgPool2d(output_size=(1, 1)). Of course it crashes: the flatten was thrown away! That's because torch.flatten is called as a function in the original ResNet._forward_impl, so it is not a child module and nn.Sequential(*list(resnet18_new.children())) silently drops it. To fix this, include an nn.Flatten (which behaves like torch.flatten(x, 1)) when adding the new module:

backbone_new = nn.Sequential(*list(resnet18_new.children())[:-1])
ckpt = torch.load('test_nnclr.h5')
backbone_new.load_state_dict(ckpt['resnet18_parameters'])
# nn.Flatten(1) restores the flatten step that nn.Sequential dropped
backbone_new.add_module('fc', nn.Sequential(nn.Flatten(1), nn.Linear(512, 2, device='cuda:0')))
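With the Flatten in place, a forward pass produces the expected (batch, 2) output. A quick smoke test (assuming backbone_new and the input share a device, e.g. after backbone_new.to('cuda:0')):

backbone_new.to('cuda:0')
out = backbone_new(torch.randn(8, 3, 128, 128, device='cuda:0'))
print(out.shape)  # torch.Size([8, 2])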