Siamese network training with contrastive loss: missing parameter updates
I am trying to implement a fairly simple Siamese network with a contrastive loss function. I use a pretrained VGG16 as the backbone and strip the last ReLU and MaxPool from the feature extractor. I then add an adaptive average pooling layer and a plain linear layer to produce the embedding vector.
To test my implementation, I pass random inputs through the network and check whether every parameter receives an update.
Problem: As can be seen in the output of my MWE, elements 25 and 27 of the parameter list do not receive an update. I think these are the biases of the last convolutional layer and of the linear layer. I also checked optimizer.param_groups[0]["params"][25].grad and optimizer.param_groups[0]["params"][27].grad; both gradients are all zeros.
Addendum: if one of the inputs is larger than 224 x 224, e.g. input_1 = torch.randn(4, 3, 400, 224), the bias of the last convolution does get updated.
MWE using PyTorch 1.11.0:
import torch
import torchvision.models as models
import torch.nn.functional as F
class Siamese_VGG16(torch.nn.Module):
    def __init__(self, num_elements_embedding_vector: int) -> None:
        super().__init__()
        encoder = models.vgg16(pretrained=True)
        layers = list(encoder.features.children())[:-2]
        encoder = torch.nn.Sequential(*layers)
        self.model = torch.nn.Module()
        self.model.add_module("encoder", encoder)
        global_pool = torch.nn.AdaptiveAvgPool2d((7, 7))
        self.model.add_module("pool", global_pool)
        embedded_vector = torch.nn.Sequential(
            torch.nn.Linear(25088, num_elements_embedding_vector),
        )
        self.model.add_module("embedding", embedded_vector)

    def forward_once(self, x: torch.Tensor) -> torch.Tensor:
        encoding = self.model.encoder(x)
        pool = self.model.pool(encoding)
        pool = pool.reshape(pool.shape[0], -1)
        return self.model.embedding(pool)

    def forward(self, input1: torch.Tensor, input2: torch.Tensor):
        output1 = self.forward_once(input1)
        output2 = self.forward_once(input2)
        return output1, output2

def contrastive_loss(embedding_vec_1, embedding_vec_2, label):
    negative_margin = 1.0
    euclidean_distance = F.pairwise_distance(
        embedding_vec_1, embedding_vec_2, keepdim=True
    )
    loss_contrastive = torch.mean(
        (1 - label).unsqueeze(1) * torch.pow(euclidean_distance, 2)
        + (label).unsqueeze(1)
        * torch.pow(torch.clamp(negative_margin - euclidean_distance, min=0.0), 2)
    )
    return loss_contrastive
model = Siamese_VGG16(128)
optimizer = torch.optim.Adam(
    params=model.parameters(),
    lr=0.0005,
)
loss_func = contrastive_loss
parameters_pre = [t.detach().clone() for t in optimizer.param_groups[0]["params"]]
input_1 = torch.randn(4, 3, 224, 224)
input_2 = torch.randn(4, 3, 224, 224)
label = torch.tensor([1, 0, 1, 0], dtype=torch.long)
# forward pass
output_1, output_2 = model(input_1, input_2)
loss = loss_func(output_1, output_2, label)
# clear gradients
optimizer.zero_grad()
# backward pass
loss.backward()
# update parameters
optimizer.step()
parameters_post = [t.detach().clone() for t in optimizer.param_groups[0]["params"]]
idx = 0
for t_pre, t_post in zip(parameters_pre, parameters_post):
    if torch.equal(t_pre, t_post):
        print(f"{idx} : Equal")
    else:
        print(f"{idx} : Different")
    idx += 1
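As a side note (this check is not part of the original MWE), the flat indices in the output can be mapped back to layer names with model.named_parameters(), which yields the parameters in the same order as model.parameters() and therefore in the same order as the optimizer's parameter list. A minimal diagnostic sketch, run after loss.backward():

# Hypothetical diagnostic, not in the original MWE: map flat parameter indices
# to names and inspect gradient magnitudes after loss.backward().
for i, (name, param) in enumerate(model.named_parameters()):
    grad_sum = None if param.grad is None else param.grad.abs().sum().item()
    print(f"{i:2d}  {name:30s}  |grad| sum = {grad_sum}")

Indices 25 and 27 should show up as the bias of the last convolution in model.encoder and the bias of the Linear layer in model.embedding, matching the observation above.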
1 Answer
Essentially, the contrastive loss is computed from the difference between the two embedding vectors. During backpropagation, the gradients from both inputs accumulate on the shared parameters, and because the loss depends only on that difference, the two contributions cancel and the accumulated gradient for these bias terms evaluates to exactly zero. Adding an activation function after the last linear layer avoids this behavior.
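A minimal sketch of this cancellation, using a single shared Linear layer in place of the full Siamese_VGG16 (the names, shapes, and the sigmoid used here are illustrative, not taken from the code above):

import torch

# Shared layer applied to both branches; the loss is built purely from the
# difference of the two outputs, mirroring the situation in the MWE.
layer = torch.nn.Linear(8, 4)
x1, x2 = torch.randn(3, 8), torch.randn(3, 8)

loss = torch.pow(layer(x1) - layer(x2), 2).mean()
loss.backward()
print(layer.bias.grad)   # all zeros: the bias shifts both branch outputs by the
                         # same amount, so its two gradient contributions cancel

# With a non-linearity after the shared layer the branch Jacobians differ,
# so the bias gradient no longer cancels.
layer2 = torch.nn.Linear(8, 4)
loss2 = torch.pow(torch.sigmoid(layer2(x1)) - torch.sigmoid(layer2(x2)), 2).mean()
loss2.backward()
print(layer2.bias.grad)  # non-zero in general

In the MWE above, this corresponds to appending a non-linearity (e.g. torch.nn.Sigmoid()) to the embedding Sequential; by the same reasoning, keeping the stripped ReLU after the last convolution should presumably restore a non-zero gradient for that conv bias as well.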