Siamese network training with contrastive loss: missing parameter updates
I am trying to implement a fairly simple Siamese network with a contrastive loss function. I use a pretrained VGG16 as the backbone and strip the last ReLU and MaxPool from the feature extractor. I then add an adaptive average pooling layer and a plain linear layer to produce the embedding vector.
To test my implementation, I pass random inputs through the network and check whether every parameter receives an update.
Problem: As can be seen in the output of my MWE, elements 25 and 27 of the parameter list do not receive an update. I think these are the biases of the last convolutional layer and of the linear layer. I also checked optimizer.param_groups[0]["params"][25].grad and optimizer.param_groups[0]["params"][27].grad; both gradients are all zeros.
Addendum: if one of the inputs is larger than 224 x 224, e.g. input_1 = torch.randn(4, 3, 400, 224), the bias of the last convolution does get updated.
MWE using PyTorch 1.11.0:
import torch
import torchvision.models as models
import torch.nn.functional as F
class Siamese_VGG16(torch.nn.Module):
    def __init__(self, num_elements_embedding_vector: int) -> None:
        super().__init__()
        encoder = models.vgg16(pretrained=True)
        layers = list(encoder.features.children())[:-2]
        encoder = torch.nn.Sequential(*layers)
        self.model = torch.nn.Module()
        self.model.add_module("encoder", encoder)
        global_pool = torch.nn.AdaptiveAvgPool2d((7, 7))
        self.model.add_module("pool", global_pool)
        embedded_vector = torch.nn.Sequential(
            torch.nn.Linear(25088, num_elements_embedding_vector),
        )
        self.model.add_module("embedding", embedded_vector)

    def forward_once(self, x: torch.Tensor) -> torch.Tensor:
        encoding = self.model.encoder(x)
        pool = self.model.pool(encoding)
        pool = pool.reshape(pool.shape[0], -1)
        return self.model.embedding(pool)

    def forward(self, input1: torch.Tensor, input2: torch.Tensor):
        output1 = self.forward_once(input1)
        output2 = self.forward_once(input2)
        return output1, output2

def contrastive_loss(embedding_vec_1, embedding_vec_2, label):
    negative_margin = 1.0
    euclidean_distance = F.pairwise_distance(
        embedding_vec_1, embedding_vec_2, keepdim=True
    )
    loss_contrastive = torch.mean(
        (1 - label).unsqueeze(1) * torch.pow(euclidean_distance, 2)
        + (label).unsqueeze(1)
        * torch.pow(torch.clamp(negative_margin - euclidean_distance, min=0.0), 2)
    )
    return loss_contrastive
model = Siamese_VGG16(128)
optimizer = torch.optim.Adam(
    params=model.parameters(),
    lr=0.0005,
)
loss_func = contrastive_loss
parameters_pre = [t.detach().clone() for t in optimizer.param_groups[0]["params"]]
input_1 = torch.randn(4, 3, 224, 224)
input_2 = torch.randn(4, 3, 224, 224)
label = torch.tensor([1, 0, 1, 0], dtype=torch.long)
# forward pass
output_1, output_2 = model(input_1, input_2)
loss = loss_func(output_1, output_2, label)
# clear gradients
optimizer.zero_grad()
# backward pass
loss.backward()
# update parameters
optimizer.step()
parameters_post = [t.detach().clone() for t in optimizer.param_groups[0]["params"]]
idx = 0
for t_pre, t_post in zip(parameters_pre, parameters_post):
    if torch.equal(t_pre, t_post):
        print(f"{idx} : Equal")
    else:
        print(f"{idx} : Different")
    idx += 1
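As a side note (this check is not part of the original MWE), the flat indices in the output can be mapped back to layer names with model.named_parameters(), which yields the parameters in the same order as model.parameters() and therefore in the same order as the optimizer's parameter list. A minimal diagnostic sketch, run after loss.backward():

# Hypothetical diagnostic, not in the original MWE: map flat parameter indices
# to names and inspect gradient magnitudes after loss.backward().
for i, (name, param) in enumerate(model.named_parameters()):
    grad_sum = None if param.grad is None else param.grad.abs().sum().item()
    print(f"{i:2d}  {name:30s}  |grad| sum = {grad_sum}")

Indices 25 and 27 should show up as the bias of the last convolution in model.encoder and the bias of the Linear layer in model.embedding, matching the observation above.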
1 Answer
Essentially, the contrastive loss is computed from the difference between the two embedding vectors. During backpropagation, the gradients from both inputs accumulate on the shared parameters, and because the loss depends only on that difference, the two contributions cancel and the accumulated gradient for these bias terms evaluates to exactly zero. Adding an activation function after the last linear layer avoids this behavior.
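A minimal sketch of this cancellation, using a single shared Linear layer in place of the full Siamese_VGG16 (the names, shapes, and the sigmoid used here are illustrative, not taken from the code above):

import torch

# Shared layer applied to both branches; the loss is built purely from the
# difference of the two outputs, mirroring the situation in the MWE.
layer = torch.nn.Linear(8, 4)
x1, x2 = torch.randn(3, 8), torch.randn(3, 8)

loss = torch.pow(layer(x1) - layer(x2), 2).mean()
loss.backward()
print(layer.bias.grad)   # all zeros: the bias shifts both branch outputs by the
                         # same amount, so its two gradient contributions cancel

# With a non-linearity after the shared layer the branch Jacobians differ,
# so the bias gradient no longer cancels.
layer2 = torch.nn.Linear(8, 4)
loss2 = torch.pow(torch.sigmoid(layer2(x1)) - torch.sigmoid(layer2(x2)), 2).mean()
loss2.backward()
print(layer2.bias.grad)  # non-zero in general

In the MWE above, this corresponds to appending a non-linearity (e.g. torch.nn.Sigmoid()) to the embedding Sequential; by the same reasoning, keeping the stripped ReLU after the last convolution should presumably restore a non-zero gradient for that conv bias as well.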