Siamese network training with contrastive loss is missing parameter updates

Posted on 2025-01-24 11:44:57

I am trying to implement a rather simple Siamese network with a contrastive loss function. I use a pre-trained VGG16 as the backbone and strip the last ReLU and MaxPool layers from the encoder. Then I add an adaptive pooling layer and a plain linear layer to generate the embedding vector.

To test my implementation, I pass random inputs and check if every parameter gets an update.

Problem: As the output of my MWE shows, elements 25 and 27 of the parameter list do not receive updates. I think these are the biases of the last convolutional layer and the linear layer. I also checked optimizer.param_groups[0]["params"][25].grad and optimizer.param_groups[0]["params"][27].grad. The gradients are all zero. Why is that?

Additional: If one input is larger than 224 by 224, for instance input_1 = torch.randn(4, 3, 400, 224), the bias of the last convolutional layer does get updated.
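
One way to check the guess about indices 25 and 27 is to map each optimizer index back to a parameter name. The following is a minimal sketch that uses a small hypothetical stand-in model instead of the VGG16 encoder, so the layer sizes and the names `name_of` and `index_to_name` are assumptions made for illustration. With the real network, `model` would be `Siamese_VGG16(128)`.

```python
import torch

# Hypothetical small stand-in for the real model, to keep the sketch light.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 4, kernel_size=3, padding=1),
    torch.nn.AdaptiveAvgPool2d((2, 2)),
    torch.nn.Flatten(),
    torch.nn.Linear(16, 2),
)

optimizer = torch.optim.Adam(model.parameters(), lr=0.0005)

# Map each parameter tensor (by object identity) to its registered name,
# then walk the optimizer's parameter list in its own order.
name_of = {id(p): name for name, p in model.named_parameters()}
index_to_name = [name_of[id(p)] for p in optimizer.param_groups[0]["params"]]

for idx, name in enumerate(index_to_name):
    print(f"{idx} : {name}")
```

For the stand-in model this lists the convolution weight and bias followed by the linear weight and bias, which makes it easy to see which index belongs to which layer.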

MWE using PyTorch 1.11.0:

import torch
import torchvision.models as models
import torch.nn.functional as F


class Siamese_VGG16(torch.nn.Module):
    def __init__(self, num_elements_embedding_vector: int) -> None:
        super().__init__()

        encoder = models.vgg16(pretrained=True)

        layers = list(encoder.features.children())[:-2]

        encoder = torch.nn.Sequential(*layers)

        self.model = torch.nn.Module()
        self.model.add_module("encoder", encoder)

        global_pool = torch.nn.AdaptiveAvgPool2d((7, 7))
        self.model.add_module("pool", global_pool)

        embedded_vector = torch.nn.Sequential(
            torch.nn.Linear(25088, num_elements_embedding_vector),
        )

        self.model.add_module("embedding", embedded_vector)

    def forward_once(self, x: torch.Tensor) -> torch.Tensor:
        encoding = self.model.encoder(x)
        pool = self.model.pool(encoding)
        pool = pool.reshape(pool.shape[0], -1)

        return self.model.embedding(pool)

    def forward(self, input1: torch.Tensor, input2: torch.Tensor):

        output1 = self.forward_once(input1)
        output2 = self.forward_once(input2)

        return output1, output2


def contrastive_loss(embedding_vec_1, embedding_vec_2, label):
    negative_margin = 1.0

    euclidean_distance = F.pairwise_distance(
        embedding_vec_1, embedding_vec_2, keepdim=True
    )

    loss_contrastive = torch.mean(
        (1 - label).unsqueeze(1) * torch.pow(euclidean_distance, 2)
        + (label).unsqueeze(1)
        * torch.pow(torch.clamp(negative_margin - euclidean_distance, min=0.0), 2)
    )

    return loss_contrastive


model = Siamese_VGG16(128)

optimizer = torch.optim.Adam(
    params=model.parameters(),
    lr=0.0005,
)

loss_func = contrastive_loss

parameters_pre = [t.detach().clone() for t in optimizer.param_groups[0]["params"]]

input_1 = torch.randn(4, 3, 224, 224)
input_2 = torch.randn(4, 3, 224, 224)
label = torch.tensor([1, 0, 1, 0], dtype=torch.long)

# forward pass
output_1, output_2 = model(input_1, input_2)
loss = loss_func(output_1, output_2, label)

# clear gradients
optimizer.zero_grad()
# backward pass
loss.backward()
# update parameters
optimizer.step()

parameters_post = [t.detach().clone() for t in optimizer.param_groups[0]["params"]]

idx = 0

for t_pre, t_post in zip(parameters_pre, parameters_post):
    if torch.equal(t_pre, t_post):
        print(f"{idx} : Equal")
    else:
        print(f"{idx} : Different")

    idx += 1


Comments (1)

洋洋洒洒 2025-01-31 11:44:58

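Essentially, the contrastive loss depends only on the difference between the two embedding vectors. During backpropagation, the gradients from both inputs are accumulated on the shared parameters. Because the loss sees only that difference, a bias that is added identically in both branches cancels out: its two gradient contributions have equal magnitude and opposite sign, so the accumulated gradient evaluates to zero. The same reasoning applies to the bias of the last convolution when only linear operations (the average pooling and the linear layer) follow it. Adding an activation function after the last linear layer would avoid this behavior.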
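
The cancellation can be reproduced with a single shared linear layer. The sketch below is an illustration, not the original model: the layer sizes, the random inputs, the simplified squared-distance loss, and the choice of `tanh` as the activation are all assumptions.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical stand-in for the final embedding layer shared by both branches.
linear = torch.nn.Linear(8, 4).double()

x1 = torch.randn(5, 8, dtype=torch.double)
x2 = torch.randn(5, 8, dtype=torch.double)

# Case 1: the loss sees only e1 - e2, so the shared bias cancels.
e1, e2 = linear(x1), linear(x2)
loss = F.pairwise_distance(e1, e2).pow(2).mean()
loss.backward()
bias_grad_plain = linear.bias.grad.clone()
weight_grad_plain = linear.weight.grad.clone()

# Case 2: a nonlinearity after the layer breaks the cancellation.
linear.zero_grad()
e1, e2 = torch.tanh(linear(x1)), torch.tanh(linear(x2))
loss = F.pairwise_distance(e1, e2).pow(2).mean()
loss.backward()
bias_grad_tanh = linear.bias.grad.clone()

# The plain bias gradient is expected to vanish (up to round-off),
# while the weight gradient and the tanh bias gradient are not.
print("bias grad, no activation:", bias_grad_plain.abs().max().item())
print("weight grad, no activation:", weight_grad_plain.abs().max().item())
print("bias grad, with tanh:", bias_grad_tanh.abs().max().item())
```

In the first case each branch contributes the same gradient to the bias with opposite sign, so only the weights receive a usable signal. In the second case the derivative of `tanh` differs between the two branches, so the contributions no longer cancel.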
