TensorFlow apply_gradients() with multiple losses

Posted 2025-01-20 01:00:12

I am training a model (VAEGAN) with intermediate outputs, and I have two losses:

  • a KL divergence loss computed from the output layer,
  • a similarity (rec) loss computed from an intermediate layer.

Can I simply sum them up and apply gradients like below?

with tf.GradientTape() as tape:
    z_mean, z_log_sigma, z_encoder_output = self.encoder(real_images, training=True)
    kl_loss = self.kl_loss_fn(z_mean, z_log_sigma) * kl_loss_coeff

    fake_images = self.decoder(z_encoder_output)
    fake_inter_activations, logits_fake = self.discriminator(fake_images, training=True)
    real_inter_activations, logits_real = self.discriminator(real_images, training=True)

    rec_loss = self.rec_loss_fn(fake_inter_activations, real_inter_activations) * rec_loss_coeff

    total_encoder_loss = kl_loss + rec_loss

grads = tape.gradient(total_encoder_loss, self.encoder.trainable_weights)
self.e_optimizer.apply_gradients(zip(grads, self.encoder.trainable_weights))

or do I need to separate them as below, while keeping the tape persistent?

with tf.GradientTape(persistent=True) as tape:
    z_mean, z_log_sigma, z_encoder_output = self.encoder(real_images, training=True)
    kl_loss = self.kl_loss_fn(z_mean, z_log_sigma) * kl_loss_coeff

    fake_images = self.decoder(z_encoder_output)
    fake_inter_activations, logits_fake = self.discriminator(fake_images, training=True)
    real_inter_activations, logits_real = self.discriminator(real_images, training=True)

    rec_loss = self.rec_loss_fn(fake_inter_activations, real_inter_activations) * rec_loss_coeff

grads_kl_loss = tape.gradient(kl_loss, self.encoder.trainable_weights)
self.e_optimizer.apply_gradients(zip(grads_kl_loss, self.encoder.trainable_weights))

grads_rec_loss = tape.gradient(rec_loss, self.encoder.trainable_weights)
self.e_optimizer.apply_gradients(zip(grads_rec_loss, self.encoder.trainable_weights))

del tape  # a persistent tape is not released automatically
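The property the question hinges on can be checked directly in a few lines. This is a toy sketch, not the VAEGAN code: the variable `w` and the two losses are illustrative stand-ins chosen so the gradients are easy to verify by hand.

```python
import tensorflow as tf

# Toy check that tape.gradient(loss1 + loss2, w) equals the sum of the
# individual gradients, using a persistent tape so we can differentiate
# each loss separately from the same forward pass.
w = tf.Variable([1.0, 3.0])

with tf.GradientTape(persistent=True) as tape:
    loss1 = tf.reduce_sum(w ** 2)    # gradient: 2 * w = [2, 6]
    loss2 = tf.reduce_sum(3.0 * w)   # gradient: [3, 3]
    total = loss1 + loss2

g1 = tape.gradient(loss1, w)
g2 = tape.gradient(loss2, w)
g_total = tape.gradient(total, w)
del tape  # persistent tapes must be released manually

print(g_total.numpy())  # [5. 9.], which is g1 + g2 = [2, 6] + [3, 3]
```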


Comments (1)

莫言歌 2025-01-27 01:00:12


Yes, you can generally sum the losses and compute a single gradient. Since the gradient of a sum is the sum of the respective gradients, the step taken for the summed loss is the same as taking both steps one after another. (One caveat: with a stateful optimizer such as Adam, two separate apply_gradients calls advance the optimizer state twice, so the exact equivalence holds only for plain SGD.)

Here's a simple example: Say you have two weights, and you are currently at the point (1, 3) ("starting point"). The gradient for loss 1 is (2, -4) and the gradient for loss 2 is (1, 2).

  • If you apply the steps one after the other, you will first move to (3, -1) and then to (4, 1).
  • If you sum the gradients first, the overall step is (3, -2). Following this direction from the starting point gets you to (4, 1) as well.
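The worked example above can be checked with plain arithmetic. A minimal sketch, treating a "step" as simply adding the gradient vector to the current point, as the example does:

```python
# Verify the worked example: starting at (1, 3), applying the two gradient
# steps one after the other lands at the same point as one summed step.
start = (1, 3)
g1 = (2, -4)  # gradient for loss 1
g2 = (1, 2)   # gradient for loss 2

# One step after the other
step1 = (start[0] + g1[0], start[1] + g1[1])           # (3, -1)
sequential = (step1[0] + g2[0], step1[1] + g2[1])      # (4, 1)

# Sum the gradients first, then take one step
summed = (g1[0] + g2[0], g1[1] + g2[1])                # (3, -2)
combined = (start[0] + summed[0], start[1] + summed[1])  # (4, 1)

print(sequential, combined)  # (4, 1) (4, 1)
```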