Loss at the start of fine-tuning is higher than the loss from transfer learning

Posted on 2025-02-05 19:26:56

Since I start fine-tuning from the weights learned during transfer learning, I would expect the loss to start at the same value or lower. However, it looks as if fine-tuning starts from a different set of weights.

Code to start transfer learning:

base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
                                              include_top=False, 
                                              weights='imagenet')
base_model.trainable = False

model = tf.keras.Sequential([
  base_model,
  tf.keras.layers.GlobalAveragePooling2D(),
  tf.keras.layers.Dense(units=3, activation='sigmoid')
])

model.compile(optimizer='adam', 
              loss='binary_crossentropy', 
              metrics=['accuracy'])

epochs = 1000
callback = tf.keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True)
history = model.fit(train_generator,
                    steps_per_epoch=len(train_generator), 
                    epochs=epochs,
                    validation_data=val_generator,
                    validation_steps=len(val_generator),
                    callbacks=[callback],)

Output from last epoch:

Epoch 29/1000
232/232 [==============================] - 492s 2s/step - loss: 0.1298 - accuracy: 0.8940 - val_loss: 0.1220 - val_accuracy: 0.8937

Code to start fine tuning:

model.trainable = True

# Fine-tune from this layer onwards
fine_tune_at = -20

# Freeze all the layers before the `fine_tune_at` layer
for layer in model.layers[:fine_tune_at]:
  layer.trainable =  False

model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
              loss='binary_crossentropy',
              metrics=['accuracy'])

history_fine = model.fit(train_generator,
                         steps_per_epoch=len(train_generator), 
                         epochs=epochs,
                         validation_data=val_generator,
                         validation_steps=len(val_generator),
                         callbacks=[callback],)

But this is what I see for the first few epochs:

Epoch 1/1000
232/232 [==============================] - ETA: 0s - loss: 0.3459 - accuracy: 0.8409/usr/local/lib/python3.7/dist-packages/PIL/Image.py:960: UserWarning: Palette images with Transparency expressed in bytes should be converted to RGBA images
  "Palette images with Transparency expressed in bytes should be "
232/232 [==============================] - 509s 2s/step - loss: 0.3459 - accuracy: 0.8409 - val_loss: 0.7755 - val_accuracy: 0.7262
Epoch 2/1000
232/232 [==============================] - 502s 2s/step - loss: 0.1889 - accuracy: 0.9066 - val_loss: 0.5628 - val_accuracy: 0.8881

Eventually the loss drops below the transfer-learning loss:

Epoch 87/1000
232/232 [==============================] - 521s 2s/step - loss: 0.0232 - accuracy: 0.8312 - val_loss: 0.0481 - val_accuracy: 0.8563

Why was the loss in the first epoch of fine tuning higher than the last loss from transfer learning?

冷弦 2025-02-12 19:26:56

According to the TensorFlow/Keras guide on transfer learning and fine-tuning, the parameters of the BatchNormalization layers should be left alone. The guide explains:

"Importantly, although the base model becomes trainable, it is still running in inference mode since we passed training=False when calling it when we built the model. This means that the batch normalization layers inside won't update their batch statistics. If they did, they would wreak havoc on the representations learned by the model so far."
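To make the quoted point concrete, here is a small NumPy sketch of what a batch normalization layer computes in each mode. It is not Keras code: the `batchnorm` helper, the stored statistics, and the input distribution are all made up for illustration, and gamma/beta are left out for brevity. The classifier head was trained on features normalized with the stored (moving) statistics, so switching to per-batch statistics changes the features it receives.

```python
import numpy as np

def batchnorm(x, moving_mean, moving_var, training, eps=1e-3):
    """Normalize x of shape (batch, features); gamma=1 and beta=0 for brevity."""
    if training:
        # Training mode: use the statistics of the current batch.
        mean, var = x.mean(axis=0), x.var(axis=0)
    else:
        # Inference mode: use the statistics stored during pretraining.
        mean, var = moving_mean, moving_var
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)

# Hypothetical statistics stored by the pretrained layer.
moving_mean = np.array([0.0, 0.0])
moving_var = np.array([1.0, 1.0])

# Hypothetical batch of activations from the new dataset, whose
# distribution differs from the one seen during pretraining.
x = rng.normal(loc=3.0, scale=2.0, size=(32, 2))

out_inference = batchnorm(x, moving_mean, moving_var, training=False)
out_training = batchnorm(x, moving_mean, moving_var, training=True)

print("inference-mode feature mean:", out_inference.mean(axis=0))
print("training-mode feature mean: ", out_training.mean(axis=0))
```

The two outputs differ for the same input, which is one plausible way a head that performed well at the end of transfer learning can start fine-tuning with a noticeably higher loss.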

Below is what I did to fix the sudden increase in loss after unfreezing layers:

from tensorflow.keras import layers
from tensorflow.keras.applications import MobileNet

img_width, img_height, num_channel = 128, 128, 3
conv_base = MobileNet(
    include_top=False,
    input_shape=(img_width, img_height, num_channel),
    pooling="avg")

# Mark the base as trainable first: while the parent is frozen, Keras treats
# all of its weights as non-trainable, whatever the flags on its sublayers say.
conv_base.trainable = True

# Freeze everything below the top 50 layers.
for layer in conv_base.layers[:-50]:
    layer.trainable = False

# In the top 50 layers, keep BatchNormalization frozen and train the rest.
for layer in conv_base.layers[-50:]:
    layer.trainable = not isinstance(layer, layers.BatchNormalization)

conv_base.summary(show_trainable=True)  # check each layer's trainability
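
For the MobileNetV2 model in the question, the same idea might look like the sketch below. Two things differ from the question's code and are worth checking there. First, the base model is called with `training=False` through the functional API, which a `Sequential` wrapper does not do. Second, the slice is taken over `base_model.layers`: the question's `model.layers` has only three entries, so `model.layers[:-20]` is empty and that loop freezes nothing. The function names, the default `img_shape` of (160, 160, 3), and the `weights` parameter are assumptions made for illustration; the question does not give the value of `IMG_SHAPE`.

```python
import tensorflow as tf

def build_model(img_shape=(160, 160, 3), weights="imagenet", num_classes=3):
    """Transfer-learning stage: frozen base, trainable classifier head."""
    base_model = tf.keras.applications.MobileNetV2(
        input_shape=img_shape, include_top=False, weights=weights)
    base_model.trainable = False

    inputs = tf.keras.Input(shape=img_shape)
    # training=False keeps the BatchNormalization layers in inference mode,
    # even after the base is unfrozen later.
    x = base_model(inputs, training=False)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    outputs = tf.keras.layers.Dense(num_classes, activation="sigmoid")(x)
    model = tf.keras.Model(inputs, outputs)

    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model, base_model

def unfreeze_top(model, base_model, fine_tune_at=-20, learning_rate=1e-5):
    """Fine-tuning stage: train the top layers of the base, except BatchNorm."""
    base_model.trainable = True

    # Slice the base model's own layers, not the three-layer outer model.
    for layer in base_model.layers[:fine_tune_at]:
        layer.trainable = False

    # Leave BatchNormalization parameters untouched in the unfrozen part.
    for layer in base_model.layers[fine_tune_at:]:
        if isinstance(layer, tf.keras.layers.BatchNormalization):
            layer.trainable = False

    # Recompile so the changed trainable flags take effect.
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate),
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```

A typical use would be `model, base_model = build_model()`, then `model.fit(...)` for the head, then `unfreeze_top(model, base_model)` followed by a second `model.fit(...)`. Since `restore_best_weights=True` is already set on the early-stopping callback, the second stage starts from the best head weights found in the first.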
