Why doesn't Keras' ModelCheckpoint save the best model, the one with the highest validation accuracy, during training?
I am training a ResNet18 with Keras. As shown below, I used ModelCheckpoint to save the best model based on the validation accuracy.
import tensorflow as tf
from tensorflow.keras.callbacks import ModelCheckpoint

# ResNet18 and generator are user-defined helpers that are not shown in the post.
model = ResNet18(2)
model.build(input_shape=(None, 128, 128, 3))
model.summary()
model.save_weights('./Adam_resnet18_original.hdf5')  # keep a copy of the initial weights
opt = tf.keras.optimizers.Adam(learning_rate=0.001)
model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])
# Overwrite the checkpoint only when val_accuracy improves.
mcp_save = ModelCheckpoint('Adam_resnet18_weights.hdf5', save_best_only=True, monitor='val_accuracy', mode='max')
batch_size = 128
model.fit(generator(batch_size, x_train, y_train), steps_per_epoch=len(x_train) // batch_size,
          validation_data=generator(batch_size, x_valid, y_valid), validation_steps=len(x_valid) // batch_size,
          callbacks=[mcp_save], epochs=300)
As shown in the picture below, the validation accuracy went as high as 0.8281 during training.
[Figure: training history]
However, when I used the final model to compute the validation accuracy with the code below, I got an accuracy of only 0.78109. Can anybody enlighten me as to what the problem might be? Thanks a lot!
import numpy as np
from sklearn.metrics import accuracy_score

# Reload the checkpoint written by ModelCheckpoint.
model.load_weights('Adam_resnet18_weights.hdf5')
# Run one extra step, then trim the predictions to the number of validation labels.
predictions_validation = model.predict(generator(batch_size, x_valid, y_valid), steps=len(x_valid) // batch_size + 1)
predictions_validation_label = np.argmax(predictions_validation, axis=1)
Y_valid_label = np.argmax(Y_valid, axis=1)
accuracy_validation_conventional = accuracy_score(Y_valid_label, predictions_validation_label[:len(Y_valid_label)])
print(f'Accuracy on the validation set: {accuracy_validation_conventional}')
Comments (1)
The biggest clue here is that the training accuracy is stuck at 1.000 for the last couple of epochs. From this, it appears that the model is overfitting. An intuitive way to think about overfitting is a student who takes the exact same test over and over again, to the point where they just memorize the answer to each question and cannot adapt to small changes in wording. The net has "memorized" the training data but is unable to adapt to the testing data.
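One rough way to put a number on this, assuming you keep the per-epoch accuracies that Keras reports, is to track the gap between training and validation accuracy. The sketch below uses made-up histories (only the 1.000 and 0.8281 figures come from the question), and `generalization_gap` is a hypothetical helper, not a Keras function.

```python
def generalization_gap(train_acc, val_acc):
    """Return the per-epoch difference train_acc - val_acc."""
    if len(train_acc) != len(val_acc):
        raise ValueError("histories must have the same length")
    return [t - v for t, v in zip(train_acc, val_acc)]


# Illustrative values only. With Keras they would typically come from
# history.history['accuracy'] and history.history['val_accuracy'],
# where history is the object returned by model.fit(...).
train_acc = [0.70, 0.85, 0.95, 1.000, 1.000]
val_acc = [0.68, 0.78, 0.8281, 0.80, 0.78]

gaps = generalization_gap(train_acc, val_acc)
# Index of the epoch with the widest train/validation gap (0-indexed).
widest_epoch = max(range(len(gaps)), key=gaps.__getitem__)
print(f"largest gap {gaps[widest_epoch]:.3f} at epoch {widest_epoch + 1}")
```

A gap that keeps widening while training accuracy sits at 1.000 is the pattern described above.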
It's a little tricky to figure out what the best approach would be since I don't know the size of the dataset you are working with or the details of the model. I am under the assumption that the dataset is of a decent size (if not, try data augmentation) and you have defined a multi-layered net (if you are importing this model from Keras, your options may be a little more limited). Here are some suggestions though:
Stop earlier. Set your epochs to a smaller number to prevent overtraining. This is the simplest solution, and it makes sense in your case since accuracy is already at 1.00 for the last several epochs. If you can graph your accuracy and loss over time, that will help, because you can visually pinpoint the epoch where overfitting begins, as you can see in this example. There are fancier ways to implement early stopping, but simply running for fewer epochs will probably be sufficient for your purposes.
Add dropout layers. Put simply, dropout randomly "turns off" a fraction of the units (their activations, not the weights themselves) at each training step, which prevents the network from over-relying on a small subset of nodes. This is also a common technique to prevent overfitting.
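To make the two suggestions above concrete, here is a minimal pure-Python sketch of the ideas behind them. It is an illustration under assumptions, not the Keras implementation: `best_epoch_with_patience` and `dropout` are hypothetical helpers, and the accuracy values are invented apart from the 0.8281 peak. In Keras itself, the corresponding building blocks are the `tf.keras.callbacks.EarlyStopping` callback (with arguments such as `monitor`, `patience` and `restore_best_weights`) and the `tf.keras.layers.Dropout(rate)` layer.

```python
import random


def best_epoch_with_patience(val_acc, patience=3):
    """Simulate patience-based early stopping on a validation-accuracy history.

    Returns (best_epoch, stop_epoch), both 0-indexed. Training stops once
    `patience` consecutive epochs fail to beat the best value seen so far.
    """
    if not val_acc:
        raise ValueError("val_acc must not be empty")
    best_epoch, best_value, waited = 0, float("-inf"), 0
    for epoch, value in enumerate(val_acc):
        if value > best_value:
            best_epoch, best_value, waited = epoch, value, 0
        else:
            waited += 1
            if waited >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_acc) - 1


def dropout(activations, rate, rng):
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep = 1.0 - rate
    return [a / keep if rng.random() >= rate else 0.0 for a in activations]


val_acc = [0.70, 0.78, 0.8281, 0.81, 0.80, 0.79, 0.78]  # illustrative values
best, stopped = best_epoch_with_patience(val_acc, patience=3)
print(f"best epoch: {best + 1}, training would stop after epoch {stopped + 1}")

rng = random.Random(0)  # seeded so the mask is reproducible
print(dropout([1.0, 2.0, 3.0, 4.0], rate=0.5, rng=rng))
```

Dropout of this kind is applied only during training; at inference time all units are kept.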
A fuller explanation along with other suggestions can be found here. Hope this was helpful!