Fine-tuning has no effect after 880,000 pretraining steps.
I'm using the code from https://github.com/NVIDIA/Dee...
Pretraining parameters:
15:47:02,534: INFO tensorflow 140678508230464 init_checkpoint: bertbase3layer-extract-from-google
15:47:02,534: INFO tensorflow 140678508230464 optimizer_type: lamb
15:47:02,534: INFO tensorflow 140678508230464 max_seq_length: 64
15:47:02,534: INFO tensorflow 140678508230464 max_predictions_per_seq: 5
15:47:02,534: INFO tensorflow 140678508230464 do_train: True
15:47:02,535: INFO tensorflow 140678508230464 do_eval: False
15:47:02,535: INFO tensorflow 140678508230464 train_batch_size: 32
15:47:02,535: INFO tensorflow 140678508230464 eval_batch_size: 8
15:47:02,535: INFO tensorflow 140678508230464 learning_rate: 5e-05
15:47:02,535: INFO tensorflow 140678508230464 num_train_steps: 10000000
15:47:02,535: INFO tensorflow 140678508230464 num_warmup_steps: 10000
15:47:02,535: INFO tensorflow 140678508230464 save_checkpoints_steps: 1000
15:47:02,535: INFO tensorflow 140678508230464 display_loss_steps: 10
15:47:02,535: INFO tensorflow 140678508230464 iterations_per_loop: 1000
15:47:02,535: INFO tensorflow 140678508230464 max_eval_steps: 100
15:47:02,535: INFO tensorflow 140678508230464 num_accumulation_steps: 1
15:47:02,535: INFO tensorflow 140678508230464 allreduce_post_accumulation: False
15:47:02,535: INFO tensorflow 140678508230464 verbose_logging: False
15:47:02,535: INFO tensorflow 140678508230464 horovod: True
15:47:02,536: INFO tensorflow 140678508230464 report_loss: True
15:47:02,536: INFO tensorflow 140678508230464 manual_fp16: False
15:47:02,536: INFO tensorflow 140678508230464 amp: False
15:47:02,536: INFO tensorflow 140678508230464 use_xla: True
15:47:02,536: INFO tensorflow 140678508230464 init_loss_scale: 4294967296
15:47:02,536: INFO tensorflow 140678508230464 ?: False
15:47:02,536: INFO tensorflow 140678508230464 help: False
15:47:02,536: INFO tensorflow 140678508230464 helpshort: False
15:47:02,536: INFO tensorflow 140678508230464 helpfull: False
15:47:02,536: INFO tensorflow 140678508230464 helpxml: False
15:47:02,536: INFO tensorflow 140678508230464 **************************
Pretraining loss (I removed nsp_loss):
{'throughput_train': 1196.9646684552622, 'mlm_loss': 0.9837073683738708, 'nsp_loss': 0.0, 'total_loss': 0.9837073683738708, 'avg_loss_step': 1.200513333082199, 'learning_rate': '0.00038143058'}
{'throughput_train': 1230.5063662500734, 'mlm_loss': 1.3001925945281982, 'nsp_loss': 0.0, 'total_loss': 1.3001925945281982, 'avg_loss_step': 1.299936044216156, 'learning_rate': '0.00038143038'}
{'throughput_train': 1236.4348949169155, 'mlm_loss': 1.473339319229126, 'nsp_loss': 0.0, 'total_loss': 1.473339319229126, 'avg_loss_step': 1.2444063007831574, 'learning_rate': '0.00038143017'}
{'throughput_train': 1221.2668264552692, 'mlm_loss': 0.9924975633621216, 'nsp_loss': 0.0, 'total_loss': 0.9924975633621216, 'avg_loss_step': 1.1603020071983337, 'learning_rate': '0.00038142994'}
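For context, an mlm_loss in the 1.0–1.5 range corresponds to a masked-LM perplexity of roughly 2.7–4.4, which suggests pretraining itself converged to something reasonable and the problem is more likely in the restore step. A minimal sanity check (pure Python, no TensorFlow needed):

```python
import math

def mlm_perplexity(mlm_loss: float) -> float:
    """Masked-LM cross-entropy loss -> perplexity (exp of the loss)."""
    return math.exp(mlm_loss)

# The mlm_loss values from the pretraining log above:
for loss in (0.9837, 1.3002, 1.4733, 0.9925):
    print(f"mlm_loss={loss:.4f} -> perplexity={mlm_perplexity(loss):.2f}")
```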
Fine-tuning code:
self.train_op = tf.train.AdamOptimizer(0.00001).minimize(self.loss, global_step=self.global_step)
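A frequent cause of "restore succeeds but fine-tuning behaves like random init" is a variable-name mismatch between the pretraining graph and the fine-tuning graph (for example, an extra scope prefix in one of the checkpoints), so weights are silently left at their random initialization. A minimal sketch of building an assignment map by stripping a hypothetical prefix (the `module/` prefix and variable names below are illustrative assumptions, not taken from the actual checkpoints):

```python
def build_assignment_map(ckpt_var_names, graph_var_names, prefix="module/"):
    """Map checkpoint variable names to graph variable names by stripping
    a (hypothetical) scope prefix from the checkpoint side.
    Returns {ckpt_name: graph_name} for names that match after stripping."""
    graph_set = set(graph_var_names)
    assignment_map = {}
    for ckpt_name in ckpt_var_names:
        stripped = ckpt_name[len(prefix):] if ckpt_name.startswith(prefix) else ckpt_name
        if stripped in graph_set:
            assignment_map[ckpt_name] = stripped
    return assignment_map

# Illustrative names only:
ckpt_vars = ["module/bert/embeddings/word_embeddings",
             "module/bert/encoder/layer_0/attention/self/query/kernel"]
graph_vars = ["bert/embeddings/word_embeddings",
              "bert/encoder/layer_0/attention/self/query/kernel"]
print(build_assignment_map(ckpt_vars, graph_vars))
```

In TF1 such a map would be passed to `tf.train.init_from_checkpoint(ckpt_path, assignment_map)`; `tf.train.list_variables(ckpt_path)` lists the names actually stored in a checkpoint, so comparing its output for your own ckpt vs. Google's ckpt is a quick way to spot a mismatch.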
Fine-tuning accuracy (restored from my ckpt pretrained with https://github.com/NVIDIA/Dee...):
epoch 1:
training step 895429, loss 4.98, acc 0.079
dev loss 4.853, acc 0.092
epoch 2:
training step 895429, loss 4.97, acc 0.080
dev loss 4.823, acc 0.092
epoch 3:
training step 895429, loss 4.96, acc 0.081
dev loss 4.849, acc 0.092
epoch 4:
training step 895429, loss 4.95, acc 0.082
dev loss 4.843, acc 0.092
Without restoring any pretrained ckpt:
epoch 1:
training step 10429, loss 2.48, acc 0.606
dev loss 1.604, acc 0.8036
Restoring Google's official BERT-Base pretrained ckpt, or a ckpt pretrained with https://github.com/guotong198...:
epoch 1:
training loss 1.89, acc 0.761
dev loss 1.351, acc 0.869