Is there a good rule of thumb for choosing an appropriate batch size?
A common heuristic is to make the batch size as large as your accelerator hardware allows. However, when training on a small dataset with a large batch size, training can become inefficient: each epoch contains only a handful of steps, so the model receives too few parameter updates for training to converge.
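As a rough illustration of this trade-off, the number of optimizer steps per epoch is `ceil(dataset_size / batch_size)`. The sketch below uses hypothetical numbers (a 2,000-example dataset) to show how a large batch size collapses an epoch into very few updates:

```python
import math

def updates_per_epoch(dataset_size: int, batch_size: int) -> int:
    """Number of gradient updates (optimizer steps) in one epoch."""
    return math.ceil(dataset_size / batch_size)

# Hypothetical numbers: a small dataset, batch sizes from modest to full-batch
dataset_size = 2_000
for batch_size in (32, 256, 2_000):
    steps = updates_per_epoch(dataset_size, batch_size)
    print(f"batch_size={batch_size:>5} -> {steps:>3} updates/epoch")

# Output:
# batch_size=   32 ->  63 updates/epoch
# batch_size=  256 ->   8 updates/epoch
# batch_size= 2000 ->   1 updates/epoch
```

With a full-batch setup, every epoch yields a single update, so reaching a given number of updates requires proportionally more epochs (i.e., more passes over the data).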