Which multiplication and addition factors to use for an adaptive learning rate in neural networks?
I am new to neural networks and, to get a grip on the matter, I have implemented a basic feed-forward MLP which I currently train through back-propagation. I am aware that there are more sophisticated and better ways to do that, but in Introduction to Machine Learning they suggest that with one or two tricks, basic gradient descent can be effective for learning from real-world data. One of the tricks is an adaptive learning rate.
The idea is to increase the learning rate by a constant value a when the error gets smaller, and decrease it by a fraction b of the learning rate when the error gets larger. So basically the learning rate change is determined by:
+(a)
if we're learning in the right direction, and
-(b * <learning rate>)
if we're ruining our learning. However, the above book gives no advice on how to set these parameters. I wouldn't expect a precise suggestion, since parameter tuning is a whole topic on its own, but at least a hint on their order of magnitude. Any ideas?
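For what it's worth, the rule described above can be sketched in a few lines. The values `a=0.01` and `b=0.1` below are purely illustrative placeholders, not recommendations — choosing them is exactly what the question asks about:

```python
def adapt_learning_rate(lr, prev_error, curr_error, a=0.01, b=0.1):
    """Adjust the learning rate after each epoch.

    a and b are illustrative assumptions, not recommended values:
    add the constant a when the error decreased, and subtract the
    fraction b of the current rate when it increased.
    """
    if curr_error < prev_error:
        return lr + a       # learning in the right direction: grow additively
    return lr - b * lr      # error got worse: shrink multiplicatively
```

Note the asymmetry: growth is additive (slow) while decay is multiplicative (fast), so a few bad epochs quickly undo many good ones.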
Thank you,
Tunnuz
Comments (1)
I haven't looked at neural networks for the longest time (10 years+) but after I saw your question I thought I would have a quick scout about. I kept seeing the same figures all over the internet for the increase (a) and decrease (b) factors: 1.2 and 0.5 respectively.
I have managed to track these values down to Martin Riedmiller and Heinrich Braun's RPROP algorithm (1992). Riedmiller and Braun are quite specific about sensible parameters to choose.
See: RPROP: A Fast Adaptive Learning Algorithm
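Note that in RPROP these factors adapt a per-weight step size rather than a single global learning rate, and they multiply it based on the sign of successive gradients. A minimal sketch of that update, using the 1.2 and 0.5 factors and the clipping bounds commonly quoted for the algorithm (the bounds here are typical values, not taken from this thread):

```python
def rprop_step_size(delta, prev_grad, grad,
                    eta_plus=1.2, eta_minus=0.5,
                    delta_min=1e-6, delta_max=50.0):
    """Update one weight's step size in the style of RPROP
    (Riedmiller & Braun, 1992).

    eta_plus / eta_minus are the 1.2 and 0.5 factors; delta_min
    and delta_max are typical clipping bounds (assumed here).
    """
    sign = prev_grad * grad
    if sign > 0:        # gradient kept its sign: accelerate
        return min(delta * eta_plus, delta_max)
    if sign < 0:        # sign flipped: we overshot a minimum, back off
        return max(delta * eta_minus, delta_min)
    return delta        # a zero gradient leaves the step unchanged
```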
I hope this helps.