Need a good way to choose and adjust a "learning rate"

In the picture below you can see a learning algorithm trying to learn to produce a desired output (the red line). The learning algorithm is similar to a backward error propagation neural network.

The "learning rate" is a value that controls the size of the adjustments made during the training process. If the learning rate is too high, then the algorithm learns quickly but its predictions jump around a lot during the training process (green line - learning rate of 0.001), if it is lower then the predictions jump around less, but the algorithm takes a lot longer to learn (blue line - learning rate of 0.0001).

The black lines are moving averages.

How can I adapt the learning rate so that it initially converges close to the desired output, but then slows down so that it can home in on the correct value?

learning rate graph http://img.skitch.com/20090605-pqpkse1yr1e5r869y6eehmpsym.png
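
For concreteness, the learning rate scales each weight adjustment in an update of roughly this form (a minimal Python sketch; the names are illustrative, not my actual code):

    # Sketch of the role the learning rate plays in a gradient-style update.
    def update_weight(w, gradient, learning_rate):
        # A high rate takes big, jumpy steps; a low rate takes small, steady ones.
        return w - learning_rate * gradient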

Comments (4)

凉城已无爱 2024-07-31 13:20:48

Sometimes the process of decreasing the learning rate over time is called "annealing" the learning rate.

There are many possible "annealing schedules", such as having the learning rate decay inversely with time:

u(t) = c / t

...where c is some constant. Or there is the "search-then-converge" schedule:

u(t) = A * (1 + (c/A)*(t/T)) / 
           (1 + (c/A)*(t/T) + T*(t^2)/(T^2))

...which keeps the learning rate around A when t is small compared to T (the "search" phase) and then decreases the learning rate when t is large compared to T (the "converge" phase). Of course, for both of these approaches you have to tune parameters (e.g. c, A, or T) but hopefully introducing them will help more than it will hurt. :)
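
A minimal Python sketch of both schedules, assuming illustrative constants (the values of c, A, and T below are placeholders to tune, not recommendations):

    # Inverse-time annealing: u(t) = c / t.
    def inverse_time_rate(t, c=0.1):
        return c / max(t, 1)  # guard against division by zero at t = 0

    # Darken & Moody's "search-then-converge" schedule.
    def search_then_converge_rate(t, A=0.01, c=0.1, T=1000.0):
        ratio = (c / A) * (t / T)
        # Stays near A while t << T (search); decays like c/t once t >> T (converge).
        return A * (1 + ratio) / (1 + ratio + T * t**2 / T**2)

    # Example: watch the rate fall over the course of training.
    for t in (1, 100, 1000, 10000):
        print(t, inverse_time_rate(t), search_then_converge_rate(t))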

Some references:

  • Learning Rate Schedules for Faster Stochastic Gradient Search, Christian Darken, Joseph Chang and John Moody, Neural Networks for Signal Processing 2 --- Proceedings of the 1992 IEEE Workshop, IEEE Press, Piscataway, NJ, 1992.
  • A Stochastic Approximation Method, Herbert Robbins and Sutton Monro, Annals of Mathematical Statistics 22, #3 (September 1951), pp. 400–407.
  • Neural Networks and Learning Machines (section 3.13 in particular), Simon S. Haykin, 3rd edition (2008), ISBN 0131471392, 9780131471399
  • A page that briefly discusses learning rate adaptation (edu/~gorr/classes/cs449/momrate.html).

动听の歌 2024-07-31 13:20:48

You answered your own question when you said you need to have your learning rate change as the network learns. There are a lot of different ways you can do it.

The simplest way is to reduce the learning rate linearly with the number of iterations. Every 25 iterations (or some other arbitrary number), subtract a portion of the rate until it reaches a good minimum.

You can also do it nonlinearly with the number of iterations. For example, multiply the learning rate by 0.99 every iteration, again until it reaches a good minimum.

Or you can get more crafty: use the results of the network to determine its next learning rate. The better it is doing by its fitness metric, the smaller you make its learning rate. That way it converges quickly for as long as it needs to, then slowly. This is probably the best way, but it is more costly than the simple number-of-iterations approaches.
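
A sketch of all three strategies in Python; every constant here (25, 0.99, the floors) is an arbitrary example to tune, not a recommendation:

    # 1. Linear decay: subtract a fixed amount every N iterations.
    def step_decayed_rate(rate, iteration, every=25, step=1e-4, floor=1e-6):
        if iteration > 0 and iteration % every == 0:
            rate = max(rate - step, floor)
        return rate

    # 2. Nonlinear decay: multiply by a constant factor each iteration.
    def exponentially_decayed_rate(rate, factor=0.99, floor=1e-6):
        return max(rate * factor, floor)

    # 3. Fitness-based: shrink the rate as the error improves -- one of many
    #    ways to tie the rate to the network's results.
    def fitness_scaled_rate(base_rate, current_error, initial_error):
        return base_rate * (current_error / initial_error)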

情徒 2024-07-31 13:20:48

Have you considered other training methods that are independent of any learning rate?

There are training methods that bypass the need for a learning rate by computing the Hessian matrix (like Levenberg-Marquardt), and I have also come across direct-search algorithms (like those developed by Norio Baba).
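
For illustration, here is a bare-bones Levenberg-Marquardt step for a least-squares problem. The damping factor lam stands in for the learning rate and is adapted automatically; this is a sketch under simplifying assumptions, not a production trainer:

    import numpy as np

    def levenberg_marquardt(residuals, jacobian, w, iters=100, lam=1e-3):
        """Minimise sum(residuals(w)**2) with no hand-tuned learning rate."""
        for _ in range(iters):
            r = residuals(w)   # residual vector, shape (m,)
            J = jacobian(w)    # Jacobian of the residuals, shape (m, n)
            # Damped normal equations: (J^T J + lam*I) dw = -J^T r
            dw = np.linalg.solve(J.T @ J + lam * np.eye(len(w)), -J.T @ r)
            if np.sum(residuals(w + dw) ** 2) < np.sum(r ** 2):
                w, lam = w + dw, lam * 0.5   # step helped: damp less
            else:
                lam *= 2.0                   # step hurt: damp more
        return w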

锦爱 2024-07-31 13:20:48

Perhaps build a negative-feedback loop into the learning algorithm, keyed to the rate. Learning-rate values that start to swing too widely hit the moderating part of the feedback loop, causing the rate to swing the other way, at which point the opposing moderating force kicks in.

The state vector will eventually settle into an equilibrium that strikes a balance between "too much" and "too little". That is how many systems in biology work.
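
One possible reading of this in code, along the lines of the "bold driver" heuristic (the grow/shrink factors are illustrative):

    # Nudge the rate up while the error keeps falling; cut it back hard the
    # moment the error swings upward (the opposing moderating force).
    def feedback_adjusted_rate(rate, error, prev_error, grow=1.05, shrink=0.5):
        return rate * (grow if error < prev_error else shrink)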
