Need a good way to choose and adjust a "learning rate"

In the picture below you can see a learning algorithm trying to learn to produce a desired output (the red line). The learning algorithm is similar to a backward error propagation neural network.

The "learning rate" is a value that controls the size of the adjustments made during the training process. If the learning rate is too high, then the algorithm learns quickly but its predictions jump around a lot during the training process (green line - learning rate of 0.001), if it is lower then the predictions jump around less, but the algorithm takes a lot longer to learn (blue line - learning rate of 0.0001).

The black lines are moving averages.

How can I adapt the learning rate so that it initially converges close to the desired output, but then slows down so that it can home in on the correct value?

learning rate graph http://img.skitch.com/20090605-pqpkse1yr1e5r869y6eehmpsym.png
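
For concreteness, the learning rate scales each weight adjustment in an update of roughly this form (a minimal Python sketch; the names are illustrative, not my actual code):

    # Sketch of the role the learning rate plays in a gradient-style update.
    def update_weight(w, gradient, learning_rate):
        # A high rate takes big, jumpy steps; a low rate takes small, steady ones.
        return w - learning_rate * gradient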

Comments (4)

凉城已无爱 2024-07-31 13:20:48

Sometimes the process of decreasing the learning rate over time is called "annealing" the learning rate.

There are many possible "annealing schedules", such as having the learning rate decay inversely with time:

u(t) = c / t

...where c is some constant. Or there is the "search-then-converge" schedule:

u(t) = A * (1 + (c/A)*(t/T)) / 
           (1 + (c/A)*(t/T) + T*(t^2)/(T^2))

...which keeps the learning rate around A when t is small compared to T (the "search" phase) and then decreases the learning rate when t is large compared to T (the "converge" phase). Of course, for both of these approaches you have to tune parameters (e.g. c, A, or T) but hopefully introducing them will help more than it will hurt. :)
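
A minimal Python sketch of both schedules, assuming illustrative constants (the values of c, A, and T below are placeholders to tune, not recommendations):

    # Inverse-time annealing: u(t) = c / t.
    def inverse_time_rate(t, c=0.1):
        return c / max(t, 1)  # guard against division by zero at t = 0

    # Darken & Moody's "search-then-converge" schedule.
    def search_then_converge_rate(t, A=0.01, c=0.1, T=1000.0):
        ratio = (c / A) * (t / T)
        # Stays near A while t << T (search); decays like c/t once t >> T (converge).
        return A * (1 + ratio) / (1 + ratio + T * t**2 / T**2)

    # Example: watch the rate fall over the course of training.
    for t in (1, 100, 1000, 10000):
        print(t, inverse_time_rate(t), search_then_converge_rate(t))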

Some references:

  • Learning Rate Schedules for Faster Stochastic Gradient Search, Christian Darken, Joseph Chang and John Moody, Neural Networks for Signal Processing 2 --- Proceedings of the 1992 IEEE Workshop, IEEE Press, Piscataway, NJ, 1992.
  • A Stochastic Approximation Method, Herbert Robbins and Sutton Monro, Annals of Mathematical Statistics 22, #3 (September 1951), pp. 400–407.
  • Neural Networks and Learning Machines (section 3.13 in particular), Simon S. Haykin, 3rd edition (2008), ISBN 0131471392, 9780131471399
  • A page that briefly discusses learning rate adaptation (edu/~gorr/classes/cs449/momrate.html).

动听の歌 2024-07-31 13:20:48

You answered your own question when you said you need to have your learning rate change as the network learns. There are a lot of different ways you can do it.

The simplest way is to reduce the learning rate linearly with the number of iterations. Every 25 iterations (or some other arbitrary number), subtract a portion of the rate until it reaches a good minimum.

You can also do it nonlinearly with the number of iterations. For example, multiply the learning rate by 0.99 every iteration, again until it reaches a good minimum.

Or you can get more crafty: use the results of the network to determine its next learning rate. The better it is doing by its fitness metric, the smaller you make its learning rate. That way it converges quickly for as long as it needs to, then slowly. This is probably the best way, but it is more costly than the simple number-of-iterations approaches.
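
A sketch of all three strategies in Python; every constant here (25, 0.99, the floors) is an arbitrary example to tune, not a recommendation:

    # 1. Linear decay: subtract a fixed amount every N iterations.
    def step_decayed_rate(rate, iteration, every=25, step=1e-4, floor=1e-6):
        if iteration > 0 and iteration % every == 0:
            rate = max(rate - step, floor)
        return rate

    # 2. Nonlinear decay: multiply by a constant factor each iteration.
    def exponentially_decayed_rate(rate, factor=0.99, floor=1e-6):
        return max(rate * factor, floor)

    # 3. Fitness-based: shrink the rate as the error improves -- one of many
    #    ways to tie the rate to the network's results.
    def fitness_scaled_rate(base_rate, current_error, initial_error):
        return base_rate * (current_error / initial_error)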

情徒 2024-07-31 13:20:48

Have you considered other training methods that are independent of any learning rate?

There are training methods that bypass the need for a learning rate by computing the Hessian matrix (like Levenberg-Marquardt), and I have also come across direct-search algorithms (like those developed by Norio Baba).
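
For illustration, here is a bare-bones Levenberg-Marquardt step for a least-squares problem. The damping factor lam stands in for the learning rate and is adapted automatically; this is a sketch under simplifying assumptions, not a production trainer:

    import numpy as np

    def levenberg_marquardt(residuals, jacobian, w, iters=100, lam=1e-3):
        """Minimise sum(residuals(w)**2) with no hand-tuned learning rate."""
        for _ in range(iters):
            r = residuals(w)   # residual vector, shape (m,)
            J = jacobian(w)    # Jacobian of the residuals, shape (m, n)
            # Damped normal equations: (J^T J + lam*I) dw = -J^T r
            dw = np.linalg.solve(J.T @ J + lam * np.eye(len(w)), -J.T @ r)
            if np.sum(residuals(w + dw) ** 2) < np.sum(r ** 2):
                w, lam = w + dw, lam * 0.5   # step helped: damp less
            else:
                lam *= 2.0                   # step hurt: damp more
        return w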

锦爱 2024-07-31 13:20:48

Perhaps build a negative-feedback loop into the learning algorithm, keyed to the rate. Learning-rate values that start to swing too widely hit the moderating part of the feedback loop, causing the rate to swing the other way, at which point the opposing moderating force kicks in.

The state vector will eventually settle into an equilibrium that strikes a balance between "too much" and "too little". That is how many systems in biology work.
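
One possible reading of this in code, along the lines of the "bold driver" heuristic (the grow/shrink factors are illustrative):

    # Nudge the rate up while the error keeps falling; cut it back hard the
    # moment the error swings upward (the opposing moderating force).
    def feedback_adjusted_rate(rate, error, prev_error, grow=1.05, shrink=0.5):
        return rate * (grow if error < prev_error else shrink)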
