Which optimization algorithm should I use to optimize the weights of a multilayer perceptron?

Posted on 2024-12-01 20:51:16


Actually these are 3 questions:

Which optimization algorithm should I use to optimize the weights of a multilayer perceptron, if I knew...

1) only the value of the error function? (blackbox)

2) the gradient? (first derivative)

3) the gradient and the Hessian? (second derivative)

I heard CMA-ES should work very well for 1) and BFGS for 2), but I would like to know if there are any alternatives, and I don't know which algorithm to take for 3).
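For concreteness, here is a minimal sketch of how these three information levels map onto off-the-shelf optimizers, using SciPy's `minimize` and a toy quadratic in place of the real MLP error function. `loss`, `grad`, and `hess` are illustrative stand-ins, not an actual network:

```python
import numpy as np
from scipy.optimize import minimize

# Toy stand-in for the MLP error function over a flattened weight
# vector w; in practice these would evaluate the network on the
# training set.
A = np.diag([1.0, 10.0])

def loss(w):            # 1) only the error value (black box)
    return 0.5 * w @ A @ w

def grad(w):            # 2) the gradient (first derivative)
    return A @ w

def hess(w):            # 3) the Hessian (second derivative)
    return A

w0 = np.array([3.0, -2.0])

# 1) value only: a derivative-free method (CMA-ES is another option)
res1 = minimize(loss, w0, method="Nelder-Mead")

# 2) value + gradient: a quasi-Newton method such as BFGS
res2 = minimize(loss, w0, jac=grad, method="BFGS")

# 3) value + gradient + Hessian: a Newton-type method
res3 = minimize(loss, w0, jac=grad, hess=hess, method="Newton-CG")

for r in (res1, res2, res3):
    print(r.x, r.nfev)
```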


Comments (2)

小草泠泠 2024-12-08 20:51:16


Ok, so this doesn't really answer the question you initially asked, but it does provide a solution to the problem you mentioned in the comments.

Problems like dealing with a continuous action space are normally not dealt with by changing the error measure, but rather by changing the architecture of the overall network. This lets you keep using the same highly informative error information while still solving the problem you want to solve.

Some possible architectural changes that could accomplish this are discussed in the solutions to this question. In my opinion, I'd suggest using a modified Q-learning technique where the state and action spaces are both represented by self-organizing maps, which is discussed in a paper mentioned in the above link.
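For illustration, here is a rough sketch of the SOM ingredient of that approach: a small self-organizing map that quantizes a continuous action space into discrete prototype actions, which a Q-table could then index. This is only a sketch under assumed sizes and hyperparameters; the Q-learning part is omitted:

```python
import numpy as np

# Minimal 1-D self-organizing map over a continuous action space.
# All names, sizes, and hyperparameters are illustrative assumptions.
rng = np.random.default_rng(0)

n_nodes = 10                    # number of discrete prototype actions
action_dim = 2                  # dimensionality of the continuous action
nodes = rng.uniform(-1, 1, size=(n_nodes, action_dim))

def train_som(nodes, samples, epochs=20, lr0=0.5, radius0=3.0):
    """Fit the prototype vectors (in place) to sampled continuous actions."""
    n_nodes = len(nodes)
    for epoch in range(epochs):
        lr = lr0 * (1 - epoch / epochs)              # decaying learning rate
        radius = max(radius0 * (1 - epoch / epochs), 0.5)
        for a in samples:
            winner = np.argmin(np.linalg.norm(nodes - a, axis=1))
            # Gaussian neighborhood on the 1-D node grid.
            dist = np.abs(np.arange(n_nodes) - winner)
            h = np.exp(-dist ** 2 / (2 * radius ** 2))
            nodes += lr * h[:, None] * (a - nodes)

samples = rng.uniform(-1, 1, size=(500, action_dim))
train_som(nodes, samples)
# A Q-table could now be indexed by (state node, action node) pairs,
# with the winning action node decoded back to its prototype vector.
```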

I hope this helps.

青瓷清茶倾城歌 2024-12-08 20:51:16


I finally solved this problem: there are some efficient algorithms for optimizing neural networks (with fixed topology) in reinforcement learning, e.g. CMA-ES (CMA-NeuroES) or CoSyNE.
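As a rough sketch of the CMA-NeuroES idea using the pycma package: treat the flattened weights of a fixed-topology network as the search variables and let CMA-ES minimize the negative episode return. `episode_return` below is only a placeholder for an actual environment rollout:

```python
import numpy as np
import cma  # pip install cma (the pycma implementation of CMA-ES)

n_weights = 20  # would be the parameter count of the real network

def episode_return(w):
    # Placeholder: in practice, build the network from w, run one or
    # more episodes in the environment, and return the summed reward.
    return -np.sum((w - 0.5) ** 2)

# Start the search at zero weights with initial step size 0.5.
es = cma.CMAEvolutionStrategy(n_weights * [0.0], 0.5)
while not es.stop():
    candidates = es.ask()                        # sample a population
    fitnesses = [-episode_return(np.asarray(w)) for w in candidates]
    es.tell(candidates, fitnesses)               # CMA-ES minimizes
best_weights = es.result.xbest
```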

The best optimization algorithm for supervised learning seems to be Levenberg-Marquardt (LMA), an algorithm specifically designed for least-squares problems. When there are many connections and weights, LMA does not work very well because the space it requires is huge; in that case I use Conjugate Gradient (CG).
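Here is a minimal sketch of LMA training with SciPy's `least_squares(method='lm')` on a tiny one-hidden-layer MLP. Note that LMA operates on the vector of per-sample residuals rather than a scalar loss, which is why it suits least-squares training; all sizes here are illustrative:

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(50, 1))
y = np.sin(3 * X[:, 0])                          # toy regression target

n_hidden = 5
n_params = n_hidden + n_hidden + n_hidden + 1    # W1, b1, W2, b2

def unpack(w):
    W1 = w[:n_hidden].reshape(n_hidden, 1)
    b1 = w[n_hidden:2 * n_hidden]
    W2 = w[2 * n_hidden:3 * n_hidden]
    b2 = w[-1]
    return W1, b1, W2, b2

def residuals(w):
    W1, b1, W2, b2 = unpack(w)
    h = np.tanh(X @ W1.T + b1)                   # hidden activations
    pred = h @ W2 + b2
    return pred - y                              # one residual per sample

w0 = rng.normal(scale=0.5, size=n_params)
sol = least_squares(residuals, w0, method="lm")  # Levenberg-Marquardt
print("final sum of squares:", np.sum(sol.fun ** 2))
```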

The exact Hessian matrix does not accelerate optimization. Algorithms that approximate the second derivative are faster and more efficient (BFGS, CG, LMA).

Edit: for large-scale learning problems, Stochastic Gradient Descent (SGD) often outperforms all other algorithms.
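And a minimal NumPy sketch of plain mini-batch SGD for the same kind of tiny MLP; each update touches only one mini-batch, which is what lets SGD scale to large datasets. Architecture and hyperparameters are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(1000, 1))
y = np.sin(3 * X)                               # toy regression target

n_hidden, lr, batch = 16, 0.05, 32
W1 = rng.normal(scale=0.5, size=(1, n_hidden)); b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.5, size=(n_hidden, 1)); b2 = np.zeros(1)

for step in range(2000):
    idx = rng.integers(0, len(X), size=batch)   # sample a mini-batch
    xb, yb = X[idx], y[idx]
    h = np.tanh(xb @ W1 + b1)                   # forward pass
    pred = h @ W2 + b2
    err = (pred - yb) / batch                   # grad of (mean squared error)/2
    gW2 = h.T @ err                             # backward pass by hand
    gb2 = err.sum(axis=0)
    dh = (err @ W2.T) * (1 - h ** 2)            # tanh derivative
    gW1 = xb.T @ dh
    gb1 = dh.sum(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2              # SGD update
    W1 -= lr * gW1; b1 -= lr * gb1
```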
