BFGS convergence for convex over-parameterized problems
It is "well-known" that the BFGS optimization algorithm is superlinearly convergent for strictly convex problems, but is there any analysis for problems that are non-strictly convex. For example, suppose that f(x) is convex for some scalar x. Then, suppose we optimize over g(x1,x2)=f(x1+x2). Will this still always be superlinearly convergent?
Comments (3)
Whether BFGS converges at all on non-convex problems is still an open question. In fact, in 1984 Powell gave a counter-example showing that BFGS with an inexact line search may fail to converge. What can be made are local statements, such as: given a local minimum x*, if the iterates eventually enter a region near x*, then BFGS converges superlinearly. The reason is that near x*, the objective function can be accurately modelled by a convex quadratic.
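One way to see the local statement empirically (a hedged sketch of my own, not part of the answer): track the distance to the known minimizer along the iterates of SciPy's BFGS on a strictly convex problem. If convergence is superlinear, the ratio of successive errors should tend toward zero, at least until roundoff takes over.

```python
# Sketch: measure ||x_k - x*|| along BFGS iterates via the callback and print
# the ratio of successive errors. The objective is an illustrative choice:
# strictly convex with minimum at x_star.
import numpy as np
from scipy.optimize import minimize

x_star = np.array([1.0, -2.0])

def obj(x):
    d = x - x_star
    return d @ d + 0.1 * (d @ d) ** 2    # strictly convex, minimum at x_star

errors = []
def cb(xk):
    errors.append(np.linalg.norm(xk - x_star))

minimize(obj, x0=np.array([4.0, 3.0]), method="BFGS",
         callback=cb, options={"gtol": 1e-12})

for e_prev, e_next in zip(errors, errors[1:]):
    if e_prev > 0:
        print(e_next / e_prev)           # ratios shrinking toward 0 suggest superlinear convergence
```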
As for what is known about the composite function you gave, I am not sure. For a detailed explanation of the properties of BFGS, see either Dennis and Schnabel or Nocedal and Wright.
Best of luck.
Correct me if I'm wrong, but won't the "solution" in this case actually be a line, not a single point? If x' is a minimizer for f(x), then the best you can hope for when applying any method to g(x1, x2) is for it to converge to the line x2 = x' - x1.
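A quick numerical check of this point (an illustrative sketch, again assuming SciPy's BFGS and the toy choice f(t) = (t - 1)^2, so x' = 1): different starting points give different minimizers, but all of them lie, to numerical accuracy, on the line x1 + x2 = x'.

```python
# Sketch: BFGS from several starting points lands on different points of the
# minimizer set, which is the whole line x1 + x2 = 1 for this toy problem.
import numpy as np
from scipy.optimize import minimize

def g(x):
    return (x[0] + x[1] - 1.0) ** 2      # f(t) = (t - 1)^2, so x' = 1

for x0 in ([3.0, 3.0], [-7.0, 2.0], [0.0, 100.0]):
    res = minimize(g, np.array(x0), method="BFGS")
    print(res.x, "x1 + x2 =", res.x.sum())   # distinct minimizers, same sum ≈ 1
```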
In practice I have found that a carefully written algorithm will converge, but not necessarily superlinearly. Roundoff error is the culprit, and the choice of convergence criterion comes into play. It is the same for functions that are "almost" not convex, i.e. "stiff" ones.
One must be careful with the BFGS updates to ensure that the resulting approximate Hessian remains "positive-definite enough," even though the theoretical Hessian is not. What I do is keep and update a Cholesky decomposition of the Hessian, rather than the Hessian itself or its inverse.
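A hedged sketch of that idea (my own simplification, not the answerer's code): one standard way to keep the approximate Hessian positive definite is Powell-style damping of the update when the curvature s·y is too small, and then a Cholesky factorization of B is used to solve for the search direction. For brevity this refactorizes B each iteration; a production code would instead apply rank-one updates and downdates directly to the Cholesky factor.

```python
# Sketch: damped BFGS update of the Hessian approximation B, plus a Cholesky
# solve for the search direction. Assumes B is positive definite and s != 0.
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def damped_bfgs_update(B, s, y, theta_min=0.2):
    """Return the BFGS-updated B, damping y so that B stays positive definite."""
    Bs = B @ s
    sBs = s @ Bs                                   # > 0 if B is positive definite and s != 0
    sy = s @ y
    if sy < theta_min * sBs:                       # curvature too weak: damp the update
        theta = (1.0 - theta_min) * sBs / (sBs - sy)
        y = theta * y + (1.0 - theta) * Bs         # now s @ y = theta_min * sBs > 0
        sy = s @ y
    return B - np.outer(Bs, Bs) / sBs + np.outer(y, y) / sy

def search_direction(B, grad):
    """Solve B p = -grad through a Cholesky factorization of B."""
    c, low = cho_factor(B)                         # raises if B is not positive definite
    return cho_solve((c, low), -grad)
```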