How do I run gradient descent when the parameter space is constrained?

Published 2024-09-07 13:59:23


I would like to maximize a function with one parameter.

So I run gradient descent (or, ascent actually): I start with an initial parameter and keep adding the gradient (times some learning rate factor that gets smaller and smaller), re-evaluate the gradient given the new parameter, and so on until convergence.

But there is one problem: My parameter must stay positive, so it is not supposed to become <= 0 because my function will be undefined. My gradient search will sometimes go into such regions though (when it was positive, the gradient told it to go a bit lower, and it overshoots).

And to make things worse, the gradient at such a point might be negative, driving the search toward even more negative parameter values. (The reason is that the objective function contains logs, but the gradient doesn't.)

What are some good (simple) algorithms that deal with this constrained optimization problem? I'm hoping for just a simple fix to my algorithm. Or maybe ignore the gradient and do some kind of line search for the optimal parameter?


Comments (7)

如日中天 2024-09-14 13:59:23
  1. Each time you update your parameter, check to see if it's negative, and if it is, clamp it to zero.
  2. If clamping to zero is not acceptable, try adding a "log-barrier" (Google it). Basically, it adds a smooth "soft" wall to your objective function (and modifies your gradient) to keep the search away from regions you don't want it to enter. You then re-run the optimization repeatedly, hardening the wall each time so that it approaches a vertical barrier, but warm-starting from the previously found solution. In the limit (in practice only a few iterations are needed), the problem you are solving is identical to the original problem with the hard constraint.
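A minimal sketch of the barrier idea, using a made-up objective f(x) = log(x) - x (maximum at x = 1); the function names and the mu schedule are illustrative, not from the answer:

```python
import math

def f(x):            # hypothetical objective, defined only for x > 0
    return math.log(x) - x

def grad_f(x):       # its derivative
    return 1.0 / x - 1.0

def barrier_ascent(x, mu, lr=0.1, steps=2000):
    """Gradient ascent on f(x) + mu*log(x); the mu/x gradient term
    blows up near 0 and repels the iterate from the boundary."""
    for _ in range(steps):
        x += lr * (grad_f(x) + mu / x)
    return x

# Outer loop: harden the wall by shrinking mu, warm-starting each solve.
x = 0.5
for mu in [1.0, 0.1, 0.01, 0.001]:
    x = barrier_ascent(x, mu)
print(x)  # approaches the true maximizer x = 1 as mu shrinks
```

Each inner solve lands at the barrier-shifted optimum (here x = 1 + mu), so the warm starts keep every solve cheap.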
水晶透心 2024-09-14 13:59:23


Without knowing more about your problem, it's hard to give specific advice. Your gradient ascent algorithm may not be particularly suitable for your function space. However, given that's what you've got, here's one tweak that would help.

You're following what you believe is an ascending gradient. But when you step in the direction of that gradient, you discover you have fallen into the forbidden negative region. This implies there was a nearby local maximum, but also a very sharp gradient cliff. The obvious fix is to backtrack to your previous position and take a smaller step (e.g. half the size). If you still fall in, repeat with a still smaller step. Iterate until you find the local maximum at the edge of the cliff.

The problem is, there is no guarantee that your local maximum is actually global (unless you know more about your function than you are sharing). This is the main limitation of naive gradient ascent - it always fixes on the first local maximum and converges to it. If you don't want to switch to a more robust algorithm, one simple approach that could help is to run n iterations of your code, starting each time from random positions in the function space, and keeping the best maximum you find overall. This Monte Carlo approach increases the odds that you will stumble on the global maximum, at the cost of increasing your run time by a factor of n. How effective this is will depend on the 'bumpiness' of your objective function.
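A minimal sketch of the backtracking idea, on a hypothetical objective f(x) = log(x) - x whose maximum is at x = 1 (all names here are made up for illustration):

```python
import math

def f(x):            # hypothetical objective, undefined for x <= 0
    return math.log(x) - x

def grad(x):
    return 1.0 / x - 1.0

def ascend(x, lr=1.0, tol=1e-10, max_iter=10000):
    for _ in range(max_iter):
        g = grad(x)
        step = lr
        trial = x + step * g
        # Backtrack: halve the step while we either overshoot into the
        # forbidden region (trial <= 0) or the objective gets worse.
        while trial <= 0 or f(trial) < f(x):
            step *= 0.5
            trial = x + step * g
            if step < tol:
                return x
        if abs(trial - x) < tol:
            return trial
        x = trial
    return x

print(ascend(0.05))  # converges to the maximizer x = 1
```

Checking `trial <= 0` before ever evaluating f(trial) is what keeps the search out of the undefined region.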

夏有森光若流苏 2024-09-14 13:59:23


A simple trick to restrict a parameter to be positive is to re-parametrize the problem in terms of its logarithm (make sure to change the gradient appropriately). Of course it is possible that the optimum moves to -infty with this transformation, and the search does not converge.

删除会话 2024-09-14 13:59:23


At each step, constrain the parameter to be positive. This is, in short, the projected gradient method, which you may want to google.
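A sketch of the projection step in Python, assuming a hypothetical gradient function and a small eps to stay strictly positive (the objective is undefined at 0):

```python
def projected_ascent(grad, x0, lr=0.1, eps=1e-8, steps=1000):
    """Plain gradient ascent, but project back onto the feasible set
    {x >= eps} after every step."""
    x = x0
    for _ in range(steps):
        x = max(x + lr * grad(x), eps)  # ascent step, then projection
    return x

# Example with a hypothetical objective f(x) = log(x) - x (max at x = 1):
print(projected_ascent(lambda x: 1.0 / x - 1.0, x0=0.5))
```

In one dimension the projection is just a `max`; in higher dimensions it becomes a clip onto the feasible box or set.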

半步萧音过轻尘 2024-09-14 13:59:23


I have three suggestions, in order of how much thinking and work you want to do.

First, in gradient descent/ascent, you move each time by the gradient times some factor, which you refer to as a "learning rate factor." If, as you describe, this move causes x to become negative, there are two natural interpretations: Either the gradient was too big, or the learning rate factor was too big. Since you can't control the gradient, take the second interpretation. Check whether your move will cause x to become negative, and if so, cut the learning rate factor in half and try again.

Second, to elaborate on Aniko's answer, let x be your parameter, and f(x) be your function. Then define a new function g(x) = f(e^x), and note that although the domain of f is (0, infinity), the domain of g is (-infinity, infinity). So g cannot suffer the problems that f suffers. Use gradient ascent to find the value x_0 that maximizes g. Then e^(x_0), which is positive, maximizes f. To apply gradient ascent on g, you need its derivative, which is f'(e^x)*e^x by the chain rule.
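The second suggestion might look like this in Python; f and its derivative are hypothetical stand-ins (f(x) = log(x) - x, so f'(x) = 1/x - 1):

```python
import math

def f_prime(x):      # derivative of the hypothetical f(x) = log(x) - x
    return 1.0 / x - 1.0

def ascend_reparam(u0=-2.0, lr=0.1, steps=2000):
    """Ascent on g(u) = f(e^u); by the chain rule g'(u) = f'(e^u) * e^u.
    u is unconstrained, and x = e^u is automatically positive."""
    u = u0
    for _ in range(steps):
        x = math.exp(u)
        u += lr * f_prime(x) * x
    return math.exp(u)   # map back to the original parameter

print(ascend_reparam())  # converges to x = 1 for this example f
```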

Third, it sounds like you're trying to maximize just one function, not write a general maximization routine. You could consider shelving gradient descent and tailoring the optimization method to the idiosyncrasies of your specific function. We would have to know a lot more about the expected behavior of f to help you do that.

醉生梦死 2024-09-14 13:59:23


Just use Brent's method for minimization. It is stable and fast and the right thing to do if you have only one parameter. It's what the R function optimize uses. The link also contains a simple C++ implementation. And yes, you can give it MIN and MAX parameter value limits.
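SciPy's `minimize_scalar(..., method='bounded')` and R's `optimize` are the production route; as a self-contained illustration of bounded, derivative-free 1-D search, here is a golden-section sketch (a simpler relative of Brent's method), on a made-up objective:

```python
import math

def golden_section_max(f, lo, hi, tol=1e-8):
    """Derivative-free maximization of a unimodal f on [lo, hi].
    Brent's method adds parabolic interpolation for speed, but the
    bracketing logic is the same idea."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0   # 1/phi ~ 0.618
    a, b = lo, hi
    c = b - invphi * (b - a)
    d = a + invphi * (b - a)
    while b - a > tol:
        if f(c) > f(d):          # maximum lies in [a, d]
            b, d = d, c
            c = b - invphi * (b - a)
        else:                    # maximum lies in [c, b]
            a, c = c, d
            d = a + invphi * (b - a)
    return (a + b) / 2.0

# Example: maximize f(x) = log(x) - x; the lower bound keeps x > 0.
print(golden_section_max(lambda x: math.log(x) - x, 1e-9, 100.0))
```

Setting the lower bound slightly above zero is exactly the MIN limit the answer refers to: the search never evaluates f outside [MIN, MAX].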

云裳 2024-09-14 13:59:23


You're getting good answers here. Reparameterizing is what I would recommend. Also, have you considered another search method, like Metropolis-Hastings? It's actually quite simple once you bull through the scary-looking math, and it gives you standard errors as well as an optimum.
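A rough sketch of Metropolis-Hastings used as a search heuristic; everything here (objective, proposal scale, step count) is made up for illustration, and walking in log-space keeps the parameter positive:

```python
import math
import random

def f(x):
    """Hypothetical objective (maximum at x = 1); x must stay positive."""
    return math.log(x) - x

def mh_search(x0=0.5, steps=20000, scale=0.3, seed=0):
    """Random-walk Metropolis-Hastings as a search heuristic: propose
    Gaussian steps in u = log(x) (so x = e^u is always positive), accept
    with the usual MH rule on exp(f), and keep the best x visited."""
    rng = random.Random(seed)
    u = math.log(x0)
    best_x, best_f = x0, f(x0)
    for _ in range(steps):
        u_new = u + rng.gauss(0.0, scale)   # symmetric proposal in log-space
        delta = f(math.exp(u_new)) - f(math.exp(u))
        if delta >= 0 or rng.random() < math.exp(delta):
            u = u_new
            x = math.exp(u)
            if f(x) > best_f:
                best_x, best_f = x, f(x)
    return best_x

print(mh_search())  # should land near the maximizer x = 1
```

The accepted samples also give you a picture of the posterior spread around the optimum, which is where the standard errors the answer mentions come from.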
