How to find the minimum of a nonlinear, multivariate function using Newton's method (code, not linear algebra)

Posted 2024-07-10 10:43:53


I'm trying to do some parameter estimation and want to choose parameter estimates that minimize the square error in a predicted equation over about 30 variables. If the equation were linear, I would just compute the 30 partial derivatives, set them all to zero, and use a linear-equation solver. But unfortunately the equation is nonlinear and so are its derivatives.

If the equation were over a single variable, I would just use Newton's method (also known as Newton-Raphson). The Web is rich in examples and code to implement Newton's method for functions of a single variable.

Given that I have about 30 variables, how can I program a numeric solution to this problem using Newton's method? I have the equation in closed form and can compute the first and second derivatives, but I don't know quite how to proceed from there. I have found a large number of treatments on the web, but they quickly get into heavy matrix notation. I've found something moderately helpful on Wikipedia, but I'm having trouble translating it into code.

Where I'm worried about breaking down is in the matrix algebra and matrix inversions. I can invert a matrix with a linear-equation solver but I'm worried about getting the right rows and columns, avoiding transposition errors, and so on.

To be quite concrete:

  • I want to work with tables mapping variables to their values. I can write a function that, given such a table as argument, returns the square error. I can also create functions that return the partial derivative with respect to any given variable.

  • I have a reasonable starting estimate for the values in the table, so I'm not worried about convergence.

  • I'm not sure how to write the loop that uses an estimate (a table of values for each variable), the function, and a table of partial-derivative functions to produce a new estimate.

That last is what I'd like help with. Any direct help or pointers to good sources will be warmly appreciated.


Edit: Since I have the first and second derivatives in closed form, I would like to take advantage of them and avoid more slowly converging methods like simplex searches.


Comments (7)

吃兔兔 2024-07-17 10:43:53


The Numerical Recipes link was most helpful. I wound up symbolically differentiating my error estimate to produce 30 partial derivatives, then used Newton's method to set them all to zero. Here are the highlights of the code:

__doc.findzero = [[function(functions, partials, point, [epsilon, steps]) returns table, boolean
Where
  point       is a table mapping variable names to real numbers 
              (a point in N-dimensional space)
  functions   is a list of functions, each of which takes a table like
              point as an argument
  partials    is a list of tables; partials[i].x is the partial derivative
              of functions[i] with respect to 'x'
  epsilon     is a number that says how close to zero we're trying to get
  steps       is max number of steps to take (defaults to infinity)
  result      is a table like 'point', boolean that says 'converged'
]]

-- See Numerical Recipes in C, Section 9.6 [http://www.nrbook.com/a/bookcpdf.php]




function findzero(functions, partials, point, epsilon, steps)
  epsilon = epsilon or 1.0e-6
  steps = steps or 1/0
  assert(#functions > 0)
  assert(table.numpairs(partials[1]) == #functions,
         'number of functions not equal to number of variables')
  local equations = { }
  repeat
    if Linf(functions, point) <= epsilon then
      return point, true
    end
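    -- Linearize each function about the current point:
    --   functions[i](point) + sum over x of partials[i][x](point) * delta_x = 0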
    for i = 1, #functions do
      local F = functions[i](point)
      local zero = F
      for x, partial in pairs(partials[i]) do
        zero = zero + lineq.var(x) * partial(point)
      end
      equations[i] = lineq.eqn(zero, 0)
    end
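    -- Solve the linear system for the step 'delta' and move the point by it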
    local delta = table.map(lineq.tonumber, lineq.solve(equations, {}).answers)
    point = table.map(function(v, x) return v + delta[x] end, point)
    steps = steps - 1
  until steps <= 0
  return point, false
end


function Linf(functions, point)
  -- distance using L-infinity norm
  assert(#functions > 0)
  local max = 0
  for i = 1, #functions do
    local z = functions[i](point)
    max = math.max(max, math.abs(z))
  end
  return max
end

楠木可依 2024-07-17 10:43:53


You might be able to find what you need at the Numerical Recipes in C web page. There is a free version available online. Here (PDF) is the chapter containing the Newton-Raphson method implemented in C. You may also want to look at what is available at Netlib (LINPACK et al.).

宣告ˉ结束 2024-07-17 10:43:53


As an alternative to using Newton's method, the Simplex Method of Nelder-Mead is ideally suited to this problem and referenced in Numerical Recipes in C.

Rob

天冷不及心凉 2024-07-17 10:43:53


You are asking for a function minimization algorithm. There are two main classes: local and global. Your problem is least squares so both local and global minimization algorithms should converge to the same unique solution. Local minimization is far more efficient than global so select that.

There are many local minimization algorithms but one particularly well suited to least squares problems is Levenberg-Marquardt. If you don't have such a solver to hand (e.g. from MINPACK) then you can probably get away with Newton's method:

x <- x - (hessian x)^-1 * grad x

where you compute the inverse matrix multiplied by a vector using a linear solver.
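
To make the update concrete, here is a minimal, self-contained C sketch of that Newton loop. The two-variable gradient and hessian routines below are hypothetical stand-ins for the real ~30-variable closed forms, and the Gaussian-elimination solver is a bare-bones illustration; in practice you would use a library LU or Cholesky routine to solve H*s = grad rather than forming the inverse.

#include <math.h>
#include <stdio.h>

#define N 2  /* two variables as a stand-in for ~30 */

/* Hypothetical closed-form derivatives of f(x) = (x0 - 1)^4 + (x1 + 2)^2 */
static void gradient(const double x[N], double g[N]) {
    g[0] = 4.0 * pow(x[0] - 1.0, 3);
    g[1] = 2.0 * (x[1] + 2.0);
}

static void hessian(const double x[N], double H[N][N]) {
    H[0][0] = 12.0 * (x[0] - 1.0) * (x[0] - 1.0);  H[0][1] = 0.0;
    H[1][0] = 0.0;                                 H[1][1] = 2.0;
}

/* Solve A*s = b (destroying A and b) by Gaussian elimination with
   partial pivoting; returns 0 if A is numerically singular. */
static int solve(double A[N][N], double b[N], double s[N]) {
    for (int k = 0; k < N; k++) {
        int p = k;                                  /* find the pivot row */
        for (int i = k + 1; i < N; i++)
            if (fabs(A[i][k]) > fabs(A[p][k])) p = i;
        if (fabs(A[p][k]) < 1e-12) return 0;
        for (int j = 0; j < N; j++) { double t = A[k][j]; A[k][j] = A[p][j]; A[p][j] = t; }
        double t = b[k]; b[k] = b[p]; b[p] = t;
        for (int i = k + 1; i < N; i++) {           /* eliminate below the pivot */
            double m = A[i][k] / A[k][k];
            for (int j = k; j < N; j++) A[i][j] -= m * A[k][j];
            b[i] -= m * b[k];
        }
    }
    for (int i = N - 1; i >= 0; i--) {              /* back-substitution */
        s[i] = b[i];
        for (int j = i + 1; j < N; j++) s[i] -= A[i][j] * s[j];
        s[i] /= A[i][i];
    }
    return 1;
}

int main(void) {
    double x[N] = { 3.0, 3.0 };                     /* starting estimate */
    for (int iter = 0; iter < 100; iter++) {
        double g[N], H[N][N], s[N];
        gradient(x, g);
        hessian(x, H);
        if (!solve(H, g, s)) break;                 /* Newton step: H*s = g */
        double step = 0.0;
        for (int i = 0; i < N; i++) { x[i] -= s[i]; step = fmax(step, fabs(s[i])); }
        if (step < 1e-10) break;                    /* converged: step is negligible */
    }
    printf("minimum near (%g, %g)\n", x[0], x[1]);
    return 0;
}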

比忠 2024-07-17 10:43:53


Since you already have the partial derivatives, how about a general gradient-descent approach?
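
A minimal sketch of that idea in C, with a hypothetical two-variable squared error standing in for the real objective; the fixed RATE is a placeholder you would tune or replace with a line search. Note it converges more slowly than Newton's method, which is why the questioner wanted to exploit the second derivatives.

#include <math.h>
#include <stdio.h>

#define N 2         /* stand-in for ~30 parameters */
#define RATE 0.1    /* placeholder step size: tune it or use a line search */

/* Hypothetical gradient of a toy squared error f(x) = (x0-3)^2 + 2*(x1+1)^2 */
static void gradient(const double x[N], double g[N]) {
    g[0] = 2.0 * (x[0] - 3.0);
    g[1] = 4.0 * (x[1] + 1.0);
}

int main(void) {
    double x[N] = { 0.0, 0.0 };     /* starting estimate */
    for (int step = 0; step < 10000; step++) {
        double g[N];
        gradient(x, g);
        double norm = 0.0;
        for (int i = 0; i < N; i++) norm = fmax(norm, fabs(g[i]));
        if (norm < 1e-8) break;     /* gradient ~ 0: stationary point */
        for (int i = 0; i < N; i++) x[i] -= RATE * g[i];   /* move downhill */
    }
    printf("minimum near (%g, %g)\n", x[0], x[1]);
    return 0;
}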

挖个坑埋了你 2024-07-17 10:43:53


Maybe you think you have a good-enough solution, but for me, the easiest way to think about this is to understand it in the 1-variable case first, and then extend it to the matrix case.

In the 1-variable case, if you divide the first derivative by the second derivative, you get the (negative) step size to your next trial point, e.g. -V/A.

In the N-variable case, the first derivative is a vector and the second derivative is a matrix (the Hessian). You multiply the derivative vector by the inverse of the second derivative, and the result is the negative step-vector to your next trial point, e.g. -V*(1/A).

I assume you can get the 2nd-derivative Hessian matrix. You will need a routine to invert it. There are plenty of these around in various linear algebra packages, and they are quite fast.

(For readers who are not familiar with this idea, suppose the two variables are x and y, and the surface is v(x,y). Then the first derivative is the vector:

 V = [ dv/dx, dv/dy ]

and the second derivative is the matrix:

 A = [ dV/dx ]
     [ dV/dy ]

or:

 A = [ d(dv/dx)/dx, d(dv/dy)/dx ]
     [ d(dv/dx)/dy, d(dv/dy)/dy ]

or:

 A = [ d^2v/dx^2, d^2v/dydx ]
     [ d^2v/dxdy, d^2v/dy^2 ]

which is symmetric.)

If the surface is parabolic (constant 2nd derivative) it will get to the answer in 1 step. On the other hand, if the 2nd derivative is far from constant, you could encounter oscillation. Cutting each step in half (or some fraction) should make it stable.

If N == 1, you'll see that it does the same thing as in the 1-variable case.

Good luck.

Added: You wanted code:

double X[N];        // N = number of variables (~30 here)
// Set X to initial estimate
int done = 0;
while (!done) {
    double V[N];    // 1st derivative "velocity" vector (the gradient)
    double A[N*N];  // 2nd derivative "acceleration" matrix (the Hessian)
    double A1[N*N]; // inverse of A
    double S[N];    // step vector
    CalculateFirstDerivative(V, X);   // placeholder routines: supply your
    CalculateSecondDerivative(A, X);  // own closed-form derivative code
    // A1 = 1/A
    GetMatrixInverse(A, A1);
    // S = V*(1/A)
    VectorTimesMatrix(V, A1, S);
    // stop when the step is small enough, e.g. max |S[i]| < epsilon
    done = MaxAbsElement(S, N) < EPSILON;  // placeholder convergence test
    // X -= S  (halve S on steps that increase the error, to damp oscillation)
    VectorMinusVector(X, S, X);
}
孤独患者 2024-07-17 10:43:53


My opinion is to use a stochastic optimizer, e.g., a Particle Swarm method.
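
For the curious, a bare-bones particle-swarm sketch in C over a hypothetical two-variable objective; the swarm size, inertia weight (0.7), and attraction coefficients (1.5) are common textbook defaults rather than tuned values, and the method ignores the closed-form derivatives the questioner wanted to exploit.

#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define DIM 2      /* stand-in for ~30 parameters */
#define SWARM 40
#define ITERS 500

/* Hypothetical objective standing in for the squared error */
static double f(const double x[DIM]) {
    return (x[0] - 3.0) * (x[0] - 3.0) + 2.0 * (x[1] + 1.0) * (x[1] + 1.0);
}

static double urand(double lo, double hi) {
    return lo + (hi - lo) * rand() / (double)RAND_MAX;
}

int main(void) {
    double pos[SWARM][DIM], vel[SWARM][DIM], best[SWARM][DIM], bestval[SWARM];
    double gbest[DIM], gbestval = HUGE_VAL;
    for (int i = 0; i < SWARM; i++) {               /* random initial swarm */
        for (int d = 0; d < DIM; d++) {
            pos[i][d] = urand(-10.0, 10.0);
            vel[i][d] = urand(-1.0, 1.0);
            best[i][d] = pos[i][d];
        }
        bestval[i] = f(pos[i]);
        if (bestval[i] < gbestval) { gbestval = bestval[i]; memcpy(gbest, pos[i], sizeof gbest); }
    }
    for (int t = 0; t < ITERS; t++) {
        for (int i = 0; i < SWARM; i++) {
            for (int d = 0; d < DIM; d++) {         /* velocity and position update */
                vel[i][d] = 0.7 * vel[i][d]
                          + 1.5 * urand(0.0, 1.0) * (best[i][d] - pos[i][d])
                          + 1.5 * urand(0.0, 1.0) * (gbest[d]   - pos[i][d]);
                pos[i][d] += vel[i][d];
            }
            double v = f(pos[i]);                   /* track personal and global bests */
            if (v < bestval[i]) { bestval[i] = v; memcpy(best[i], pos[i], sizeof best[i]); }
            if (v < gbestval)   { gbestval = v; memcpy(gbest, pos[i], sizeof gbest); }
        }
    }
    printf("best found: f = %g at (%g, %g)\n", gbestval, gbest[0], gbest[1]);
    return 0;
}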
