Using pgtol in R's optim function

Posted 2025-01-10 04:17:59

I'm trying to use pgtol in R's optim function and not getting anywhere.

I optimize this function

RosenbrockFactory <- function(a,b) {
  function(v) {
    return( (a-v[1])^2 + b*(v[2]-v[1]^2)^2 )
  } 
}
# the exact function to optimize
fn <- RosenbrockFactory(2,3)

I use this helper function to get the gradient at a=2, b=3

gradR <- function(v) {
  x <- v[1]
  y <- v[2]
  gr <- rep(0,2)
  gr[1] <- -2 * (2-x) - 12 * (y-x^2)*x
  gr[2] <- 3*2*(y-x^2)
  return(gr)
}
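As a quick sanity check (my addition, not part of the original question), the analytic gradient can be compared against central finite differences of fn; fdGrad is an illustrative helper:

```r
# The Rosenbrock-style function from the post, at a = 2, b = 3.
fn <- function(v) (2 - v[1])^2 + 3 * (v[2] - v[1]^2)^2

# Analytic gradient, matching gradR from the post.
gradR <- function(v) {
  c(-2 * (2 - v[1]) - 12 * (v[2] - v[1]^2) * v[1],
    6 * (v[2] - v[1]^2))
}

# Central finite-difference gradient (hypothetical helper for checking).
fdGrad <- function(f, v, h = 1e-6) {
  vapply(seq_along(v), function(i) {
    e <- replace(rep(0, length(v)), i, h)
    (f(v + e) - f(v - e)) / (2 * h)
  }, numeric(1))
}

v <- c(0.5, -1.3)
max(abs(fdGrad(fn, v) - gradR(v)))   # should be tiny
```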

Run the default optimization and check: the gradient is higher than I want.

fn <- RosenbrockFactory(2,3)
opt0 <- optim(c(0,0), fn, method="L-BFGS-B")
gradR(opt0$par)
# [1]  9.060194e-05 -3.449572e-05

The optim help page in the R stats package says of pgtol: "helps control the convergence of the "L-BFGS-B" method. It is a tolerance on the projected gradient in the current search direction. This defaults to zero, when the check is suppressed."

This only says what happens when pgtol is 0; it doesn't say what non-zero values do, so try a large value, 1e10:

opt10 <- optim(c(0,0), fn, method="L-BFGS-B", control=list(pgtol=1e10))
gradR(opt10$par)
# [1] -4  0
opt10$par
# [1] 0 0

That stops any optimization. Okay, try going the other way:

optm10 <- optim(c(0,0), fn, method="L-BFGS-B", control=list(pgtol=1e-10))
gradR(optm10$par)
# [1]  9.060194e-05 -3.449572e-05

The small value (1e-10) changed nothing relative to the run before I set pgtol. So I try pushing it even lower:

optm300 <- optim(c(0,0), fn, method="L-BFGS-B", control=list(pgtol=1e-300))
gradR(optm300$par)
# [1]  9.060194e-05 -3.449572e-05

So it did nothing again, even at roughly the smallest representable double.
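One way to diagnose this (my addition, not in the original post): optim returns a $message field for "L-BFGS-B" that reports which stopping test fired. The exact wording comes from the underlying Fortran code, so treat it as indicative rather than stable API:

```r
fn <- function(v) (2 - v[1])^2 + 3 * (v[2] - v[1]^2)^2

optm300 <- optim(c(0, 0), fn, method = "L-BFGS-B",
                 control = list(pgtol = 1e-300))
# With pgtol this small, the run is stopped by the factr-based
# relative-reduction test instead, and $message says which test fired.
optm300$message
```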

Just to verify, I can optimize this with Newton's method:

Newton <- function(x0, tol) {
  xi <- x0
  grNorm <- tol * 2           # force at least one iteration
  while (grNorm > tol) {
    gr <- gradR(xi)
    H  <- hessR(xi)
    step <- -qr.solve(qr(H), gr)
    xi <- xi + step
    # norm of the gradient at the *previous* iterate, so one more
    # Newton step is always taken after the tolerance is first met
    grNorm <- sqrt(sum(gr * gr))
  }
  return(xi)
}

hessR <- function(v) {
  x <- v[1]
  y <- v[2]
  H <- matrix(0, nrow=2, ncol=2)
  H[1,1] <- 2 - 12 * y + 36*x^2
  H[1,2] <- H[2,1] <- -12*x
  H[2,2] <-  6
  return(H)
}
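The analytic derivatives can be sanity-checked numerically (my addition, not part of the original question; fdHess is a hypothetical helper): central finite differences of gradR should reproduce hessR.

```r
# Analytic gradient and Hessian of (2-x)^2 + 3*(y-x^2)^2, as in the post.
gradR <- function(v) {
  c(-2 * (2 - v[1]) - 12 * (v[2] - v[1]^2) * v[1],
    6 * (v[2] - v[1]^2))
}
hessR <- function(v) {
  x <- v[1]
  matrix(c(2 - 12 * v[2] + 36 * x^2, -12 * x,
           -12 * x,                    6), nrow = 2)
}

# Finite-difference Jacobian of the gradient, column by column
# (illustrative helper, not from the original post).
fdHess <- function(g, v, h = 1e-6) {
  J <- sapply(seq_along(v), function(i) {
    e <- replace(rep(0, length(v)), i, h)
    (g(v + e) - g(v - e)) / (2 * h)
  })
  (J + t(J)) / 2   # symmetrize to suppress roundoff asymmetry
}

v <- c(-1, 2)
max(abs(fdHess(gradR, v) - hessR(v)))   # should be tiny
```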


n1 <- Newton(c(-1,2), 1e-1)
gradR(n1)
# [1]  0.0010399242 -0.0002601235
n2 <- Newton(c(-1,2), 1e-2)
gradR(n2)
# [1] -1.461804e-10 -4.822809e-13
n3 <- Newton(c(-1,2), 1e-3)
gradR(n3)
# [1] 0 0
print(n3 - c(2,4), digits=16) # exactly at the correct answer

So here, the lower I set the tolerance, the closer I get. Of course, once Newton's method is close, it lands on the exact answer.

This is an MWE showing that I cannot get L-BFGS-B to continue until the gradient is as small as I ask, for a function that is somewhat difficult to optimize but that Newton's method can, in fact, solve exactly.

Any idea how pgtol works, or what I can do to convince optim to keep going until it has found a result below a prespecified gradient size? I'd accept the criterion in any norm; I just want to be able to encourage it to keep going.

Comments (1)

听风念你 2025-01-17 04:17:59

Please note that with method "L-BFGS-B" the option to control the convergence is 'factr'!

opt0 <- optim(c(0,0), fn, method="L-BFGS-B",
              control = list(factr = 1e-10))

opt0$par            #>  1.999988  3.999952
gradR(opt0$par)     #> -2.399986e-05  0.000000e+00

Methods like "L-BFGS-B" minimize the function values; they do not attempt to make the gradient as small as possible at the optimal point.
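For context (my addition, based on the ?optim help page): factr multiplies machine epsilon, so convergence is declared when the relative reduction in f falls below factr * .Machine$double.eps. The default factr = 1e7 corresponds to roughly 1e-8 relative accuracy, while factr = 1e-10 asks for reductions below machine precision, i.e. run essentially until no further progress is possible:

```r
# Effective relative-reduction tolerances implied by factr (per ?optim).
eps <- .Machine$double.eps          # ~2.2e-16
c(default = 1e7   * eps,            # factr = 1e7   -> ~2.2e-9
  strict  = 1e-10 * eps)            # factr = 1e-10 -> ~2.2e-26
```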

There are more modern optimization functions for R; see the "Optimization and Mathematical Programming" Task View on CRAN.

library(lbfgsb3c)

lbfgsb3(c(0,0), fn, control = list(factr = 1e-10))
$par
[1] 2 4
$grad
[1]  5.846166e-14 -3.804366e-14

This uses the Fortran implementation of "L-BFGS-B" by Nocedal et al.
