Using pgtol in R's optim function
I'm trying to use pgtol in R's optim function and not getting anywhere.
I optimize this function:
RosenbrockFactory <- function(a,b) {
  function(v) {
    return( (a-v[1])^2 + b*(v[2]-v[1]^2)^2 )
  }
}
# the exact function to optimize
fn <- RosenbrockFactory(2,3)
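For reference, the minimum is at c(2, 4), where the function value is zero (the same point the Newton iterations below converge to); a quick check:

fn(c(2, 4))   # 0 at the minimum
fn(c(0, 0))   # 4 at the starting point used below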
I use this helper function to get the gradient at a=2, b=3
gradR <- function(v) {
  x <- v[1]
  y <- v[2]
  gr <- rep(0,2)
  gr[1] <- -2 * (2-x) - 12 * (y-x^2)*x
  gr[2] <- 3*2*(y-x^2)
  return(gr)
}
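As a quick sanity check, the analytic gradient can be compared against a central finite difference; numGrad below is just an illustrative helper I use for the comparison, not anything from optim:

numGrad <- function(f, v, h=1e-6) {
  # central finite-difference approximation to the gradient of f at v
  sapply(seq_along(v), function(i) {
    e <- rep(0, length(v)); e[i] <- h
    (f(v + e) - f(v - e)) / (2*h)
  })
}
gradR(c(0.5, 1.2))        # analytic gradient
numGrad(fn, c(0.5, 1.2))  # should agree to several decimal places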
Default optimization; checking, the gradient is higher than I want:
fn <- RosenbrockFactory(2,3)
opt0 <- optim(c(0,0), fn, method="L-BFGS-B")
gradR(opt0$par)
# [1] 9.060194e-05 -3.449572e-05
The R stats package documentation for optim says that pgtol "helps control the convergence of the "L-BFGS-B" method. It is a tolerance on the projected gradient in the current search direction. This defaults to zero, when the check is suppressed."
This only says what happens when pgtol is 0, not what non-zero values do, so try a large value of 1e10:
opt10 <- optim(c(0,0), fn, method="L-BFGS-B", control=list(pgtol=1e10))
gradR(opt10$par)
# [1] -4 0
opt10$par
# [1] 0 0
That stops any optimization. Okay, try going the other way:
optm10 <- optim(c(0,0), fn, method="L-BFGS-B", control=list(pgtol=1e-10))
gradR(optm10$par)
# [1] 9.060194e-05 -3.449572e-05
The small value (1e-10) changed nothing relative to before I changed pgtol. So, I try pushing it even lower:
optm300 <- optim(c(0,0), fn, method="L-BFGS-B", control=list(pgtol=1e-300))
gradR(optm300$par)
# [1] 9.060194e-05 -3.449572e-05
So it did nothing again, even at roughly the smallest representable double.
Just to verify, I can optimize this with Newton's method:
Newton <- function(x0, tol) {
  xi <- x0
  grNorm <- tol*2  # ensure the loop runs at least once
  while( grNorm > tol) {
    f0 <- fn(xi)      # function value (not used below)
    gr <- gradR(xi)
    H <- hessR(xi)
    step <- -1 * qr.solve(qr(H), gr)  # Newton step: solve H step = -gr
    xi <- xi + step
    grNorm <- sqrt(sum(gr*gr))  # norm of the gradient at the previous iterate
  }
  return(xi)
}
hessR <- function(v) {
  x <- v[1]
  y <- v[2]
  H <- matrix(0, nrow=2, ncol=2)
  H[1,1] <- 2 - 12 * y + 36*x^2
  H[1,2] <- H[2,1] <- -12*x
  H[2,2] <- 6
  return(H)
}
n1 <- Newton(c(-1,2), 1e-1)
gradR(n1)
# [1] 0.0010399242 -0.0002601235
n2 <- Newton(c(-1,2), 1e-2)
gradR(n2)
# [1] -1.461804e-10 -4.822809e-13
n3 <- Newton(c(-1,2), 1e-3)
gradR(n3)
# [1] 0 0
print(n3 - c(2,4), digits=16) # exactly at the correct answer
So here, the lower I set the tolerance, the closer I get. Of course, once Newton's method is close, it gets the exact answer.
This is an MWE showing that I cannot get L-BFGS-B to continue until the gradient is as small as I ask, for a function that is somewhat difficult to optimize but that Newton's method can, in fact, solve exactly.
Any idea how pgtol works, or what I can do to convince optim to keep going until the optimization has found a result below a prespecified gradient size? I'd accept the criterion in norm; I just want to be able to encourage it to keep going.
Comments (1)
Please note that with method "L-BFGS-B" the option to control the convergence is 'factr'!
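For example (a minimal sketch; factr defaults to 1e7, and convergence occurs when the reduction in the objective is within factr times the machine tolerance, so smaller values mean a tighter stop):

optTight <- optim(c(0,0), fn, method="L-BFGS-B",
                  control=list(factr=1e1))
gradR(optTight$par)   # typically smaller than with the defaults,
                      # though still not driven exactly to zero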
Methods like "L-BFGS-B" minimize the function values, they do not attempt to make the gradient as small as possible at the optimal point.
There are more modern optimization functions for R, see the "Optimization and Mathematical Programming" Task View.
This uses the Fortran implementation of "L-BFGS-B" by Nocedal et al.
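One further option, beyond what is stated above (so treat it as a suggestion rather than a verified fix): when no gr argument is supplied, optim approximates the gradient by finite differences (step size control$ndeps, default 1e-3), so passing the analytic gradient may also help L-BFGS-B land closer to the minimum. A sketch combining that with a tight factr:

optGrad <- optim(c(0,0), fn, gr=gradR, method="L-BFGS-B",
                 control=list(factr=1e1))
optGrad$par           # compare against the true minimum c(2, 4)
gradR(optGrad$par)    # gradient at the returned point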