The Newton line-search method solves the symmetric system of equations
Hk dk = -gk
to obtain a step dk, where Hk is the Hessian of the objective function at xk and gk is the gradient of the objective function at xk. For problems where the Hessian matrix is indefinite, the perturbed system of equations
is solved to obtain the direction, where
is a positive constant.
If the direction computed is not a descent direction, the (scaled) steepest
descent direction is used instead. Having obtained the direction,
a More-Thuente line search is applied to obtain a step length,
, that approximately solves the one-dimensional optimization
problem
The Newton line-search method can be set using the TaoMethod tao_nls. For the best efficiency, function and gradient evaluations should be performed simultaneously when using this algorithm.
The system of equations is approximately solved by applying the conjugate gradient method, Steihaug-Toint conjugate gradient method, generalized Lanczos method, or an alternative Krylov subspace method supplied by PETSc. The method used to solve the systems of equations is specified with the command line argument -tao_nls_ksp_type <cg,stcg,gltr,petsc>; cg is the default. When the type is set to petsc, the method set with the PETSc -ksp_type command line argument is used. For example, to use GMRES as the linear system solver, one would use the the command line arguments -tao_nls_ksp_type petsc -ksp_type gmres. Internally, the PETSc implementations for the conjugate gradient methods and the generalized Lanczos method are used. See the PETSc manual for further information on changing the behavior of the linear system solvers.
A good preconditioner reduces the number of iterations required to solve the linear system of equations. For the conjugate gradient methods and generalized Lanczos method, this preconditioner must be symmetric and positive definite. The available options are to use no preconditioner, the absolute value of the diagonal of the Hessian matrix, a limited-memory BFGS approximation to the Hessian matrix, or one of the other preconditioners provided by the PETSc package. These preconditioners are specified by the command line argument -tao_nls_pc_type <none,ahess,bfgs,petsc>, respectively. The default is the bfgs preconditioner. When the preconditioner type is set to petsc, the preconditioner set with the PETSc -pc_type command line argument is used. For example, to use an incomplete Cholesky factorization for the preconditioner, one would use the command line arguments -tao_nls_pc_type petsc -pc_type icc. See the PETSc manual for further information on changing the behavior of the preconditioners.
The choice of scaling matrix can have a significant impact on the quality of the Hessian approximation when using the bfgs preconditioner and affect the number of iterations required by the linear system solver. The choices for scaling matrices are the same as those discussed for the limited-memory, variable-metric algorithm. For Newton methods, however, the option exists to use a scaling matrix based on the true Hessian matrix. In particular, the implementation supports using the absolute value of the diagonal of the Hessian matrix or the absolute value of the diagonal of the perturbed Hessian matrix. The scaling matrix to use with the bfgs preconditioner is set with the command line argument -tao_nls_bfgs_scale_type <bfgs,ahess,phess>; phess is the default. The bfgs scaling matrix is derived from the BFGS options. The ahess scaling matrix is the absolute value of the diagonal of the Hessian matrix. The phess scaling matrix is the absolute value of the diagonal of the perturbed Hessian matrix.
The perturbation
is added when the direction returned by the
Krylov subspace method is either not a descent direction, the Krylov method
diverged due to an indefinite preconditioner or matrix, or a direction of
negative curvature was found. In the two latter cases, if the step returned
is a descent direction, it is used during the line search. Otherwise, a
steepest descent direction is used during the line search. The perturbation
is decreased as long as the Krylov subspace method reports success and
increased if further problems are encountered. There are three cases:
initializing, increasing, and decreasing the perturbation. These cases
are described below.
where imin is set with the command line argument
-tao_nls_imin <double> with a default value of 10-4,
imfac by -tao_nls_imfac with a default value of 0.1, and
imax by -tao_nls_imax with a default value of 100.
When using the gltr method to solve the system of equations, an
estimate of the minimum eigenvalue
of the Hessian matrix
is available. This value is use to initialize the perturbation to
.
2. If
is nonzero and a problem was detected with either the
direction or Krylov subspace method, the perturbation is increased to
where pgfac is set with the command line argument -tao_nls_pgfac
with a default value of 10, pmgfac by -tao_nls_pmgfac with a
default value of 0.1, and pmax by -tao_nls_pmax with a default
value of 100.
3. If
is nonzero and no problems were detected with either
the direction or Krylov subspace method, the perturbation is decreased to
where psfac is set with the command line argument -tao_nls_psfac
with a default value of 0.4, and pmsfac by -tao_nls_pmsfac with
a default value of 0.1. Moreover, if
then
, where pmin is set with the command line argument
-tao_nls_pmin and has a default value of 10-12.
Finally, the method for updating the trust-region radius is set with the
command line argument
-tao_nls_update_type <step,reduction,interpolation>; step
is the default. The step method updates the trust-region
radius based on the value of
. In particular,
where
and
are constants. The reduction
method computes the ratio of the actual reduction in the objective function
to the reduction predicted by the quadratic model for the full step,
, where
qk is the quadratic model. The radius is then updated as:
where
and
are constants. The interpolation
method uses the same interpolation mechanism as in the initialization to
compute a new value for the trust-region radius.