24.4. Newton Line-Search Method

The Newton line-search method solves the symmetric system of equations

Hk dk = -gk

to obtain a step dk, where Hk is the Hessian of the objective function at xk and gk is the gradient of the objective function at xk. For problems where the Hessian matrix is indefinite, the perturbed system of equations

(Hk + ρk I) dk = -gk

is solved to obtain the direction, where ρk is a positive constant. If the direction computed is not a descent direction, the (scaled) steepest descent direction is used instead. Having obtained the direction, a Moré-Thuente line search is applied to obtain a step length, τk, that approximately solves the one-dimensional optimization problem

min_τ f(xk + τ dk)

The Newton line-search method can be set using the TaoMethod tao_nls. For best efficiency, function and gradient evaluations should be performed simultaneously when using this algorithm.
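The descent-direction safeguard mentioned above (falling back to steepest descent when the computed step is not a descent direction) can be sketched in plain C. This is a minimal illustration, not TAO's implementation, and the function names are invented for this example:

```c
#include <stddef.h>

/* Return the inner product g·d; a direction d is a descent direction
 * for the objective at the current point when g·d < 0. */
static double dot(const double *g, const double *d, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; ++i)
        s += g[i] * d[i];
    return s;
}

/* If the computed step d is not a descent direction, replace it with
 * the steepest descent direction -g (unscaled here for simplicity). */
void safeguard_direction(const double *g, double *d, size_t n) {
    if (dot(g, d, n) >= 0.0)
        for (size_t i = 0; i < n; ++i)
            d[i] = -g[i];
}
```

In TAO the fallback direction may additionally be scaled, as noted above; the sketch omits the scaling.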

The system of equations is approximately solved by applying the conjugate gradient method, the Steihaug-Toint conjugate gradient method, the generalized Lanczos method, or an alternative Krylov subspace method supplied by PETSc. The method used to solve the systems of equations is specified with the command line argument -tao_nls_ksp_type <cg,stcg,gltr,petsc>; cg is the default. When the type is set to petsc, the method set with the PETSc -ksp_type command line argument is used. For example, to use GMRES as the linear system solver, one would use the command line arguments -tao_nls_ksp_type petsc -ksp_type gmres. Internally, the PETSc implementations for the conjugate gradient methods and the generalized Lanczos method are used. See the PETSc manual for further information on changing the behavior of the linear system solvers.

A good preconditioner reduces the number of iterations required to solve the linear system of equations. For the conjugate gradient methods and generalized Lanczos method, this preconditioner must be symmetric and positive definite. The available options are to use no preconditioner, the absolute value of the diagonal of the Hessian matrix, a limited-memory BFGS approximation to the Hessian matrix, or one of the other preconditioners provided by the PETSc package. These preconditioners are specified by the command line argument -tao_nls_pc_type <none,ahess,bfgs,petsc>, respectively. The default is the bfgs preconditioner. When the preconditioner type is set to petsc, the preconditioner set with the PETSc -pc_type command line argument is used. For example, to use an incomplete Cholesky factorization for the preconditioner, one would use the command line arguments -tao_nls_pc_type petsc -pc_type icc. See the PETSc manual for further information on changing the behavior of the preconditioners.
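As an illustration of the ahess option, the action of a preconditioner built from the absolute value of the Hessian diagonal can be sketched as follows. This is a simplified dense-matrix sketch, not TAO's implementation:

```c
#include <math.h>
#include <stddef.h>

/* Sketch of the ahess preconditioner action: z = D^{-1} r with
 * D = |diag(H)|, which is symmetric positive definite whenever the
 * diagonal is nonzero. H is stored dense row-major, n x n. */
void ahess_pc_apply(const double *H, const double *r, double *z, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        double d = fabs(H[i*n + i]);
        z[i] = (d > 0.0) ? r[i] / d : r[i];  /* guard a zero diagonal entry */
    }
}
```

Because only the diagonal is used, the cost per application is O(n), in contrast to the limited-memory BFGS or incomplete factorization options.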

The choice of scaling matrix can have a significant impact on the quality of the Hessian approximation when using the bfgs preconditioner and affect the number of iterations required by the linear system solver. The choices for scaling matrices are the same as those discussed for the limited-memory, variable-metric algorithm. For Newton methods, however, the option exists to use a scaling matrix based on the true Hessian matrix. In particular, the implementation supports using the absolute value of the diagonal of the Hessian matrix or the absolute value of the diagonal of the perturbed Hessian matrix. The scaling matrix to use with the bfgs preconditioner is set with the command line argument -tao_nls_bfgs_scale_type <bfgs,ahess,phess>; phess is the default. The bfgs scaling matrix is derived from the BFGS options. The ahess scaling matrix is the absolute value of the diagonal of the Hessian matrix. The phess scaling matrix is the absolute value of the diagonal of the perturbed Hessian matrix.

The perturbation is added when the direction returned by the Krylov subspace method is not a descent direction, when the Krylov method diverged due to an indefinite preconditioner or matrix, or when a direction of negative curvature was found. In the latter two cases, if the step returned is a descent direction, it is used during the line search; otherwise, a steepest descent direction is used. The perturbation is decreased as long as the Krylov subspace method reports success and increased if further problems are encountered. There are three cases: initializing, increasing, and decreasing the perturbation. These cases are described below.

    1. If ρk is zero and a problem was detected with either the direction or the Krylov subspace method, the perturbation is initialized to

    ρk+1 = median{imin, imfac * ||g(xk)||, imax}

    where imin is set with the command line argument -tao_nls_imin <double> with a default value of 10^-4, imfac by -tao_nls_imfac with a default value of 0.1, and imax by -tao_nls_imax with a default value of 100. When using the gltr method to solve the system of equations, an estimate of the minimum eigenvalue λ1 of the Hessian matrix is available. This value is used to initialize the perturbation to ρk+1 = max{ρk+1, -λ1} in this case.
    2. If ρk is nonzero and a problem was detected with either the direction or the Krylov subspace method, the perturbation is increased to

    ρk+1 = min{pmax, max{pgfac * ρk, pmgfac * ||g(xk)||}}

    where pgfac is set with the command line argument -tao_nls_pgfac with a default value of 10, pmgfac by -tao_nls_pmgfac with a default value of 0.1, and pmax by -tao_nls_pmax with a default value of 100.
    3. If ρk is nonzero and no problems were detected with either the direction or the Krylov subspace method, the perturbation is decreased to

    ρk+1 = min{psfac * ρk, pmsfac * ||g(xk)||}

    where psfac is set with the command line argument -tao_nls_psfac with a default value of 0.4, and pmsfac by -tao_nls_pmsfac with a default value of 0.1. Moreover, if ρk+1 < pmin, then ρk+1 is set to zero, where pmin is set with the command line argument -tao_nls_pmin and has a default value of 10^-12.

When using stcg or gltr to solve the linear systems of equations, a trust-region radius needs to be initialized and updated. This trust-region radius limits the size of the step computed. The method for initializing the trust-region radius is set with the command line argument -tao_nls_init_type <constant,direction,interpolation>; interpolation, which chooses an initial value based on the interpolation scheme found in [(ref CGT)], is the default. This scheme performs a number of function and gradient evaluations to determine a radius such that the reduction predicted by the quadratic model along the gradient direction coincides with the actual reduction in the nonlinear function. The iterate obtaining the best objective function value is used as the starting point for the main line-search algorithm. The constant method initializes the trust-region radius by using the value specified with the -tao_trust0 <double> command line argument, where the default value is 100. The direction technique solves the first quadratic optimization problem by using a standard conjugate gradient method and initializes the trust-region radius to ||d0||, the norm of the resulting direction.

Finally, the method for updating the trust-region radius is set with the command line argument -tao_nls_update_type <step,reduction,interpolation>; step is the default. The step method updates the trust-region radius based on the value of the step length τk accepted by the line search. In particular,

Δk+1 = ω1 min(Δk, ||dk||)    if τk ∈ [0, ν1)
Δk+1 = ω2 min(Δk, ||dk||)    if τk ∈ [ν1, ν2)
Δk+1 = ω3 Δk                 if τk ∈ [ν2, ν3)
Δk+1 = max(Δk, ω4 ||dk||)    if τk ∈ [ν3, ν4)
Δk+1 = max(Δk, ω5 ||dk||)    if τk ∈ [ν4, ∞)

where 0 < ω1 < ω2 < ω3 = 1 < ω4 < ω5 and 0 < ν1 < ν2 < ν3 < ν4 are constants. The reduction method computes the ratio of the actual reduction in the objective function to the reduction predicted by the quadratic model for the full step,

κk = (f(xk) - f(xk + dk)) / (qk(xk) - qk(xk + dk))

where qk is the quadratic model. The radius is then updated as

Δk+1 = α1 min(Δk, ||dk||)    if κk ∈ (-∞, η1)
Δk+1 = α2 min(Δk, ||dk||)    if κk ∈ [η1, η2)
Δk+1 = Δk                    if κk ∈ [η2, η3)
Δk+1 = max(Δk, α3 ||dk||)    if κk ∈ [η3, η4)
Δk+1 = max(Δk, α4 ||dk||)    if κk ∈ [η4, ∞)

where 0 < α1 < α2 < 1 < α3 < α4 and 0 < η1 < η2 < η3 < η4 are constants. The interpolation method uses the same interpolation mechanism as in the initialization to compute a new value for the trust-region radius.