24.2. Limited-Memory, Variable-Metric Method

The limited-memory, variable-metric method solves the system of equations

\[ H_k d_k = -\nabla f(x_k), \]

where $H_k$ is a positive definite approximation to the Hessian matrix obtained by using the BFGS update formula with a limited number of previous iterates and gradient evaluations. The inverse of $H_k$ can readily be applied to obtain the direction $d_k$. Having obtained the direction, a Moré-Thuente line search is applied to compute a step length, $\tau_k$, that approximately solves the one-dimensional optimization problem

\[ \min_{\tau} f(x_k + \tau d_k). \]

The current iterate and Hessian approximation are updated, and the process is repeated until the method converges. This algorithm is the default unconstrained minimization solver and can be selected by using the TaoMethod tao_lmvm. For best efficiency, function and gradient evaluations should be performed simultaneously when using this algorithm.
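To make the iteration concrete, the sketch below implements the same outline in plain Python: the inverse of the limited-memory BFGS approximation is applied with the standard two-loop recursion, and a step length is then computed along the resulting direction. This is an illustration rather than TAO's implementation; in particular, a simple backtracking (Armijo) search stands in for the Moré-Thuente line search, the initial approximation $H_{0,k}$ is taken to be the identity, and all function and variable names are invented for the example.

```python
import numpy as np

def lbfgs_direction(g, S, Y, h0_inv):
    """Apply the inverse of the limited-memory BFGS approximation to the
    gradient g via the standard two-loop recursion.  S and Y hold the stored
    differences s_i = x_{i+1} - x_i and y_i = g_{i+1} - g_i, oldest first;
    h0_inv applies the inverse of the initial approximation H_{0,k}."""
    q = g.copy()
    rhos = [1.0 / np.dot(y, s) for s, y in zip(S, Y)]
    alphas = []
    for s, y, rho in reversed(list(zip(S, Y, rhos))):   # newest to oldest
        a = rho * np.dot(s, q)
        alphas.append(a)
        q -= a * y
    r = h0_inv(q)                                       # apply H_{0,k}^{-1}
    for (s, y, rho), a in zip(zip(S, Y, rhos), reversed(alphas)):
        b = rho * np.dot(y, r)
        r += (a - b) * s
    return -r                                           # d_k solves H_k d_k = -g

def lmvm_minimize(f, grad, x, m=5, tol=1e-6, max_iter=500):
    """Outer loop: direction, line search, update of the stored pairs."""
    S, Y = [], []
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        d = lbfgs_direction(g, S, Y, h0_inv=lambda v: v.copy())  # identity H_{0,k}
        # Backtracking Armijo search stands in for the More-Thuente search.
        tau, fx, slope = 1.0, f(x), np.dot(g, d)
        while f(x + tau * d) > fx + 1e-4 * tau * slope and tau > 1e-12:
            tau *= 0.5
        x_new = x + tau * d
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g
        if np.dot(s, y) > 0:           # keep only curvature-respecting pairs
            S.append(s); Y.append(y)
            if len(S) > m:             # limited memory: drop the oldest pair
                S.pop(0); Y.pop(0)
        x, g = x_new, g_new
    return x
```

Calling lmvm_minimize on a smooth function and its gradient, for example lmvm_minimize(lambda x: np.sum(x**2), lambda x: 2 * x, np.ones(10)), reproduces the basic behavior described above.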

The primary factors determining the behavior of this algorithm are the number of vectors stored for the Hessian approximation and the scaling matrix used when computing the direction. The number of vectors stored can be set with the command line argument -tao_lmm_vectors <int>; 5 is the default value. Increasing the number of vectors results in a better Hessian approximation and can decrease the number of iterations required to compute a solution to the optimization problem. However, as the number of vectors increases, more memory is consumed, and each direction calculation takes longer to compute. Therefore, a trade-off must be made among the quality of the Hessian approximation, the memory requirements, and the time needed to compute the direction.
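The memory side of this trade-off is easy to estimate: the method stores one iterate-difference vector and one gradient-difference vector per kept iterate, so storage grows linearly in both the number of vectors and the problem dimension, and each two-loop direction computation costs on the order of the number of vectors times the dimension. A back-of-envelope sketch (double precision assumed, counting only the stored difference vectors, not TAO's other work vectors):

```python
# Storage for the limited-memory pairs: one s-vector and one y-vector per
# stored iterate, 8 bytes per double-precision entry.
n = 1_000_000                  # illustrative problem dimension
for m in (5, 10, 50):          # candidate values for -tao_lmm_vectors
    mib = 2 * m * n * 8 / 2**20
    print(f"m = {m:2d}: about {mib:6.1f} MiB for the stored vectors")
```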

During the computation of the direction, the inverse of an initial Hessian approximation $H_{0,k}$ is applied. The choice of $H_{0,k}$ has a significant impact on the quality of the direction obtained and can result in a decrease in the number of function and gradient evaluations required to solve the optimization problem. However, the calculation of $H_{0,k}$ at each iteration can have a significant impact on the time required to update the limited-memory BFGS approximation and the cost of obtaining the direction. By default, $H_{0,k}$ is a diagonal matrix obtained from the diagonal entries of a Broyden approximation to the Hessian matrix. The calculation of $H_{0,k}$ can be modified with the command line argument -tao_lmm_scale_type <none,scalar,broyden>. Each scaling method is described below. The scalar and broyden techniques are inspired by [Gilbert and Lemaréchal].

none: This scaling method uses the identity matrix as $H_{0,k}$. No extra computations are required when obtaining the search direction or updating the Hessian approximation. However, the number of function and gradient evaluations required to converge to a solution is typically much larger than the number required when using the other scaling methods.
scalar: This scaling method uses a multiple of the identity matrix as $H_{0,k}$. The scalar value $\sigma$ is chosen by solving the one-dimensional optimization problem

\[ \min_{\sigma} \; \left\| \sigma^{\alpha} Y - \sigma^{\alpha - 1} S \right\|_F^2, \]

where $\alpha \in [0,1]$ is given, and $S$ and $Y$ are the matrices of past iterate and gradient information required by the limited-memory BFGS update formula. The optimal value for $\sigma$ can be written down explicitly. This choice of $\sigma$ attempts to satisfy the secant equation $\sigma y_k = s_k$. Since this equation cannot typically be satisfied by a scalar, a least-norm solution is computed. The amount of past iterate and gradient information used is set by the command line argument -tao_lmm_scalar_history <int>, which must be less than or equal to the number of vectors kept for the BFGS approximation. The default value is 5. The choice for $\alpha$ is made with the command line argument -tao_lmm_scalar_alpha <double>; 1 is the default value. This scaling method offers a good compromise between no scaling and broyden scaling; a sketch of the computation appears after these descriptions.

broyden: This scaling method uses as the scaling matrix a positive-definite diagonal matrix obtained from the diagonal entries of a Broyden approximation to the Hessian. The Broyden approximation is a family of approximations parametrized by a constant $\phi$; $\phi = 0$ gives the BFGS formula, and $\phi = 1$ gives the DFP formula. The value of $\phi$ is set with the command line argument -tao_lmm_broyden_phi <double>. The default value for $\phi$ is 0.125. This scaling method requires the most computational effort of the available choices but typically results in a significant reduction in the number of function and gradient evaluations needed to compute a solution.
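For the scalar method, the optimal $\sigma$ can be written down by setting the derivative of the objective above to zero, which yields the quadratic $\alpha \langle Y, Y\rangle \sigma^2 - (2\alpha - 1)\langle Y, S\rangle \sigma + (\alpha - 1)\langle S, S\rangle = 0$ in $\sigma$ (Frobenius inner products). The sketch below, which assumes the reconstruction of the one-dimensional problem given above and uses invented names rather than TAO's API, takes the positive root; with $\alpha = 1$ it reduces to $\langle Y, S\rangle / \langle Y, Y\rangle$, and with $\alpha = 0$ to $\langle S, S\rangle / \langle Y, S\rangle$.

```python
import numpy as np

def scalar_sigma(S, Y, alpha=1.0):
    """Positive root of  a*yy*sigma^2 - (2a-1)*ys*sigma + (a-1)*ss = 0,
    the stationarity condition of  || sigma^a Y - sigma^(a-1) S ||_F^2.
    S and Y are lists of the stored difference vectors."""
    yy = sum(np.dot(y, y) for y in Y)
    ys = sum(np.dot(y, s) for y, s in zip(Y, S))
    ss = sum(np.dot(s, s) for s in S)
    if alpha == 0.0:                       # equation becomes linear in sigma
        return ss / ys
    # alpha*(alpha - 1) <= 0 on [0, 1], so the discriminant is nonnegative.
    disc = (2*alpha - 1)**2 * ys**2 - 4*alpha*(alpha - 1)*yy*ss
    return ((2*alpha - 1)*ys + np.sqrt(disc)) / (2*alpha*yy)
```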

An additional rescaling of the diagonal matrix can be applied to further improve performance when using the broyden scaling method. The rescaling method can be set with the command line argument -tao_lmm_rescale_type <none,scalar,gl>; scalar is the default rescaling method. The rescaling method applied can have a large impact on the number of function and gradient evaluations necessary to compute a solution to the optimization problem but increases the time required to update the BFGS approximation. Each rescaling method is described below. These techniques are inspired by [Gilbert and Lemaréchal].

none: This rescaling method does not modify the diagonal scaling matrix.
scalar: This rescaling method chooses a scalar value $\sigma$ by solving the one-dimensional optimization problem

\[ \min_{\sigma} \; \left\| \sigma^{\alpha} H_{0,k}^{\beta} Y - \sigma^{\alpha - 1} H_{0,k}^{\beta - 1} S \right\|_F^2, \]

where $\alpha \in [0,1]$ and $\beta \in [0,1]$ are given, $H_{0,k}$ is the positive-definite diagonal scaling matrix computed by using the Broyden update, and $S$ and $Y$ are the matrices of past iterate and gradient information required by the limited-memory BFGS update formula. This choice of $\sigma$ attempts to satisfy the secant equation $\sigma H_{0,k} y_k = s_k$. Since this equation cannot typically be satisfied by a scalar, a least-norm solution is computed. The scaling matrix used is then $\sigma H_{0,k}$. The amount of past iterate and gradient information used is set by the command line argument -tao_lmm_rescale_history <int>, which must be less than or equal to the number of vectors kept for the BFGS approximation. The default value is 5. The choice for $\alpha$ is made with the command line argument -tao_lmm_rescale_alpha <double>; 1 is the default value. The choice for $\beta$ is made with the command line argument -tao_lmm_rescale_beta <double>; 0.5 is the default value. A sketch of this computation, for the default parameter values, follows these descriptions.

gl: This rescaling method is the same as the scalar rescaling method, except that the previous value of the scaling matrix, $H_{0,k-1}$, is used when computing $\sigma$. This is the rescaling method suggested in [Gilbert and Lemaréchal].
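Under the default parameters $\alpha = 1$ and $\beta = 0.5$, the least-norm $\sigma$ from the problem above reduces to a ratio of inner products weighted by the diagonal matrix. The sketch below shows only this default case; it assumes the reconstruction of the rescaling problem given above and is not TAO's implementation:

```python
import numpy as np

def rescale_sigma(S, Y, h_diag):
    """Least-norm sigma for the default case alpha = 1, beta = 0.5, where
    minimizing || sigma H^(1/2) Y - H^(-1/2) S ||_F^2 gives
        sigma = sum_i <y_i, s_i> / sum_i <y_i, H y_i>.
    h_diag holds the diagonal entries of the Broyden scaling matrix H_{0,k};
    the rescaled scaling matrix is then sigma * h_diag."""
    ys = sum(np.dot(y, s) for y, s in zip(Y, S))
    yhy = sum(np.dot(y, h_diag * y) for y in Y)
    return ys / yhy
```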

Finally, a limit can be placed on the difference between the scaling matrix computed at the current iteration and the previous value of the scaling matrix. The limit type can be set with the command line argument -tao_lmm_limit_type <none,average,relative,absolute>; none is the default value. Each of these methods is described below in terms of the scalar scaling method; the techniques are the same when using the broyden scaling method but are applied to each entry of the diagonal matrix. A sketch following these descriptions illustrates all four rules.

none: Set $\sigma_k = \sigma$, where $\sigma$ is the value computed by the scaling method.
average: Set $\sigma_k = \mu \sigma + (1 - \mu) \sigma_{k-1}$, where $\sigma$ is the value computed by the scaling method, $\sigma_{k-1}$ is the previous value, and $\mu \in [0,1]$ is given.
relative: Set $\sigma_k = \min\{(1 + \mu)\sigma_{k-1}, \max\{(1 - \mu)\sigma_{k-1}, \sigma\}\}$, where $\sigma$ is the value computed by the scaling method, $\sigma_{k-1}$ is the previous value, and $\mu \in [0,1]$ is given.
absolute: Set $\sigma_k = \min\{\sigma_{k-1} + \nu, \max\{\sigma_{k-1} - \nu, \sigma\}\}$, where $\sigma$ is the value computed by the scaling method, $\sigma_{k-1}$ is the previous value, and $\nu > 0$ is given.

The value for $\mu$ is set with the command line argument -tao_lmm_limit_mu <double>; 1 is the default value. The value for $\nu$ is set with the command line argument -tao_lmm_limit_nu <double>; 100 is the default value.
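A minimal sketch, assuming the four rules exactly as stated above, shows how the newly computed value is combined with the previous one; for the broyden scaling method the same function would be applied entrywise to the diagonal:

```python
def limit_sigma(sigma, sigma_prev, limit_type="none", mu=1.0, nu=100.0):
    """Blend or clamp the newly computed scaling value sigma against the
    previous value sigma_prev, following the limiting rules above."""
    if limit_type == "none":
        return sigma
    if limit_type == "average":       # convex combination of new and old
        return mu * sigma + (1.0 - mu) * sigma_prev
    if limit_type == "relative":      # clamp within a relative band
        return min((1.0 + mu) * sigma_prev, max((1.0 - mu) * sigma_prev, sigma))
    if limit_type == "absolute":      # clamp within an absolute band
        return min(sigma_prev + nu, max(sigma_prev - nu, sigma))
    raise ValueError(f"unknown limit type: {limit_type}")
```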

The default values for the scaling were chosen on the basis of extensive testing with the unconstrained problems from the MINPACK-2 test set, which narrowed the choices to a few candidate sets of values. These candidates were then run on the unconstrained problems from the CUTEr test set to select the default values supplied.