The limited-memory, variable-metric method solves the system of equations

    H_k d_k = -∇f(x_k),

where H_k is a positive-definite approximation to the Hessian matrix obtained by using the BFGS update formula with a limited number of previous iterates and gradient evaluations. The inverse of H_k can readily be applied to obtain the direction d_k. Having obtained the direction, a Moré-Thuente line search is applied to compute a step length, τ_k, that approximately solves the one-dimensional optimization problem

    min_τ f(x_k + τ d_k).
The current iterate and Hessian approximation are updated and the process is repeated until the method converges. This algorithm is the default unconstrained minimization solver and can be selected using the TaoMethod tao_lmvm. For best efficiency, function and gradient evaluations should be performed simultaneously when using this algorithm.
The primary factors determining the behavior of this algorithm are the number of vectors stored for the Hessian approximation and the scaling matrix used when computing the direction. The number of vectors stored can be set with the command line argument -tao_lmm_vectors <int>; 5 is the default value. Increasing the number of vectors results in a better Hessian approximation and can decrease the number of iterations required to compute a solution to the optimization problem. However, as the number of vectors increases, more memory is consumed and each direction calculation takes longer to compute. Therefore, a trade-off must be made among the quality of the Hessian approximation, the memory requirements, and the time to compute the direction.
During the computation of the direction, the inverse of an initial Hessian approximation H_{0,k} is applied. The choice of H_{0,k} has a significant impact on the quality of the direction obtained and can result in a decrease in the number of function and gradient evaluations required to solve the optimization problem. However, the calculation of H_{0,k} at each iteration can have a significant impact on the time required to update the limited-memory BFGS approximation and the cost of obtaining the direction. By default, H_{0,k} is a diagonal matrix obtained from the diagonal entries of a Broyden approximation to the Hessian matrix. The calculation of H_{0,k} can be modified with the command line argument -tao_lmm_scale_type <none,scalar,broyden>. Each scaling method is described below. The scalar and broyden techniques are inspired by [Gilbert and Lemaréchal].
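For example, the options above could be combined on the command line of an application built on TAO (the program name ./driver is hypothetical; the option names are those described in this section):

```shell
# Select the LMVM solver, store 10 correction pairs, and use scalar scaling.
# "./driver" stands for any TAO-based executable (hypothetical name).
./driver -tao_method tao_lmvm -tao_lmm_vectors 10 -tao_lmm_scale_type scalar
```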
The scalar method uses a multiple of the identity matrix for the initial Hessian approximation, H_{0,k} = (1/σ) I, where the scalar σ is obtained by solving the one-dimensional optimization problem

    min_σ ‖ σ^α Y - σ^{α-1} S ‖²_F,

where α ∈ [0,1] is given, and S and Y are the matrices of past iterate and gradient information required by the limited-memory BFGS update formula. The optimal value for σ can be written down explicitly. This choice of σ attempts to satisfy the secant equation σ y_k = s_k. Since this equation cannot typically be satisfied by a scalar, a least norm solution is computed. The amount of past iterate and gradient information used is set by the command line argument -tao_lmm_scalar_history <int>, which must be less than or equal to the number of vectors kept for the BFGS approximation. The default value is 5. The choice for α is made with the command line argument -tao_lmm_scalar_alpha <double>; 1 is the default value. This scaling method offers a good compromise between no scaling and broyden scaling.
The broyden method rescales the diagonal matrix obtained from the Broyden update. The rescaling factor ν is obtained by solving the one-dimensional optimization problem

    min_ν ‖ ν^α H_{0,k} S - ν^{α-1} Y ‖²_F,

where α and β are given, H_{0,k} is the positive-definite diagonal scaling matrix computed by using the Broyden update, and S and Y are the matrices of past iterate and gradient information required by the limited-memory BFGS update formula. This choice of ν attempts to satisfy the secant equation ν H_{0,k} s_k = y_k. Since this equation cannot typically be satisfied by a scalar, a least norm solution is computed. The scaling matrix used is then ν^β H_{0,k}. The amount of past iterate and gradient information used is set by the command line argument -tao_lmm_rescale_history <int>, which must be less than or equal to the number of vectors kept for the BFGS approximation. The default value is 5. The choice for α is made with the command line argument -tao_lmm_rescale_alpha <double>; 1 is the default value. The choice for β is made with the command line argument -tao_lmm_rescale_beta <double>; 0.5 is the default value.
The default values for the scaling are based on many tests using the unconstrained problems from the MINPACK-2 test set. These tests were used to narrow the choices to a few sets of values. These values were then run on the unconstrained problems from the CUTEr test set to obtain the default values supplied.