5. Performance Results

A major concern in the TAO project is the performance and scalability of optimization algorithms on large problems. In this section we focus on the GPCG (gradient projection, conjugate gradient) algorithm for the solution of bound-constrained convex quadratic programming problems. Originally developed by Moré and Toraldo [(ref more-toraldo)], the GPCG algorithm was designed for large-scale problems but had previously been implemented only on a single processor. GPCG combines the active-set identification properties of the gradient projection method with the finite termination properties of the conjugate gradient method. As the results below show, the performance of the TAO implementation on large optimization problems is noteworthy.
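To make the interplay between the two methods concrete, here is a minimal serial sketch of a GPCG-style iteration in Python/NumPy. It is illustrative only, not TAO's implementation: the fixed projected-gradient step size, the simple clipping of the CG result back onto the bounds, and the stopping test are all simplifications of the projected searches used in the actual algorithm.

```python
import numpy as np

def cg(A, b, x, tol=1e-10, maxit=200):
    """Plain conjugate gradients for the SPD system A x = b."""
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    for _ in range(maxit):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def gpcg_sketch(A, b, lo, hi, x0, tol=1e-8, max_outer=100):
    """Hypothetical GPCG-style iteration for min 0.5 x'Ax - b'x subject
    to lo <= x <= hi (lo, hi arrays).  Gradient projection identifies a
    face of the box; CG then minimizes over that face's free variables."""
    proj = lambda v: np.clip(v, lo, hi)
    x = proj(np.asarray(x0, dtype=float))
    step = 1.0 / np.linalg.norm(A, 2)   # safe fixed step; the real algorithm uses a projected search
    for _ in range(max_outer):
        # Gradient-projection step to (re)identify the active face.
        x = proj(x - step * (A @ x - b))
        free = (x > lo) & (x < hi)      # variables strictly inside the bounds
        if free.any():
            # CG on the reduced problem in the free variables; the fixed
            # variables contribute a constant folded into the right-hand side.
            Af = A[np.ix_(free, free)]
            bf = b[free] - A[np.ix_(free, ~free)] @ x[~free]
            x[free] = np.clip(cg(Af, bf, x[free]), lo[free], hi[free])
        # Optimality test: norm of the projected gradient.
        if np.linalg.norm(x - proj(x - (A @ x - b))) < tol:
            break
    return x
```

The reduced matrix `Af` is exactly the "submatrix of the Hessian with respect to the free variables" discussed below; in a parallel setting this submatrix changes shape every outer iteration, which is the source of the load-balancing issue.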


Figure 2: The journal bearing problem with $\epsilon = 0.9$

We illustrate the performance of the GPCG algorithm by presenting results for a journal bearing problem with over 2.5 million variables. The journal bearing problem is a finite element approximation to a variational problem over a rectangular two-dimensional grid. A grid with 1600 points in each direction, for example, is formulated as a bound-constrained quadratic problem with $1600^2 = 2{,}560{,}000$ variables. The triangulation of the grid results in a matrix with the usual five-diagonal nonzero structure that arises from a difference approximation to the Laplacian operator. The journal bearing problem contains an eccentricity parameter, $\epsilon$, that influences both the number of active variables at the solution and the difficulty of the problem. Figure 2 shows the solution of the journal bearing problem for $\epsilon = 0.9$. The steep gradient in the solution makes this problem a difficult benchmark.
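The five-diagonal structure mentioned above can be reproduced with a short sketch that assembles the 5-point difference Laplacian on an $n \times n$ interior grid. This is an illustration of the nonzero pattern only: the journal bearing Hessian has the same structure but problem-dependent coefficients, and a production code would of course use a sparse format rather than a dense array.

```python
import numpy as np

def five_point_laplacian(n, h=1.0):
    """Dense matrix with the five-diagonal nonzero structure of the
    5-point difference Laplacian on an n-by-n interior grid (Dirichlet
    boundary).  Unknowns are numbered row by row: k = i*n + j."""
    N = n * n
    A = np.zeros((N, N))
    for i in range(n):
        for j in range(n):
            k = i * n + j
            A[k, k] = 4.0 / h**2            # main diagonal
            if i > 0:
                A[k, k - n] = -1.0 / h**2   # neighbor below
            if i < n - 1:
                A[k, k + n] = -1.0 / h**2   # neighbor above
            if j > 0:
                A[k, k - 1] = -1.0 / h**2   # neighbor to the left
            if j < n - 1:
                A[k, k + 1] = -1.0 / h**2   # neighbor to the right
    return A
```

Each row has at most five nonzeros, and the matrix is symmetric positive definite, which is what makes CG (and GPCG) applicable.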

The performance results in the table are noteworthy in several ways. First of all, the number of faces visited by GPCG is remarkably small. Other strategies can lead to a large number of gradient projection iterates, but GPCG requires few such iterates. Another interesting aspect of these results is that, due to the low memory requirements of iterative solvers, we were able to solve these problems with only p = 8 processors. Strategies that rely on direct solvers are likely to need significantly more storage, and thus more processors. Finally, these results show that the GPCG implementation has excellent efficiency: relative to p = 8 processors, the efficiency of GPCG remains high over the range of processor counts in the table. This sustained efficiency is remarkable because the GPCG algorithm solves a sequence of linear subproblems whose coefficient matrix is the submatrix of the Hessian corresponding to the free variables at the current iterate, and the set of free variables changes from iterate to iterate. Thus, our implementation's repartitioning of submatrices effectively deals with the load-balancing problem that is inherent in the GPCG algorithm.

An important aspect of our results that is not apparent from the table is that we were able to experiment easily with all of the preconditioners offered by PETSc. In particular, we compared the diagonal Jacobi preconditioner with block Jacobi and overlapping additive Schwarz preconditioners that use a zero-fill ILU solve in each block. We also experimented with a parallel zero-fill incomplete Cholesky preconditioner provided by a PETSc interface to the BlockSolve95 [(ref bs-user-ref)] package of Jones and Plassmann. Interestingly, the simple diagonal Jacobi preconditioner achieved better performance on this problem than these more sophisticated alternatives.
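As a rough illustration of why a diagonal preconditioner can be attractive, here is a generic textbook sketch of Jacobi-preconditioned conjugate gradients; this is not TAO or PETSc code. Applying $M^{-1} = \mathrm{diag}(A)^{-1}$ costs a single elementwise vector operation and needs no communication, in contrast to the block solves that the heavier preconditioners perform.

```python
import numpy as np

def pcg_jacobi(A, b, x0=None, tol=1e-10, maxit=500):
    """Conjugate gradients with a diagonal (Jacobi) preconditioner
    M = diag(A) for the SPD system A x = b.  Returns the solution and
    the number of iterations taken."""
    n = len(b)
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float)
    Minv = 1.0 / np.diag(A)        # M^{-1} applied elementwise
    r = b - A @ x                  # residual
    z = Minv * r                   # preconditioned residual
    p = z.copy()
    rz = r @ z
    for it in range(maxit):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = Minv * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, it + 1
```

In PETSc, switching among such choices is a runtime decision, via options such as `-pc_type jacobi`, `-pc_type bjacobi`, and `-pc_type asm`, which is what made the experimentation described above easy.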