We illustrate the performance of the GPCG algorithm by
presenting results for a journal bearing problem
with over 2.5 million variables.
The journal bearing problem
is a finite element approximation to a variational problem
over a rectangular two-dimensional grid. A
grid with 1600 points in each direction, for example, is formulated
as a bound constrained quadratic problem with 1600^2 = 2,560,000
variables.
The triangulation of the grid results in a matrix that has the
usual five-diagonal nonzero structure that arises
from a difference approximation to the Laplacian operator.
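To make this structure concrete, the following sketch (plain SciPy, not the code used for these experiments) assembles the standard five-point difference matrix on an n-by-n grid; the actual bearing matrix carries finite element weights that are omitted here.

    import scipy.sparse as sp

    def five_point_laplacian(n, h=1.0):
        """Five-point difference approximation to -Laplacian on an n-by-n grid."""
        I = sp.identity(n, format="csr")
        # 1-D second-difference matrix: 2 on the diagonal, -1 on the off-diagonals.
        T = sp.diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")
        # The Kronecker sum produces the five-diagonal structure described above.
        return (sp.kron(I, T, format="csr") + sp.kron(T, I, format="csr")) / h**2

    A = five_point_laplacian(1600)   # 1600^2 = 2,560,000 variables
    print(A.shape, A.nnz)            # five nonzero diagonals, roughly 5 * 2.56e6 entries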
The journal bearing problem contains an eccentricity parameter, ε, that influences the number of active
variables at the solution and the difficulty of solving the problem.
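For reference, one common statement of the underlying variational problem is the following sketch of the usual journal bearing formulation; the domain parameters and notation here are assumptions and may differ from the exact setup used in these experiments:
\[
\min \{\, q(v) : v \ge 0 \text{ on } D \,\}, \qquad
q(v) = \frac{1}{2} \int_D w_q(x) \, \| \nabla v(x) \|^2 \, dx - \int_D w_l(x) \, v(x) \, dx ,
\]
where \(D = (0, 2\pi) \times (0, 2b)\), \(w_q(\xi_1, \xi_2) = (1 + \epsilon \cos \xi_1)^3\), \(w_l(\xi_1, \xi_2) = \epsilon \sin \xi_1\), and \(\epsilon \in (0, 1)\) is the eccentricity.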
The accompanying figure shows the solution of the journal bearing problem. The steep gradient in the
solution makes this problem a difficult benchmark.
The performance results in the table are noteworthy in several
ways. First of all, the number of faces visited by GPCG is remarkably
small. Other strategies can lead to a large number of gradient
projection iterates, but the GPCG algorithm is efficient in this respect.
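For orientation, a generic gradient projection step for a bound constrained quadratic looks like the sketch below (NumPy, lower bounds only, with a fixed step size; the names and step-size rule are illustrative and not the exact rule inside GPCG). Its purpose is to show how such a step identifies a face, that is, the set of variables fixed at their bounds.

    import numpy as np

    def project(x, lower):
        """Projection onto the feasible set {x : x >= lower}."""
        return np.maximum(x, lower)

    def gradient_projection_step(A_mul, b, x, lower, alpha):
        """One step x(alpha) = P[x - alpha * grad q(x)] for q(x) = 1/2 x'Ax - b'x."""
        grad = A_mul(x) - b                      # gradient of the quadratic
        x_new = project(x - alpha * grad, lower)
        active = x_new == lower                  # variables that ended up at their bounds
        return x_new, active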
Another interesting aspect of these results is that due to the low
memory requirements of iterative solvers, we were able to solve these
problems with only p = 8 processors. Strategies that rely on
direct solvers are likely to need significantly more storage, and thus
more processors. Finally, these results show that the GPCG
implementation has excellent parallel efficiency. For example, measured relative to the
run on p = 8 processors, the efficiency of GPCG remains high across the larger processor
counts reported in the table. This sustained efficiency
is notable because the GPCG algorithm solves a sequence of linear
systems whose coefficient matrix is the submatrix of the Hessian
with respect to the free variables at the current iterate.
Thus, our implementation's repartitioning of submatrices effectively
deals with the load-balancing problem that is inherent in the GPCG
algorithm.
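To illustrate the subspace phase in the simplest terms, the sketch below applies SciPy's conjugate gradient routine to the Hessian submatrix indexed by the current free variables (serial, lower bounds only, with illustrative names); the parallel implementation instead repartitions this reduced matrix across processors, as noted above.

    import numpy as np
    import scipy.sparse.linalg as spla

    def subspace_cg_step(A, b, x, lower, maxiter=50):
        """Approximate minimization of q over the face defined by the free variables."""
        free = np.flatnonzero(x > lower)     # indices of variables not at a bound
        A_ff = A[free][:, free]              # Hessian restricted to the free set
        r_f = (b - A @ x)[free]              # reduced residual (negative gradient)
        d_f, _ = spla.cg(A_ff, r_f, maxiter=maxiter)
        d = np.zeros_like(x)
        d[free] = d_f                        # expand the reduced step to full length
        return x + d                         # a projected line search would follow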
An important aspect of our results that is not apparent from the table is that we were able to experiment easily with all the preconditioners offered by PETSc. In particular, we compared the diagonal Jacobi preconditioner with block Jacobi and overlapping additive Schwarz preconditioners that use a zero-fill ILU solver in each block. We also experimented with a parallel zero-fill incomplete Cholesky preconditioner provided by a PETSc interface to the BlockSolve95 package [bs-user-ref] of Jones and Plassmann. Interestingly enough, the diagonal Jacobi preconditioner achieved the best performance on this problem.
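As a small-scale illustration of that kind of comparison, the sketch below contrasts a diagonal (Jacobi) preconditioner with an incomplete factorization preconditioner for CG, using SciPy stand-ins rather than PETSc or BlockSolve95; the incomplete LU built here is only a rough analogue of the zero-fill factorizations mentioned above.

    import numpy as np
    import scipy.sparse as sp
    import scipy.sparse.linalg as spla

    def cg_iterations(A, b, M=None):
        """Run CG and count the iterations via a callback."""
        count = [0]
        def tick(_xk):
            count[0] += 1
        _, info = spla.cg(A, b, M=M, maxiter=1000, callback=tick)
        return count[0], info

    # Model problem: five-point Laplacian on a 100-by-100 grid (10,000 unknowns).
    n = 100
    I = sp.identity(n, format="csr")
    T = sp.diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")
    A = (sp.kron(I, T, format="csr") + sp.kron(T, I, format="csr")).tocsc()
    b = np.ones(A.shape[0])

    # Diagonal (Jacobi) preconditioner: apply the inverse of the diagonal.
    d_inv = 1.0 / A.diagonal()
    M_jacobi = spla.LinearOperator(A.shape, matvec=lambda r: d_inv * r)

    # Incomplete LU with no extra fill, a rough stand-in for a zero-fill factorization.
    ilu = spla.spilu(A, fill_factor=1.0, drop_tol=0.0)
    M_ilu = spla.LinearOperator(A.shape, matvec=ilu.solve)

    print("Jacobi CG:", cg_iterations(A, b, M_jacobi))
    print("ILU    CG:", cg_iterations(A, b, M_ilu))

Fewer CG iterations do not automatically translate into less time; in a parallel setting, the cheaper setup and application of the diagonal preconditioner can outweigh a reduction in iteration count, which is consistent with the observation above.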