Shannon Brown, October 19, 2005

This README file explains two ways of running multilevel parallel jobs in
DAKOTA using the new mpiexec (developed at SNL's request by Pete Wyckoff),
which is capable of this mode of operation.  By multilevel parallel, we
mean the case where DAKOTA spawns multiple concurrent analyses, each of which
is running in parallel.  This directory contains four files in addition to the
README:

* A PBS format job submission script (pbs_submission);

* Two DAKOTA input files (dakota_pstudy_fork.in and dakota_pstudy_system.in);

* An mpiexec wrapper script for 'text_book_par' (text_book_par_driver).

The DAKOTA input files define parameter study jobs which have been set up to
use either the fork or system interfaces available in DAKOTA.  Each of these
classes of file (job submission, input and wrapper) will be discussed in turn.

pbs_submission
==============

On the first line, the BASH invocation (via #!/bin/bash) picks up any local
shell resource settings (including those where DAKOTA is added to the $PATH).
The next two lines are resource allocation lines (those lines beginning with
'#PBS'); the significance of these settings will be made clear in the following
two sections, but to summarize: the number of concurrent jobs multiplied by
the number of processors each job runs on must equal the number of nodes
multiplied by the number of processors per node in the resource allocation.
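For concreteness, a hypothetical pair of resource allocation lines is shown
below; the node counts are assumptions chosen to match the 5-job, 2-processor
example used throughout this README, and the walltime is illustrative:

```shell
#!/bin/bash
# Request 5 nodes with 2 processors per node: 5 * 2 = 10 processors,
# which matches 5 concurrent jobs running on 2 processors apiece.
#PBS -l nodes=5:ppn=2
#PBS -l walltime=0:30:00
```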

The job submission script then sets the $DAKOTA environment variable to a
system-dependent directory where DAKOTA has been installed.  There are several
alternatives already in the script, any of which may be uncommented.  The job
submission script then adds '$DAKOTA/bin' and '$DAKOTA/test' to the beginning
of $PATH.  If you already have DAKOTA in your $PATH, you may comment out all
lines concerning the environment variables $DAKOTA and $PATH.  The job
submission script uses BASH as the default shell, though it is easily changed
to use your shell resource settings if you are a CSH/TCSH user.
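A minimal sketch of that environment setup follows; the install prefix is an
assumption, so substitute your site's actual DAKOTA location:

```shell
# Hypothetical install prefix -- adjust to where DAKOTA lives on your system.
DAKOTA=/usr/local/dakota
export DAKOTA

# Put the DAKOTA binaries and test drivers at the front of the search path.
PATH=$DAKOTA/bin:$DAKOTA/test:$PATH
export PATH
```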

When the new mpiexec is added to the $PATH, another issue arises: which
version of SSH mpiexec will use.  This matters because mpiexec has to log
into compute nodes without a password challenge.  If a version of SSH with a
token-based authentication system
(Kerberos, for instance) is being used, nothing has to be done other than
initializing the token ('kinit' or 'k5init' for Kerberos).  If the version of
SSH being used doesn't have that capability (OpenSSH for instance), a method
of logging in without password-challenge must be used.  Public key
authentication with a null-passphrase is an acceptable solution in this case.
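For OpenSSH, the null-passphrase key setup looks roughly like the sketch
below.  The scratch directory keeps the example self-contained; in practice
the keypair lives in ~/.ssh and the public key is appended to
~/.ssh/authorized_keys on the compute nodes:

```shell
# Create a passphrase-less RSA keypair (-N "" gives the empty passphrase).
keydir=$(mktemp -d)
ssh-keygen -q -t rsa -N "" -f "$keydir/id_rsa"

# Authorize the new public key for password-free logins.
cat "$keydir/id_rsa.pub" >> "$keydir/authorized_keys"
```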

The job submission script then changes the current working directory to the
present directory (this may be modified), starts a copy of mpiexec running in
server mode (using the flag '-server'), and places it in the background.  If
you already have a multilevel parallel-capable version of mpiexec in your
$PATH, the absolute path may be removed.  Additionally, to receive information
on how the jobs are being tiled onto compute nodes, one or more '-verbose'
flags may be specified after '-server', with each additional flag increasing
the level of detail.  Finally, DAKOTA is started using one of the input files
described in the next section, depending on whether the fork or system
interface is desired.
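Putting those steps together, the tail of the submission script looks roughly
like the following sketch.  The exact dakota invocation may differ by version,
and the mpiexec path is assumed to already be the multilevel-parallel build on
the $PATH:

```shell
# Move to the submission directory (PBS sets $PBS_O_WORKDIR at job start).
cd $PBS_O_WORKDIR

# Start mpiexec as a background server; add one or more '-verbose' flags
# here to see how jobs are tiled onto the compute nodes.
mpiexec -server &

# Run DAKOTA against one of the two input files (fork or system interface).
dakota dakota_pstudy_fork.in
```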

dakota_pstudy_fork.in and dakota_pstudy_system.in
=================================================

Both DAKOTA input files run a number of parameter study jobs using the
parallel version of the 'text_book' program in '$DAKOTA/test' called
'text_book_par'.  The only difference between the two is the use of different
interfaces (fork in one, system in the other).  This section will discuss the
keywords 'evaluation_concurrency', 'analysis_driver', 'file_tag', and
'file_save'.  The
number of concurrent jobs is controlled by the keyword
'evaluation_concurrency', which is
chosen according to constraints set in the PBS job submission script (see
previous section).  The executable to be used to run each job is set using the
'analysis_driver' keyword.

Using either interface, DAKOTA is invoked after an instance of mpiexec running
in server mode is started and placed in the background (described in the job
submission script section above).  DAKOTA does not itself run mpiexec in either
case, but rather invokes 5 concurrent instances of the analysis driver (by
setting the keyword 'evaluation_concurrency' to 5).  In this case, the analysis
driver is the 'text_book_par_driver' wrapper script for mpiexec responsible for
running 'text_book_par' with 2 processors (see 'text_book_par_driver' section
below).  Each job then tiles onto the original allocation of 10 processors (5
instances * 2 processors/instance is 10 processors).

It should also be noted that 'file_tag' and 'file_save' are specified.  The
'file_tag' keyword is specified in order to save parameter and results
information in the current directory.  If it is not specified, DAKOTA will use
'tmpnam' to pick distinct file names to save parameter and results information
to, often in '/tmp'.  Since '/tmp' is local to each compute node, the spawned
compute jobs can't find the 'tmpnam'-generated file names DAKOTA passes them,
because those files exist only on the compute node where DAKOTA is running.
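The interface block that ties these keywords together looks roughly like the
following; this is a hedged sketch of DAKOTA's input syntax rather than a
verbatim copy of either file, and swapping 'system' for 'fork' yields the
other variant:

```
interface,
	fork
	  analysis_driver = 'text_book_par_driver'
	  evaluation_concurrency = 5
	  file_tag
	  file_save
```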

text_book_par_driver
====================

This wrapper script around 'text_book_par' passes in the arguments from
DAKOTA naming the parameter and results files for the evaluation this
instance of 'text_book_par' is assigned to compute.  It invokes mpiexec
with 2 processors (via '-np 2') as discussed in the previous section; this is
the second factor which must be taken into account in the resource allocation
lines in the job submission script (see 'pbs_submission' section above).

It should be noted that 'text_book_par_driver' contains one additional
operation that is only necessary when the 'system' interface has been chosen.
The operation is a simple file move, which provides enough time to avoid a
race condition with DAKOTA when the system call spawns the analysis.  As with
the job submission script, BASH has been set as the shell here, but it may be
changed to CSH as long as the Bourne shell constructs ($1, $2, $$) are
changed too.
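A hedged sketch of such a wrapper is shown below; the variable names and the
temporary-results scheme are illustrative, not a verbatim copy of the shipped
script:

```shell
#!/bin/bash
# $1 = parameters file from DAKOTA, $2 = results file DAKOTA expects back.
params=$1
results=$2

# Run the parallel analysis on 2 processors; tagging the scratch results
# file with $$ keeps concurrent evaluations from clobbering each other.
mpiexec -np 2 text_book_par "$params" "$results.$$"

# For the system interface only: the final move provides enough time to
# avoid the race with DAKOTA's system call described above.
mv "$results.$$" "$results"
```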
