wiki:lonestar

Getting an account

more to come...

ssh configuration

You can add the following lines to ~/.ssh/config on your local machine:

Host lonestar ls5.tacc.utexas.edu
  HostName ls5.tacc.utexas.edu
  User YOURUSERNAME
  HostKeyAlias ls5.tacc.utexas.edu
  HostbasedAuthentication no

Replace YOURUSERNAME with your Lonestar5 username.

Once this is done, you can connect to Lonestar5 by simply typing:

ssh lonestar
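
If you prefer to script this step, the following is a minimal sketch that appends the Host block shown above to an ssh config file only when no entry named lonestar exists yet. The function name add_lonestar_host and its two arguments are hypothetical and not part of ISSM or of any TACC tooling.

```shell
# Hypothetical helper: append a "Host lonestar" block to an ssh config file
# unless a Host line already names "lonestar".
# Arguments: path to the config file, TACC username.
add_lonestar_host() {
  local config="$1" user="$2"
  mkdir -p "$(dirname "$config")"
  touch "$config"
  # Nothing to do if some Host line already lists the alias "lonestar".
  if grep -Eq '^[[:space:]]*Host[[:space:]]+(.*[[:space:]])?lonestar([[:space:]]|$)' "$config"; then
    return 0
  fi
  cat >> "$config" <<EOF

Host lonestar ls5.tacc.utexas.edu
  HostName ls5.tacc.utexas.edu
  User $user
  HostKeyAlias ls5.tacc.utexas.edu
  HostbasedAuthentication no
EOF
}

# Example (commented out so that sourcing this block changes nothing):
# add_lonestar_host "$HOME/.ssh/config" YOURUSERNAME
```

Calling the function twice leaves a single block in the file, so it is safe to rerun.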

Password-less ssh

Once you have the account, you can set up public key authentication to avoid having to enter your password for each run. You need an SSH public/private key pair. If you do not have one, you can create it by typing the following command and following the prompts (no passphrase necessary):

$your_localhost% ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/Users/username/.ssh/id_rsa):RETURN
Enter passphrase (empty for no passphrase):RETURN
Enter same passphrase again:RETURN
Your identification has been saved in /Users/username/.ssh/id_rsa.
Your public key has been saved in /Users/username/.ssh/id_rsa.pub.

Two files were created: your private key /Users/username/.ssh/id_rsa and the public key /Users/username/.ssh/id_rsa.pub. The private key is readable only by you and should never be shared; it is what proves your identity to any host that holds the matching public key. The contents of the public key need to be appended to ~/.ssh/authorized_keys on your lonestar account. First, copy the public key to lonestar:

$your_localhost% scp ~/.ssh/id_rsa.pub username@your_remotehost:~

Now on lonestar, append the contents of id_rsa.pub to ~/.ssh/authorized_keys and remove the copy:

$your_remotehost% cat ~/id_rsa.pub >> ~/.ssh/authorized_keys
$your_remotehost% rm ~/id_rsa.pub
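
The append step can be made repeatable with a small function, sketched below for use on the remote host. It adds the key only if it is not already listed and then restricts permissions, since OpenSSH's default StrictModes setting can refuse keys when these files are writable by others. The name install_pubkey and its arguments are hypothetical.

```shell
# Hypothetical helper: append a public key to authorized_keys only if it is
# not already present, then tighten permissions.
# Arguments: public key file, ssh directory (defaults to ~/.ssh).
install_pubkey() {
  local pubkey="$1" sshdir="${2:-$HOME/.ssh}"
  mkdir -p "$sshdir"
  touch "$sshdir/authorized_keys"
  # -F fixed strings, -x whole-line match, -f read the patterns from the key file
  if ! grep -qxFf "$pubkey" "$sshdir/authorized_keys"; then
    cat "$pubkey" >> "$sshdir/authorized_keys"
  fi
  chmod 700 "$sshdir"
  chmod 600 "$sshdir/authorized_keys"
}

# Example (commented out):
# install_pubkey ~/id_rsa.pub && rm ~/id_rsa.pub
```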

Environment

On lonestar, add the following lines to ~/.bash_login:

export ISSM_DIR=PATHTOTRUNK
source $ISSM_DIR/etc/environment.sh

module load intel/18.0.2
module load gsl

Log out and log back in to apply this change.
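
After logging back in, a quick check such as the sketch below can confirm that the variable was picked up. The function name check_issm_env is hypothetical and not part of ISSM; it only tests that ISSM_DIR is set and that etc/environment.sh can be read.

```shell
# Hypothetical sanity check: succeed only if ISSM_DIR is set and points at a
# tree that contains a readable etc/environment.sh.
check_issm_env() {
  if [ -z "${ISSM_DIR:-}" ]; then
    echo "ISSM_DIR is not set" >&2
    return 1
  fi
  if [ ! -r "$ISSM_DIR/etc/environment.sh" ]; then
    echo "cannot read $ISSM_DIR/etc/environment.sh" >&2
    return 1
  fi
  echo "ISSM_DIR=$ISSM_DIR"
}

# Example (commented out):
# check_issm_env && module list
```

The module list command then shows whether the intel and gsl modules are loaded.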

Installing ISSM on lonestar

lonestar will only be used to run the code. You will use your local machine for pre- and post-processing, and you will never use lonestar's matlab. You can check out ISSM and install the following packages:

  • PETSc (use the lonestar script, install-3.12-lonestar.sh or later)
  • m1qn3

Use the following configuration script (adapt to your needs):

./configure \
   --prefix=$ISSM_DIR \
   --enable-standalone-libraries \
   --with-wrappers=no \
   --with-metis-dir="$ISSM_DIR/externalpackages/petsc/install" \
   --with-petsc-dir=$ISSM_DIR/externalpackages/petsc/install \
   --with-petsc-arch=$ISSM_ARCH \
   --with-m1qn3-dir=$ISSM_DIR/externalpackages/m1qn3/install \
   --with-mpi-include="/opt/cray/pe/mpt/7.7.3/gni/mpich-intel/16.0/include/" \
   --with-mpi-libflags="-L/opt/cray/pe/mpt/7.7.3/gni/mpich-intel/16.0/lib/ -lmpich" \
   --with-mkl-libflags="-L$TACC_MKL_LIB -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm" \
   --with-mumps-dir=$ISSM_DIR/externalpackages/petsc/install/ \
   --with-scalapack-dir=$ISSM_DIR/externalpackages/petsc/install/ \
   --with-fortran-lib="-L/opt/apps/gcc/6.3.0/lib64/ -lgfortran -L/opt/intel/compilers_and_libraries_2018.2.199/linux/compiler/lib/intel64/ -lifcore -lifport" \
   --with-vendor="intel-lonestar" \
   --enable-debugging \
   --enable-development
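
The configure command above contains several hard-coded system paths that may change when the modules on the cluster are updated. As a rough sketch, a helper like the one below can report which directories are missing before you run configure. The name missing_dirs is hypothetical.

```shell
# Hypothetical helper: print every argument that is not an existing directory
# and return non-zero if at least one is missing.
missing_dirs() {
  local status=0 dir
  for dir in "$@"; do
    if [ ! -d "$dir" ]; then
      echo "missing: $dir"
      status=1
    fi
  done
  return "$status"
}

# Example (commented out), using paths taken from the configure command above:
# missing_dirs "$ISSM_DIR/externalpackages/petsc/install" \
#              "$ISSM_DIR/externalpackages/m1qn3/install" \
#              "/opt/cray/pe/mpt/7.7.3/gni/mpich-intel/16.0/include/" \
#              "$TACC_MKL_LIB"
```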

Installing ISSM on Lonestar with Dakota

For Dakota to run, you will still need to build PETSc and m1qn3.

In addition, you will need to build the following external packages:

  • boost, install-1.55-lonestar.sh (It is ok if there is a message that the script failed updating 56 targets)
  • dakota, install-6.2-lonestar.sh

Then, add the following lines within the configure command of your configure.sh script:

   --with-boost-dir=$ISSM_DIR/externalpackages/boost/install \
   --with-dakota-dir=$ISSM_DIR/externalpackages/dakota/install \

lonestar_settings.m

On your local ISSM installation, you have to add a file named lonestar_settings.m in $ISSM_DIR/src/m with your personal settings:

cluster.login='seroussi';
cluster.codepath='/home1/03729/seroussi/trunk-jpl/bin/';
cluster.executionpath='/work/03729/seroussi/trunk-jpl/execution/';

Use your username for the login and enter your code path and execution path. These settings will be picked up automatically by matlab when you do md.cluster=lonestar()

Note that ISSM writes temporary binary files to executionpath; these can be removed once the job is complete. For this reason, you can set this path to somewhere on the $SCRATCH filesystem, which is unlimited temporary storage on Lonestar5.

Running jobs on lonestar

On lonestar, each node has 24 cores and you can use any multiple of 24 for the total number of processors. The more nodes and the longer the requested time, the more you will have to wait in the queue. So choose your settings wisely:

md.cluster=lonestar('numnodes',2);

Before you run your job, make sure to open a port first and enter the port number in md.cluster. Here is a handy alias:

alias ls5tunnel='ssh -L 1099:localhost:22 lonestar'

This forwards local port 1099 to port 22 on lonestar. You can then use that port number in ISSM so that you don't need to enter your password.

For example, a job with 2 nodes and 12 CPUs per node uses a total of 24 cores.
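
The node and core counts discussed in this section can be checked with a little shell arithmetic. The sketch below assumes 24 cores per node, as stated above; the function names total_cores and nodes_for_cores are hypothetical.

```shell
# Cores per node on lonestar, as stated in this section.
CORES_PER_NODE=24

# Total cores for a given number of nodes and CPUs used per node.
total_cores() {
  echo $(( $1 * $2 ))
}

# Smallest number of full nodes that provides at least the requested cores
# (integer ceiling division).
nodes_for_cores() {
  echo $(( ($1 + CORES_PER_NODE - 1) / CORES_PER_NODE ))
}

total_cores 2 12      # prints 24
total_cores 2 24      # prints 48
nodes_for_cores 48    # prints 2
nodes_for_cores 50    # prints 3
```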

To submit a job on lonestar, do:

sbatch job.queue

To check the status of your job and of the queue you are using, type the following in your lonestar shell session:

showq -u USERNAME

You can delete your job manually by typing:

scancel JOBID

where JOBID is the ID of your job (shown in the Matlab session). Matlab also shows the directory of your job, where you can find the files JOBNAME.outlog and JOBNAME.errlog. The outlog file contains the information that would appear if you were running your job on your local machine, and the errlog file contains the error information if the job encounters an error.
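
To look at these two files quickly from the lonestar shell, something like the sketch below may help. It assumes that a non-empty errlog indicates a problem, which may not hold if warnings are also written there. The function name job_summary is hypothetical.

```shell
# Hypothetical helper: summarize a job from its log files.
# Arguments: job directory, job name (files are JOBNAME.outlog and JOBNAME.errlog).
# Returns 0 if no errors were logged, 1 if the errlog is non-empty,
# and 2 if the outlog does not exist yet.
job_summary() {
  local dir="$1" name="$2"
  local out="$dir/$name.outlog" err="$dir/$name.errlog"
  if [ ! -f "$out" ]; then
    echo "no outlog yet"
    return 2
  fi
  if [ -s "$err" ]; then
    # Non-empty errlog: show its last lines and report failure.
    echo "errors reported:"
    tail -n 5 "$err"
    return 1
  fi
  echo "no errors reported"
}

# Example (commented out); the directory is the one printed by Matlab:
# job_summary /path/to/job/directory JOBNAME
```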

Running PINNICLE on Lonestar6

Lonestar6 supports containers through Apptainer (https://apptainer.org). A precompiled image with a TensorFlow v2 backend is available at docker://chenggongdartmouth/pinnicle_ls6:v0.1

You need to build the Apptainer image from this Docker image on Lonestar6. First, create an interactive session in LS6's gpu-a100-dev or gpu-a100 queue:

idev -t 1:00:00 -N 1 -n 4 -p gpu-a100-dev

You will need to load the cuda and apptainer modules as follows:

module load cuda/11.4 cudnn/8.2.4 nccl/2.11.4
module load tacc-apptainer

Move to your <YOUR_WORKING_PATH> directory on Lonestar6, which has the form /work/xxxxx/yourname/ls6

Build the Apptainer image from the Docker image with --nv:

apptainer build --nv <YOUR_WORKING_PATH>/<YOUR_IMAGE_NAME> docker://chenggongdartmouth/pinnicle_ls6:v0.1

After building the image, you can open a shell inside the container with:

apptainer shell --nv <YOUR_WORKING_PATH>/<YOUR_IMAGE_NAME>

You can also submit a job in the queue with the following script:

#!/bin/bash

#SBATCH -J job_name           # job name
#SBATCH -o output.%j          # output file, named output.jobID
#SBATCH -e error.%j           # error file, named error.jobID
#SBATCH -p gpu-a100           # queue name
#SBATCH -N 1                  # number of nodes requested
#SBATCH --ntasks-per-node 4   # tasks per node
#SBATCH -t 10:00:00           # time, hh:mm:ss
#SBATCH --mail-user=<EMAIL_ADDRESS>
#SBATCH --mail-type=all

module load cuda/11.4 cudnn/8.2.4 nccl/2.11.4
module load tacc-apptainer
apptainer exec --nv <YOUR_WORKING_PATH>/<YOUR_IMAGE_NAME> python test.py
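
Once the script above is saved to a file, it can be submitted with sbatch as in the previous section. The sketch below captures the job ID from sbatch's confirmation line ("Submitted batch job N") so that it can be reused with squeue or scancel. The function name parse_jobid and the file name pinnicle.slurm are hypothetical.

```shell
# Extract the numeric job ID from sbatch's confirmation line,
# e.g. "Submitted batch job 123456". Other lines are ignored.
parse_jobid() {
  sed -n 's/^Submitted batch job \([0-9][0-9]*\).*$/\1/p'
}

# Example (commented out); pinnicle.slurm is a hypothetical file name:
# jobid=$(sbatch pinnicle.slurm | parse_jobid)
# squeue -j "$jobid"    # check the status of the job
# scancel "$jobid"      # cancel the job if needed
```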
Last modified on 07/10/24 10:51:13