| | 170 | |
| | 171 | == Running PINNICLE on Lonestar6 == |
| | 172 | Lonestar6 supports containers through `apptainer` [https://apptainer.org]. A prebuilt Docker image with the TensorFlow v2 backend is available at `docker://chenggongdartmouth/pinnicle_ls6:v0.1`. |
| | 173 | |
| | 174 | You need to build an apptainer image from this Docker image on Lonestar6. |
| | 175 | First, create an interactive session in LS6's `gpu-a100-dev` or `gpu-a100` queue: |
| | 176 | {{{ |
| | 177 | idev -t 1:00:00 -N 1 -n 4 -p gpu-a100-dev |
| | 178 | }}} |
| | 179 | Then load the CUDA modules (`cuda`, `cudnn`, `nccl`) and the `apptainer` module as follows: |
| | 180 | {{{ |
| | 181 | module load cuda/11.4 cudnn/8.2.4 nccl/2.11.4 |
| | 182 | module load tacc-apptainer |
| | 183 | }}} |
| | 184 | |
| | 185 | Move to your `<YOUR_WORKING_PATH>` directory on Lonestar6, which has the form `/work/xxxxx/yourname/ls6`. |
| | 186 | |
| | 187 | Build the apptainer image from the Docker image **with** the `--nv` flag: |
| | 188 | {{{ |
| | 189 | apptainer build --nv <YOUR_WORKING_PATH>/<YOUR_IMAGE_NAME> docker://chenggongdartmouth/pinnicle_ls6:v0.1 |
| | 190 | }}} |
| | 191 | |
| | 192 | After building the image, you can open an interactive shell inside the container with: |
| | 193 | {{{ |
| | 194 | apptainer shell --nv <YOUR_WORKING_PATH>/<YOUR_IMAGE_NAME> |
| | 195 | }}} |
| | 196 | |
| | 197 | You can also submit a batch job to the queue with the following script: |
| | 198 | {{{ |
| | 199 | #!/bin/bash |
| | 200 | |
| | 201 | #SBATCH -J job_name # job name |
| | 202 | #SBATCH -o output.%j # output file name (%j expands to the job ID) |
| | 203 | #SBATCH -e error.%j # error file name (%j expands to the job ID) |
| | 204 | #SBATCH -p gpu-a100 # queue name |
| | 205 | #SBATCH -N 1 # number of nodes requested |
| | 206 | #SBATCH --ntasks-per-node 4 # tasks per node |
| | 207 | #SBATCH -t 10:00:00 # time, hh:mm:ss |
| | 208 | #SBATCH --mail-user=<EMAIL_ADDRESS> |
| | 209 | #SBATCH --mail-type=all |
| | 210 | |
| | 211 | module load cuda/11.4 cudnn/8.2.4 nccl/2.11.4 |
| | 212 | module load tacc-apptainer |
| | 213 | apptainer exec --nv <YOUR_WORKING_PATH>/<YOUR_IMAGE_NAME> python test.py |
| | 214 | }}} |
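
The build and run steps above can be collected into a small helper script. The following is only a minimal sketch, not part of PINNICLE or of the TACC tooling: the variable names (`PINN_WORKDIR`, `PINN_IMAGE`, `PINN_IMAGE_PATH`, `PINN_DOCKER_URI`), the function names, and the image file name `pinnicle_ls6.sif` are all hypothetical. It assumes that `$WORK` points to your working directory on Lonestar6 and falls back to the current directory otherwise. It only wraps the `apptainer` commands already shown in this section, and it expects the modules to be loaded beforehand.

```shell
#!/bin/bash
# Hypothetical helper; every name below is illustrative only.

# Working directory: assumed to be $WORK on Lonestar6, else the current directory.
PINN_WORKDIR="${PINN_WORKDIR:-${WORK:-$PWD}}"
# Image file name is an assumption; any name works.
PINN_IMAGE="${PINN_IMAGE:-pinnicle_ls6.sif}"
PINN_IMAGE_PATH="${PINN_WORKDIR}/${PINN_IMAGE}"
# Docker image named in this section.
PINN_DOCKER_URI="docker://chenggongdartmouth/pinnicle_ls6:v0.1"

# Build the apptainer image only if the file does not exist yet.
build_image_if_missing() {
    if [ -f "$PINN_IMAGE_PATH" ]; then
        echo "Image already present: $PINN_IMAGE_PATH"
    else
        apptainer build --nv "$PINN_IMAGE_PATH" "$PINN_DOCKER_URI"
    fi
}

# Run an arbitrary command inside the container with GPU support (--nv).
run_in_image() {
    apptainer exec --nv "$PINN_IMAGE_PATH" "$@"
}
```

After sourcing this file in an interactive session or a batch script, a call such as `build_image_if_missing && run_in_image python test.py` would reproduce the steps above. Skipping the build when the image file already exists avoids repeating a slow step on every run.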