| 170 | |
| 171 | == Running PINNICLE on Lonestar6 == |
| 172 | Lonestar6 supports containers through `apptainer` [https://apptainer.org]. A prebuilt Docker image with the TensorFlow v2 backend is available at `docker://chenggongdartmouth/pinnicle_ls6:v0.1`. |
| 173 | |
| 174 | You need to build an apptainer image from this Docker image on Lonestar6. |
| 175 | First, create an interactive session in LS6's `gpu-a100-dev` or `gpu-a100` queue: |
| 176 | {{{ |
| 177 | idev -t 1:00:00 -N 1 -n 4 -p gpu-a100-dev |
| 178 | }}} |
| 179 | Then load the `cuda` and `apptainer` modules as follows: |
| 180 | {{{ |
| 181 | module load cuda/11.4 cudnn/8.2.4 nccl/2.11.4 |
| 182 | module load tacc-apptainer |
| 183 | }}} |
| 184 | |
| 185 | Move to your `<YOUR_WORKING_PATH>` directory on Lonestar6, which has the form `/work/xxxxx/yourname/ls6`. |
| 186 | |
| 187 | Build the apptainer image from the Docker image **with** `--nv`: |
| 188 | {{{ |
| 189 | apptainer build --nv <YOUR_WORKING_PATH>/<YOUR_IMAGE_NAME> docker://chenggongdartmouth/pinnicle_ls6:v0.1 |
| 190 | }}} |
| 191 | |
| 192 | After building the image, you can open an interactive shell inside it with: |
| 193 | {{{ |
| 194 | apptainer shell --nv <YOUR_WORKING_PATH>/<YOUR_IMAGE_NAME> |
| 195 | }}} |
| 196 | |
| 197 | You can also submit a job in the queue with the following script: |
| 198 | {{{ |
| 199 | #!/bin/bash |
| 200 | |
| 201 | #SBATCH -J job_name # job name |
| 202 | #SBATCH -o output.%j # output file named, output.jobID |
| 203 | #SBATCH -e error.%j # error file named, error.jobID |
| 204 | #SBATCH -p gpu-a100 # queue name |
| 205 | #SBATCH -N 1 # number of nodes requested |
| 206 | #SBATCH --ntasks-per-node 4 # tasks per node |
| 207 | #SBATCH -t 10:00:00 # time, hh:mm:ss |
| 208 | #SBATCH --mail-user=<EMAIL_ADDRESS> |
| 209 | #SBATCH --mail-type=all |
| 210 | |
| 211 | module load cuda/11.4 cudnn/8.2.4 nccl/2.11.4 |
| 212 | module load tacc-apptainer |
| 213 | apptainer exec --nv <YOUR_WORKING_PATH>/<YOUR_IMAGE_NAME> python test.py |
| 214 | }}} |
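
Before submitting a long job, it can help to confirm that TensorFlow inside the image actually sees the GPU. The following is a minimal sketch, not part of the original instructions: the `IMAGE` default path and the `PY_CHECK` variable are hypothetical names chosen for illustration, and it assumes the image provides `python` with TensorFlow 2 (as the precompiled image is described above). Run it inside an `idev` session after loading the modules and building the image.

```shell
#!/bin/bash
# Hypothetical path to the image built with `apptainer build`; adjust to
# <YOUR_WORKING_PATH>/<YOUR_IMAGE_NAME>.
IMAGE="${IMAGE:-$PWD/pinnicle_ls6.sif}"

# One-line Python check: list the GPUs that TensorFlow can see.
PY_CHECK='import tensorflow as tf; print(tf.config.list_physical_devices("GPU"))'

if command -v apptainer >/dev/null 2>&1 && [ -f "$IMAGE" ]; then
    # --nv exposes the host NVIDIA driver and libraries to the container.
    apptainer exec --nv "$IMAGE" python -c "$PY_CHECK"
else
    echo "apptainer or $IMAGE not found; load the modules and build the image first" >&2
fi
```

A non-empty list in the output indicates that the GPU is visible from within the container. An empty list usually means `--nv` was omitted or the session is not on a GPU node.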