The GPU cluster consists of two partitions, `study` and `gpu`, with 15 nodes in total. The nodes `servant-[1:5]` belong to the `study` partition and all `worker-*` nodes to the `gpu` partition. The configuration is listed in the following table.
Hostname | Partition | CPU Cores | RAM | GPU | VRAM per GPU | TMP |
---|---|---|---|---|---|---|
servant-1 | study | 2 x 20 | 216 GB | 2 x Tesla P100 | 16 GB | 800 GB |
servant-2 | study | 2 x 20 | 216 GB | 2 x Tesla P100 | 16 GB | 800 GB |
servant-3 | study | 2 x 20 | 108 GB | 2 x GTX 1080 Ti | 11 GB | 400 GB |
servant-4 | study | 2 x 20 | 108 GB | 2 x GTX 1080 Ti | 11 GB | 400 GB |
servant-5 | study | 2 x 20 | 108 GB | 2 x GTX 1080 Ti | 11 GB | 400 GB |
worker-[1:10] | gpu | 2 x 60 | 443 GB | 4 x NVIDIA A40 | 46 GB | 1000 GB |
All compute nodes have non-persistent local storage. Requesting local storage with `--tmp` in your `sbatch` script creates an empty folder at `/local/slurmjobs/$SLURM_JOB_ID` and sets the environment variable `$SLURM_JOB_TMP` to exactly this path. This folder can be used as temporary storage for the running job, e.g. for loaded models, datasets, etc. Because the storage is non-persistent, copy any results you need back to permanent storage before the job finishes.