There is plenty of Slurm documentation online:
- Official SLURM documentation (en)
- SLURM Job Scheduler (for users) - LRI classes by Corentin Tallec & Diviyan Kalainathan (fr)
- SLURM Introduction at DKRZ
- Commandes SLURM at AMU
- Ateliers Sequenceur SLURM - exercises at LRI
- INRIA's Titanic cluster doc (en)
Quality of Service
There are 5 different QoS on Lab-IA. Each QoS sets a maximum number of jobs per user and a maximum number of GPUs per user:
The default QoS allows a user to run up to 6 jobs using up to 6 GPUs, for up to 24 hours. Jobs running under this QoS are uninterruptible: the requested resources stay assigned to the user for the whole duration of the job. If a job exceeds 24 hours, Slurm kills all its processes to reclaim the resources. If a job ends earlier, its resources are freed.
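Because these limits are per-user totals, whether a new job fits depends on what the user already has running. The Python snippet below is a minimal sketch of that check, assuming the limits are exactly those stated above (6 jobs, 6 GPUs, 24 hours) and that each running job is summarised by its GPU count. `fits_default_qos` is a hypothetical helper, not a Slurm API; Slurm enforces the real limits itself, so a check like this only anticipates what the scheduler will do.

```python
from typing import List

# Per-user limits of the default QoS, as stated in this documentation.
MAX_JOBS = 6
MAX_GPUS = 6
MAX_HOURS = 24


def fits_default_qos(running_gpus: List[int], new_gpus: int, new_hours: float) -> bool:
    """Return True if a new job stays within the default QoS limits.

    running_gpus holds the GPU count of each job the user already runs
    (hypothetical representation, for illustration only).
    """
    if len(running_gpus) + 1 > MAX_JOBS:
        return False  # would exceed the number of jobs per user
    if sum(running_gpus) + new_gpus > MAX_GPUS:
        return False  # would exceed the number of GPUs per user
    return 0 < new_hours <= MAX_HOURS  # wall time may not exceed 24 hours


print(fits_default_qos([1, 1, 1], 1, 12))  # True
print(fits_default_qos([1] * 6, 1, 1))     # False: already 6 jobs running
print(fits_default_qos([1, 1], 1, 30))     # False: longer than 24 hours
```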
The preempt QoS works the same way as default. The only difference is that jobs running under preempt are interruptible: a job submitted with the default or testing QoS may stop a job running under preempt. This QoS is intended for running extra jobs when Lab-IA is underused.
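The preemption rule can be pictured with a toy model: when a default or testing job needs more GPUs than are currently free, jobs running under preempt are stopped until enough GPUs become available. This is only an illustration under that simplifying assumption. Slurm's actual choice of which jobs to stop, and whether a stopped job is cancelled or requeued, depend on the cluster configuration, which this page does not describe. `Job` and `jobs_to_preempt` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Job:
    name: str
    qos: str   # "default", "preempt" or "testing"
    gpus: int


def jobs_to_preempt(running: List[Job], free_gpus: int, new_job: Job) -> List[Job]:
    """Return the preempt jobs a toy scheduler would stop to fit new_job."""
    if new_job.qos not in ("default", "testing"):
        return []  # only default and testing jobs may interrupt preempt jobs
    victims: List[Job] = []
    for job in running:
        if free_gpus >= new_job.gpus:
            break  # enough GPUs are free, stop looking
        if job.qos == "preempt":
            victims.append(job)
            free_gpus += job.gpus  # stopping the job frees its GPUs
    # If stopping every preempt job is still not enough, stop nothing.
    return victims if free_gpus >= new_job.gpus else []


running = [Job("a", "preempt", 2), Job("b", "default", 2), Job("c", "preempt", 2)]
print([j.name for j in jobs_to_preempt(running, 0, Job("d", "default", 3))])
```

In this model the default job `b` is never touched: only the preempt jobs `a` and `c` are candidates.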
The testing QoS allows a user to run 1 job with up to 2 GPUs for up to 30 minutes. It is intended for testing purposes only. Please use this QoS to check that a job can run on a node before running it on other partitions.
Another QoS allows a user to run a single job with up to 4 GPUs on the pcie partition.
The fifth QoS allows a user to run a single job with up to 4 GPUs on the nvlink partition.
Partitions
There are 4 different partitions on Lab-IA:
The default partition allows any user to access all nodes.
The testing partition allows any user to test their code on every type of node.
The pcie partition is exclusive: it gives a single user all the resources (CPU and memory) of one node whose GPUs are connected via PCI Express. Multi-GPU jobs must run on this partition or on nvlink. Since using this partition prevents any other user from accessing the node, please use it wisely.
The nvlink partition is exclusive: it gives a single user all the resources (CPU and memory) of one node whose GPUs are connected via NVLink. Multi-GPU jobs must run on this partition or on pcie. Since using this partition prevents any other user from accessing the node, please use it wisely.
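The partition rules boil down to a small decision: single-GPU jobs can use the shared default partition, while multi-GPU jobs must go to one of the exclusive partitions, limited to 4 GPUs by the QoS paired with pcie and nvlink. The sketch below encodes that decision. The name `DEFAULT_PARTITION = "default"` is a hypothetical placeholder (the real partition name is not given on this page), and `choose_partition` is an illustrative helper; short test runs would go to the testing partition instead.

```python
from typing import Optional

# Hypothetical placeholder: the default partition's real name is not given here.
DEFAULT_PARTITION = "default"
# GPU limit of the QoS used with the exclusive pcie and nvlink partitions.
MAX_EXCLUSIVE_GPUS = 4


def choose_partition(gpus: int, interconnect: Optional[str] = None) -> str:
    """Pick a partition following the rules above (illustrative only)."""
    if gpus < 1:
        raise ValueError("request at least one GPU")
    if gpus == 1:
        return DEFAULT_PARTITION  # shared partition, no need to lock a whole node
    # Multi-GPU jobs must use an exclusive partition.
    if gpus > MAX_EXCLUSIVE_GPUS:
        raise ValueError(f"exclusive partitions allow at most {MAX_EXCLUSIVE_GPUS} GPUs")
    if interconnect not in ("pcie", "nvlink"):
        raise ValueError("multi-GPU jobs need interconnect='pcie' or 'nvlink'")
    return interconnect


print(choose_partition(1))            # the default partition
print(choose_partition(4, "nvlink"))  # nvlink
```

Making the interconnect an explicit argument reflects the trade-off above: an exclusive node blocks other users, so the choice between PCI Express and NVLink should be deliberate rather than automatic.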