Introduction to Slurm

Slurm is an open-source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters.

It provides three key functions:

  • allocating exclusive and/or non-exclusive access to resources (nodes) to users for some duration of time so they can perform work,
  • providing a framework for starting, executing, and monitoring work (typically a parallel job such as MPI) on a set of allocated nodes, and
  • arbitrating contention for resources by managing a queue of pending jobs.
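
As a rough illustration of how these three functions surface in day-to-day use, the sketch below builds (but does not run) a batch script and the matching command lines. It assumes the standard Slurm commands sbatch, srun, and squeue with their --nodes, --time, --exclusive, and --user options. The helper names, the program ./my_mpi_app, and the user alice are hypothetical and used only for illustration.

```python
import shlex


def build_batch_script(nodes, time_limit, command, exclusive=False):
    """Return the text of a batch script (hypothetical helper).

    The #SBATCH lines request an allocation (function 1) and the
    srun line launches the work on the allocated nodes (function 2).
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --nodes={nodes}",      # number of nodes to allocate
        f"#SBATCH --time={time_limit}",  # wall-clock limit for the allocation
    ]
    if exclusive:
        # Ask for nodes that are not shared with other jobs.
        lines.append("#SBATCH --exclusive")
    lines.append("srun " + shlex.join(command))
    return "\n".join(lines) + "\n"


def submit_command(script_path):
    """Command that places the script in the queue of pending jobs (function 3)."""
    return ["sbatch", script_path]


def queue_command(user):
    """Command that lists the queued and running jobs of one user."""
    return ["squeue", f"--user={user}"]


# Hypothetical job: 2 exclusive nodes for 10 minutes running ./my_mpi_app.
script = build_batch_script(2, "00:10:00", ["./my_mpi_app"], exclusive=True)
print(script)
print(shlex.join(submit_command("job.sh")))
print(shlex.join(queue_command("alice")))
```

On a machine with Slurm installed, the printed script could be saved as job.sh and the two printed commands run from a shell. Nothing here contacts a cluster, so the sketch is safe to run anywhere.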
