Writing and Submitting a Job on OzSTAR
Submitting a job on OzSTAR is a two-step process:
1) First you request resources from OzSTAR. This will normally include the amount of memory you need, the number of CPUs you want to use and how long you expect to be using the resources. This step is called a “resource request”. An example resource request in plain English could be: “I need 1 CPU and 100 MB of RAM for 30 minutes”.
2) Second you specify the commands you want to run on the requested resources. This is called the “job step”.
Writing a Job Submission Script
A submission script is just a shell script that is formatted in a specific way such that it contains both your resource request and your job step. Here is an example of a simple submission script called my_slurm_job.sh. This submission script will run a Python script called example_python_job.py.
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --mem=100MB
#SBATCH --time=00:30:00
module purge
module load anaconda3/5.0.1
source activate py3
python example_python_job.py
There are four main components to a submission script: The Shebang, #SBATCH, Loading Modules and The Job Step.
1 The Shebang
#!/bin/bash
The first component is the very first line: #!/bin/bash. This line has many names, including ‘shebang’, ‘shabang’, ‘hashbang’, ‘pound-bang’, ‘hash-pling’ and probably lots of other ridiculous-sounding names. It defines which ‘interpreter’ to use when executing the script.
In this case we are effectively saying: “This is a bash (Bourne-Again Shell) script, so execute this script with a bash shell”.
Since all of your submission scripts will likely be bash scripts, this line will be the same for every submission script you write.
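To see the shebang in action, here is a minimal sketch: it writes a two-line script to a (hypothetical) file called hello.sh, marks it executable and runs it. The first line tells the system to hand the script to bash.

```shell
# Write a tiny script whose first line names the interpreter.
cat > hello.sh <<'EOF'
#!/bin/bash
echo "hello from bash"
EOF

chmod +x hello.sh   # make it executable
./hello.sh          # the shebang tells the system to run it with bash
                    # prints: hello from bash
```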
2 #SBATCH
#SBATCH --ntasks=1
#SBATCH --mem=100MB
#SBATCH --time=00:30:00
The second component is where you request resources; it includes all the lines that start with #SBATCH. Any line in your submission script that begins with #SBATCH is understood by Slurm to be a resource request or an additional option related to the job.
Notice the # in front. This can be easy to forget but is important.
In this example script we are requesting 1 CPU (#SBATCH --ntasks=1) and 100 MB of RAM (#SBATCH --mem=100MB) for 30 minutes (#SBATCH --time=00:30:00). Similarly, if you wanted to request 2 GB of RAM on a single CPU for 12 hours, your resource request would look like this:
Example
#SBATCH --ntasks=1
#SBATCH --mem=2GB
#SBATCH --time=12:00:00
There are lots of additional options that you can include here; for instance, you can have Slurm send you an email when your job starts and ends.
Example:
#SBATCH --mail-user=name@swin.edu.au
#SBATCH --mail-type=ALL
You can see a complete list of parameters using man sbatch. I have also listed more examples of options that may be useful at the end of this tutorial.
A good rule of thumb is: The more resources you request, the longer it will take for your job to start.
This means that asking for far more time and RAM than you actually need is not a good idea. Unfortunately, this tutorial cannot tell you how many resources your job will require; you will have to determine that for yourself. If you are using Python, you can look into packages such as memory_profiler and line_profiler to help estimate the memory usage and timing of a script.
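As a rough first pass before reaching for memory_profiler or line_profiler, you can sketch the same idea with only Python's standard library (tracemalloc and time). The list comprehension below is a stand-in for your own workload:

```shell
# Stdlib-only sketch of the idea behind memory_profiler: measure
# wall time and peak memory of a workload, here run via a heredoc.
python3 - <<'EOF'
import time, tracemalloc

tracemalloc.start()
start = time.perf_counter()

data = [i * i for i in range(100_000)]   # stand-in for your own code

elapsed = time.perf_counter() - start
_, peak = tracemalloc.get_traced_memory()
print(f"elapsed: {elapsed:.3f} s, peak memory: {peak / 1e6:.1f} MB")
EOF
```

The peak figure gives you a starting point for the --mem value in your resource request; memory_profiler gives per-line detail instead.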
3 Loading Modules
module purge
module load anaconda3/5.0.1
source activate py3
The third component is where you load all the modules that are necessary to run your script. OzSTAR uses modules to manage the software it has installed. If you want to use a module, you first have to load it with the following command: module load <module_name>/<software_version>. You can search for modules using module spider <module_name>. Since the example submission script executes a Python script, we have to load the module for Anaconda and activate the Conda environment that all the packages are installed in.
module purge unloads all loaded modules. module load anaconda3/5.0.1 loads Anaconda. I recommend using the anaconda3/5.0.1 version instead of anaconda3/5.1.0 because there is a significant bug in the latter version on OzSTAR related to Conda environments. source activate py3 activates a Conda environment that I have called py3. In this environment I have installed all of the necessary packages required to execute example_python_job.py.
Here is a list of a few common modules that you might need to load: gcc/7.3.0, hdf5/1.10.1, openmpi/3.0.0.
Loading modules is not necessary if they are already loaded in your session. You can load modules in the .bashrc file in your home directory the same way you would load them in a bash script. If you load modules in your .bashrc, those modules will automatically be loaded into your environment every time you log into OzSTAR. This means you only have to write module load anaconda3/5.0.1 once in your .bashrc and then forget about it.
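For example, appending these lines (the same module commands used in the submission script above) to your .bashrc would set up your environment on every login:

```shell
# ~/.bashrc additions -- run automatically at every login to OzSTAR.
module purge                   # start from a clean module environment
module load anaconda3/5.0.1    # load Anaconda
source activate py3            # activate the py3 Conda environment
```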
If you are using Python, I highly recommend that you use Conda environments when working on OzSTAR. Conda environments will save you lots of time, stress and effort in the long run.
4 Job Step
python example_python_job.py
This final component is where you specify what you actually want to run on the requested resources: the job step. In the example, we have a single line, python example_python_job.py, indicating that our job will run the example_python_job.py script. It is also possible to list multiple job steps in a submission script; they will be performed one after another.
Example:
python example_python_job.py
python example_python_job2.py
python example_python_job3.py
In this case, after the first job step is completed, the second job step (example_python_job2.py) will begin. Your resource request must allow for the time and memory requirements of all of the job steps combined.
Submitting a Job Submission Script
After you have finished writing a submission script, you can submit it to Slurm with the sbatch
command.
>>> sbatch my_slurm_job.sh
sbatch: Submitted batch job 99999999
If the job is successfully submitted, Slurm will respond with the jobid that was assigned to the job. In this example the jobid is 99999999.
You can also submit a job to the queue from within a submission script. This can be useful for automating a pipeline of scripts that need to be completed in a sequence.
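As a sketch of that pattern, a submission script for the first stage of a pipeline could end by submitting the script for the next stage (next_stage_job.sh is a hypothetical second submission script):

```shell
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --mem=100MB
#SBATCH --time=00:30:00

module purge
module load anaconda3/5.0.1
source activate py3

python example_python_job.py

# Once this stage's work is done, queue the (hypothetical) next stage.
sbatch next_stage_job.sh
```

Because the sbatch line only runs after the python line finishes, each stage is queued only once the previous one has completed its work.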
You can find more documentation related to the sbatch command on the official Slurm website.