A Beginner's Guide to OzSTAR

Writing and Submitting a Job on OzSTAR

Submitting a job on OzSTAR is a two-step process:

1) First, you request resources from OzSTAR. This normally includes the amount of memory you need, the number of CPUs you want to use and how long you expect to use them for. This step is called a “resource request”. An example resource request written in plain English could be:

“I want to use 200 MB of memory on 1 CPU core for 2 hours”.
2) The second step is where you say what you want these resources to be doing. This includes specifying what software is needed, which scripts you want to run, and how to execute them. This step is called the ‘job step’. An example job step could be:

“I want to load python and then execute a python script”.
Luckily, you can do both of these steps at the same time using a “submission script”.

Writing a Job Submission Script

A submission script is just a shell script formatted so that it contains both your resource request and your job step. Here is an example of a simple submission script called my_slurm_job.sh, which runs a Python script called example_python_job.py.

#!/bin/bash

#SBATCH --ntasks=1
#SBATCH --mem=100MB
#SBATCH --time=00:30:00

module purge
module load anaconda3/5.0.1

source activate py3

python example_python_job.py

There are four main components to a submission script: the shebang, the #SBATCH lines, loading modules, and the job step.

1 The Shebang

#!/bin/bash 

The first component is the very first line: #!/bin/bash. This line has many names, including ‘shebang’, ‘shabang’, ‘hashbang’, ‘pound-bang’, ‘hash-pling’ and probably lots of other ridiculous-sounding names. It defines which ‘interpreter’ to use when executing the script.

In this case we are effectively saying: “This is a bash (Bourne-Again Shell) script, so execute this script with a bash shell”.

Since all of your submission scripts will likely be bash scripts, this line will be the same for every submission script you write.
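To see the shebang at work, the sketch below writes a tiny script and runs it directly with ./ (the file name shebang_demo.sh is hypothetical). Because the file is executed directly, the system reads its first line and hands the script to /bin/bash, which sets the BASH_VERSION variable.

```shell
# Write a small script whose first line is the shebang.
cat > shebang_demo.sh <<'EOF'
#!/bin/bash
# BASH_VERSION is set only when bash is the interpreter running this file.
if [ -n "$BASH_VERSION" ]; then
    echo "Interpreter: bash"
else
    echo "Interpreter: something other than bash"
fi
EOF

chmod +x shebang_demo.sh   # make it executable so it can be run directly
./shebang_demo.sh          # the shebang selects /bin/bash to run it
```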

2 #SBATCH

#SBATCH --ntasks=1
#SBATCH --mem=100MB
#SBATCH --time=00:30:00

The second component is where you request resources and it includes all the lines that start with #SBATCH. Any line in your submission script that begins with #SBATCH is understood by Slurm to be a resource request or an additional option related to the job.

Note

Notice the # in front. It is easy to forget but important: because the line starts with #, bash treats it as an ordinary comment, while Slurm reads it as a directive.
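One related rule from the sbatch documentation: Slurm stops reading #SBATCH lines once it reaches the first non-comment, non-blank line of the script, so any #SBATCH line placed after a command such as module purge is silently ignored. The hypothetical helper below is a simplified approximation of that scan, handy for checking which directives a script will actually pass to Slurm.

```shell
# Hypothetical helper: print the #SBATCH lines that appear before the first
# command in a submission script (a simplified model of how sbatch scans it).
list_sbatch_directives() {
    awk '
        NR == 1 && /^#!/ { next }   # skip the shebang
        /^[ \t]*$/       { next }   # skip blank lines
        /^#SBATCH/       { print; next }   # a directive Slurm will read
        /^#/             { next }   # an ordinary comment
                         { exit }   # first command: stop scanning
    ' "$1"
}

# Usage: list_sbatch_directives my_slurm_job.sh
```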

In this example script we are requesting 1 CPU (#SBATCH --ntasks=1) and 100 MB of RAM (#SBATCH --mem=100MB) for 30 minutes (#SBATCH --time=00:30:00). Similarly, if you wanted to request 2 GB of RAM on a single CPU for 12 hours, your resource request would look like this:

Example

#SBATCH --ntasks=1
#SBATCH --mem=2GB
#SBATCH --time=12:00:00
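The --time values above are written as hours:minutes:seconds. If you estimate run time in minutes, a small helper such as the hypothetical minutes_to_walltime below does the conversion; for example, 150 minutes becomes 02:30:00. Slurm also accepts other forms, such as days-hours:minutes:seconds (1-12:00:00 for a day and a half); see man sbatch for the full list.

```shell
# Hypothetical helper: convert a whole number of minutes into the
# hours:minutes:seconds form used by --time.
minutes_to_walltime() {
    local total_minutes=$1
    printf '%02d:%02d:00\n' $(( total_minutes / 60 )) $(( total_minutes % 60 ))
}

minutes_to_walltime 30     # prints 00:30:00
minutes_to_walltime 720    # prints 12:00:00
```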

There are many additional options you can include here. For instance, you can have Slurm send you an email when your job starts and ends.

Example:

#SBATCH --mail-user=name@swin.edu.au
#SBATCH --mail-type=ALL

You can see a complete list of parameters using man sbatch. I have also listed more examples of options that may be useful at the end of this tutorial.

Note

A good rule of thumb is: The more resources you request, the longer it will take for your job to start.

This means that asking for far more time and RAM than you actually need is not a good idea. Unfortunately, this tutorial cannot tell you how many resources your job will require; you will have to determine that for yourself. If you are using Python, you can look into packages such as memory_profiler and line_profiler to help estimate the memory usage and run time of a script.
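Once a short test job has finished, Slurm's sacct command can report what the job actually used, which is a useful starting point for sizing later requests. The sketch wraps it in a hypothetical helper called job_usage. The field names are standard sacct format fields; peak memory (MaxRSS) is reported per job step, for example on the line ending in .batch.

```shell
# Hypothetical helper: show run time and peak memory of a finished job,
# to compare against what was requested with --time and --mem.
job_usage() {
    sacct -j "$1" --format=JobID,JobName,Elapsed,MaxRSS,ReqMem,State
}

# Usage (replace 99999999 with the job ID that sbatch printed):
#   job_usage 99999999
```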

3 Loading Modules

module purge
module load anaconda3/5.0.1

source activate py3

The third component is where you load all the modules needed to run your script. OzSTAR uses modules to manage its installed software, and you have to load a module before you can use it. To load a module, use the command module load <module_name>/<software_version>. You can search for modules using module spider <module_name>. Since the example submission script runs a Python script, we load the Anaconda module and activate the Conda environment that all the required packages are installed in.

  • module purge unloads all loaded modules.
  • module load anaconda3/5.0.1 loads Anaconda. I recommend using anaconda3/5.0.1 instead of anaconda3/5.1.0 because the later version has a significant bug on OzSTAR related to Conda environments.
  • source activate py3 loads a Conda environment that I have called py3. In this environment I have installed all of the necessary packages required to execute example_python_job.py.

Here are a few common modules that you might need to load: gcc/7.3.0, hdf5/1.10.1 and openmpi/3.0.0.
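If a module name or version is mistyped, the rest of the submission script may run against the wrong software or fail confusingly later. The sketch below wraps module load in a hypothetical load_module helper that prints a hint and returns a failure status. It assumes module load exits with a non-zero status when loading fails, which is the behaviour of Lmod (the module system that provides module spider).

```shell
# Hypothetical helper: load one module, or explain what went wrong.
# Assumes "module load" exits with a non-zero status on failure (as Lmod does).
load_module() {
    if ! module load "$1"; then
        # ${1%%/*} strips the version, e.g. hdf5/1.10.1 -> hdf5
        echo "Could not load module $1 (search with: module spider ${1%%/*})" >&2
        return 1
    fi
}
```

In a submission script it could be used as load_module anaconda3/5.0.1 || exit 1, so the job stops straight away instead of running the job step without its software.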

Tip

Loading modules in a submission script is not necessary if they are already loaded in the session you submit the job from (note, however, that the module purge line in the example script would unload them again). You can load modules in the .bashrc file in your home directory the same way you would load them in a bash script. If you load modules in your .bashrc, those modules will automatically be loaded into your environment every time you log into OzSTAR. This means you only have to write module load anaconda3/5.0.1 once in your .bashrc and then forget about it.
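A minimal sketch of such a .bashrc entry follows. The command -v check is an addition of this example, not an OzSTAR requirement: it simply skips the load when the same .bashrc is read on a machine without a module command.

```shell
# Excerpt from ~/.bashrc: load Anaconda in every new session.
# Only attempt the load where a module command exists.
if command -v module > /dev/null 2>&1; then
    module load anaconda3/5.0.1
fi
```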

Tip

If you are using Python, I highly recommend using Conda environments on OzSTAR. They will save you a lot of time, stress and effort in the long run.
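As a sketch of how an environment like the py3 one used in the example script could be set up: the commands below are run once on a login node. The python=3.6 pin and the numpy and scipy packages are only placeholders; choose whatever your own scripts need.

```shell
# One-time setup on a login node.
module load anaconda3/5.0.1

# Create an environment called py3 (version and packages are examples only).
conda create --name py3 python=3.6 numpy

# Activate it, as the example submission script does.
source activate py3

# More packages can be added to the environment later, for example:
conda install --name py3 scipy
```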

4 Job Step

python example_python_job.py

This final component is where you say what you want to actually run on the requested resources. This is the job step. In the example, we have a single line python example_python_job.py indicating that our job will run the example_python_job.py script. It is also possible to list multiple job steps in a submission script and they will be performed one after another.

Example:

python example_python_job.py
python example_python_job2.py
python example_python_job3.py

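In this case, the second job step (example_python_job2.py) begins after the first one is completed, and the third begins after the second. Your resource request has to cover all of the job steps: the time limit must allow for the combined run time of every step, and the memory request must be enough for the most memory-hungry step, since the steps run one at a time.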
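By default, bash moves on to the next line even if the previous command failed, so a later job step may run on missing or incomplete output from an earlier one. Adding set -e makes a bash script stop at the first command that fails. The sketch below demonstrates this outside Slurm, with false standing in for a Python script that exits with an error.

```shell
# Run three "steps" in a separate bash process with set -e enabled.
# The second step fails, so the third one never runs.
output=$(bash -c '
set -e
echo "step 1 done"
false                 # stands in for a job step that fails
echo "step 3 done"
') || true            # the failure is expected here, so do not stop this demo
echo "$output"        # prints only: step 1 done
```

In a submission script, one option is to add set -e on its own line just before the job steps.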

Submitting a Job Submission Script

After you have finished writing a submission script, you can submit it to Slurm with the sbatch command.

$ sbatch my_slurm_job.sh
sbatch: Submitted batch job 99999999

If the job is successfully submitted, it will respond with the jobid that was assigned to the job. In this example the jobid is 99999999.
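If a script needs the job ID (for example, to check on the job later), one way is to keep only the last word of the confirmation message, as the hypothetical helper below does. sbatch also has a --parsable option that prints just the job ID (followed by ;cluster_name on multi-cluster systems), which avoids the parsing altogether.

```shell
# Hypothetical helper: pull the job ID out of the confirmation message
# by keeping only the text after the last space.
job_id_from_message() {
    printf '%s\n' "${1##* }"
}

job_id_from_message "sbatch: Submitted batch job 99999999"   # prints 99999999
```

Usage: jobid=$(job_id_from_message "$(sbatch my_slurm_job.sh)").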

Note

You can also submit a job to the queue from within a submission script. This can be useful for automating a pipeline of scripts that need to be completed in a sequence.
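A related way to build such a pipeline is a job dependency: sbatch's --dependency=afterok:<jobid> option holds a job until the given job has finished successfully. The sketch below is a hypothetical helper, submit_after, that submits two scripts in that order; it can be run from a login node or from inside a submission script.

```shell
# Hypothetical helper: submit two scripts so that the second only starts
# once the first has finished successfully (afterok).
submit_after() {
    local first_id
    first_id=$(sbatch --parsable "$1") || return 1
    first_id=${first_id%%;*}    # drop a ";cluster" suffix if one is printed
    sbatch --parsable --dependency=afterok:"$first_id" "$2"
}

# Usage (both script names are placeholders):
#   submit_after my_slurm_job.sh my_next_slurm_job.sh
```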

You can find more documentation on the sbatch command on the official Slurm website.


Copyright © Astronomy Data and Compute Services

ADACS is delivered jointly by Swinburne University of Technology and Curtin University. ADACS is funded under Astronomy National Collaborative Research Infrastructure Strategy (NCRIS) Program via Astronomy Australia Ltd (AAL).