Introduction


  • Slurm, together with Linux system tools, can help you ensure jobs are utilising resources effectively.
  • Slurm job arrays are a neater way to submit many similar jobs.
  • Slurm job dependencies can help you organise pipelines.

Monitoring a Job’s performance


  • Requesting more resources from Slurm doesn’t mean your job knows how to use them!
    • Many programs don’t work in parallel by default - either the functionality doesn’t exist, or it needs to be turned on!
    • More CPUs don’t always mean an equivalent speedup!
  • Slurm offers multiple utilities to monitor your jobs, each serving a slightly different purpose:
    • squeue is for running/pending jobs and only provides status/request information
    • sacct and seff are best for completed jobs and provide resource utilisation where available
    • sstat is for running jobs and provides a snapshot of resource utilisation
  • The htop system tool is a great way to get live information about how effectively your job is using its resources (run it on the node where the job is running)
    • it is more robust and provides more detail than the Slurm monitoring tools
  • nvtop offers something similar to htop, but for GPU processes.
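
To make "how effective is my job" concrete: seff reports CPU efficiency as the CPU time a job actually used divided by the core-walltime it reserved (elapsed time x allocated CPUs). A minimal Python sketch of that arithmetic follows, taking values you could read from sacct's TotalCPU, Elapsed and NCPUS fields. The function names are hypothetical, and the parser assumes sacct's [DD-][HH:]MM:SS[.mmm] duration layout.

```python
"""Sketch of seff-style CPU efficiency (assumed formula:
CPU time used / (elapsed walltime x allocated CPUs))."""


def slurm_time_to_seconds(value: str) -> float:
    """Convert a sacct-style duration ([DD-][HH:]MM:SS[.mmm]) to seconds."""
    days = 0
    if "-" in value:
        day_part, value = value.split("-", 1)
        days = int(day_part)
    parts = [float(p) for p in value.split(":")]
    # Pad missing leading fields so we always have hours, minutes, seconds.
    while len(parts) < 3:
        parts.insert(0, 0.0)
    hours, minutes, seconds = parts
    return days * 86400 + hours * 3600 + minutes * 60 + seconds


def cpu_efficiency(total_cpu: str, elapsed: str, ncpus: int) -> float:
    """Percentage of the reserved core-walltime that was spent computing."""
    core_walltime = slurm_time_to_seconds(elapsed) * ncpus
    if core_walltime == 0:
        return 0.0
    return 100.0 * slurm_time_to_seconds(total_cpu) / core_walltime


if __name__ == "__main__":
    # Hypothetical job: 4 CPUs reserved for 10 minutes, ~10 CPU-minutes used,
    # i.e. the program effectively ran on a single core.
    print(f"{cpu_efficiency('10:02.500', '00:10:00', 4):.1f}%")  # prints "25.1%"
```

A low percentage like this usually means the program is not using the extra CPUs it was given. seff applies the same idea to memory by comparing the peak memory used (MaxRSS) with the memory requested.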

Job Arrays


  • Slurm job arrays are a great way to parallelise similar jobs!
  • The SLURM_ARRAY_TASK_ID environment variable is used to control individual array tasks’ work
  • A file listing all the parameters can be used to give each array task its own set of parameters
  • readarray and read are useful bash builtins for parsing files, but it can also be done in many other ways!
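
As one of those "other ways", here is a minimal sketch of the parameter-file approach written as a Python Slurm script rather than bash: each array task reads SLURM_ARRAY_TASK_ID and uses it to pick one line from a parameter file. The file name params.txt, its one-line-per-task layout, the resource values, and the choice of --array=1-3 (so task IDs start at 1) are all assumptions for illustration.

```python
#!/usr/bin/env python
#SBATCH --job-name=array-sketch
#SBATCH --array=1-3
#SBATCH --time=00:05:00
"""Hypothetical job-array task that picks its parameters from a file.

Assumes params.txt holds one whitespace-separated set of parameters per
line and that array indices start at 1 (line 1 -> task 1).
"""
import os
from typing import Iterable, List

PARAM_FILE = "params.txt"  # hypothetical parameter file name


def select_task_params(lines: Iterable[str], task_id: int) -> List[str]:
    """Return the fields on the task_id-th non-blank line (1-based)."""
    rows = [line.split() for line in lines if line.strip()]
    if not 1 <= task_id <= len(rows):
        raise IndexError(f"no parameter line for array task {task_id}")
    return rows[task_id - 1]


def main() -> None:
    # Slurm sets SLURM_ARRAY_TASK_ID in each task of an array job.
    task_id = os.environ.get("SLURM_ARRAY_TASK_ID")
    if task_id is None:
        print("SLURM_ARRAY_TASK_ID is not set; submit this with sbatch.")
        return
    with open(PARAM_FILE) as fh:
        params = select_task_params(fh, int(task_id))
    print(f"array task {task_id} parameters: {params}")
    # ... run the real program with `params` here ...


if __name__ == "__main__":
    main()
```

If you prefer --array=0-2, drop the `- 1` offset so that task 0 reads the first line.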

Organising dependent Slurm jobs


  • Make a job depend on another with the --dependency or -d flag for sbatch
    • dependency conditions (e.g. afterok, afterany) control when the dependent job starts
    • the full list of conditions is described in man sbatch.
  • job dependencies can be combined with job arrays to create pipelines!
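
A pipeline can be scripted by capturing each job ID and feeding it into the next submission's --dependency flag. The sketch below does this from Python with subprocess, relying on sbatch's --parsable option (which prints just the job ID, optionally followed by ;cluster) and the afterok condition. The stage scripts preprocess.sh, analyse.sh and summarise.sh are hypothetical placeholders.

```python
"""Sketch: submit a three-stage Slurm pipeline using job dependencies.

Stage script names are hypothetical placeholders.
"""
import shutil
import subprocess
from typing import List, Optional


def build_sbatch_cmd(script: str, after_ok: Optional[List[str]] = None) -> List[str]:
    """Build an sbatch command line, optionally depending on earlier jobs."""
    cmd = ["sbatch", "--parsable"]
    if after_ok:
        # afterok: start only once every listed job has finished successfully.
        cmd.append("--dependency=afterok:" + ":".join(after_ok))
    cmd.append(script)
    return cmd


def parse_job_id(parsable_output: str) -> str:
    """--parsable prints 'jobid' or 'jobid;cluster'; keep only the job ID."""
    return parsable_output.strip().split(";")[0]


def submit(script: str, after_ok: Optional[List[str]] = None) -> str:
    """Submit `script` and return its job ID (raises if sbatch fails)."""
    result = subprocess.run(
        build_sbatch_cmd(script, after_ok),
        capture_output=True, text=True, check=True,
    )
    return parse_job_id(result.stdout)


if __name__ == "__main__":
    if shutil.which("sbatch") is None:
        print("sbatch not found; run this on a Slurm login node.")
    else:
        prep = submit("preprocess.sh")
        # If analyse.sh is a job array, depending on its job ID
        # waits for all of its tasks.
        analysis = submit("analyse.sh", after_ok=[prep])
        summary = submit("summarise.sh", after_ok=[analysis])
        print(f"submitted pipeline: {prep} -> {analysis} -> {summary}")
```

Keeping command building and output parsing in small functions lets you check the generated sbatch lines before anything is actually submitted.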

R and Python Slurm scripts


  • Besides the code itself, the only real difference between a bash Slurm script and a Python or R Slurm script is the hash-bang statement (the #SBATCH lines still work, since # starts a comment in both Python and R)!
  • You can point the hash-bang statement at the python or Rscript interpreter you wish to use, or use /usr/bin/env python (or /usr/bin/env Rscript) to pick the interpreter from your environment.
  • The same Slurm environment variables are available in a Python or R script, but you must access them the Python/R way, e.g. os.environ in Python or Sys.getenv() in R.
  • Using Python or R Slurm scripts means you can:
    1. program in a language more familiar to you
    2. make use of their broad functionality and packages
    3. remove the need to have a wrapper Slurm script around your Python/R scripts.
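
A minimal sketch of such a Python Slurm script follows. Note that sbatch stops reading #SBATCH directives at the first line that is not a comment or blank, so they must come before any code (including a docstring). The job name, resource values and the helper slurm_int are illustrative assumptions; SLURM_CPUS_PER_TASK is only set when --cpus-per-task is requested, hence the fallback.

```python
#!/usr/bin/env python
#SBATCH --job-name=python-sketch
#SBATCH --cpus-per-task=4
#SBATCH --time=00:05:00
"""Hypothetical Python Slurm script: no bash wrapper needed."""
import os


def slurm_int(name: str, default: int) -> int:
    """Read an integer Slurm environment variable, with a fallback outside Slurm."""
    value = os.environ.get(name)
    return int(value) if value else default


if __name__ == "__main__":
    job_id = os.environ.get("SLURM_JOB_ID", "unknown (not running under Slurm)")
    # SLURM_CPUS_PER_TASK is only set when --cpus-per-task is requested.
    ncpus = slurm_int("SLURM_CPUS_PER_TASK", 1)
    print(f"Job ID: {job_id}")
    print(f"CPUs allocated by Slurm: {ncpus}")
    # os.cpu_count() reports every CPU on the node, not just this job's share,
    # so size thread/process pools from the Slurm value instead.
    print(f"CPUs on this node: {os.cpu_count()}")
```

The R equivalent would read the same variables with Sys.getenv() under a #!/usr/bin/env Rscript hash-bang.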