Instructor Notes


Introduction


Instructor Note

The challenges can be done together, with input from the researchers.



Monitoring a Job's performance


Instructor Note

You might get questions as to why srun should be used. In many cases it’s not important, but srun helps Slurm collect CPU efficiency, memory usage, and IO data about the command it runs, which is important for this purpose!

The most beneficial aspect of using srun inside sbatch is that if the job fails or is cancelled, the CPU efficiency, memory usage, and IO data are still saved, so seff and sacct remain useful. If srun is not used, the performance data reported by seff and sacct are discarded when the job ends prematurely.
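
For example, a minimal job script that wraps its payload in srun might look like the sketch below (./my-program is a hypothetical placeholder for whatever command the learners are running):

BASH

#!/bin/bash
#SBATCH --job-name=profiled
#SBATCH --output=%x-%j.out

# Wrapping the command in srun creates a job step, so Slurm records
# CPU efficiency, memory usage, and IO data for it even if the job
# fails or is cancelled, and seff/sacct can report on it afterwards.
srun ./my-program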



Instructor Note

You may wish to explain how to distinguish processes of interest from system processes. Usually, the process of interest will be identifiable by the command that is being run.
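
If it helps, one way to show this (a minimal sketch, assuming learners have a shell on the compute node) is to list only their own processes along with the command each one is running:

BASH

# show your own processes with CPU/memory usage, elapsed time, and command
ps -u $USER -o pid,%cpu,%mem,etime,cmd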



Job Arrays


Instructor Note

Intro to Linux Command Line is specified as a prerequisite, so learners should know how to work with the command line. However, because these workshops run infrequently, this isn’t guaranteed, so you might want to remind people what environment variables are.
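
A quick refresher you could type live (a minimal sketch; the variable name is just an example, and SLURM_ARRAY_TASK_ID is only set inside an array task):

BASH

# set an environment variable and read it back
export GREETING="hello"
echo $GREETING

# inside an array task, Slurm sets SLURM_ARRAY_TASK_ID automatically
echo $SLURM_ARRAY_TASK_ID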



Instructor Note

You may wish to highlight that the resource request cannot change between job array tasks: everything controlled by an sbatch option is fixed across all tasks in the array!
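
For example, in a sketch like the one below (the input file names are hypothetical), the --cpus-per-task request applies identically to every task; only the value of SLURM_ARRAY_TASK_ID differs between them:

BASH

#!/bin/bash
#SBATCH --job-name=array-demo
#SBATCH --output=%x-%a.out
#SBATCH --array=1-3
# the same resource request applies to every task in the array
#SBATCH --cpus-per-task=2

# only the data each task works on changes, via the array index
srun ./process input-${SLURM_ARRAY_TASK_ID}.txt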



Organising dependent Slurm jobs


Instructor Note

As these jobs are quite small, it is hard to capture the full effect of using the aftercorr condition. This is because Slurm on Milton only processes the queue every 30s or so, while the pi-cpu program is expected to run for at most about 20s.

If you have time and wish to show the learners the full effect, the following scripts should work:

BASH

#!/bin/bash
# this job is submitted first. It sleeps for ID * 60s
# e.g., ID 2: sleeps for 120s.
# it then prints the date.

#SBATCH --job-name=1st
#SBATCH --output=%x-%a.out
#SBATCH --array=1-5

let "t = 60 * ${SLURM_ARRAY_TASK_ID}"
sleep $t
date

BASH

#!/bin/bash
# this job is submitted second and just prints the date.

#SBATCH --job-name=2nd
#SBATCH --output=%x-%a.out
#SBATCH --array=1-5

date
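
To run the demonstration, submit the first script, then submit the second with an aftercorr dependency on it (the script file names here are placeholders for wherever you save them):

BASH

# submit the first array job and note the job ID that sbatch prints
sbatch first.sh

# submit the second array job, replacing <jobid> with the first job's ID;
# aftercorr makes each array task wait for the matching task of the first job
sbatch --dependency=aftercorr:<jobid> second.sh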


R and Python Slurm scripts