Instructor Notes
Introduction
Instructor Note
The challenges can be done together with the researchers' input.
Monitoring a Job's performance
Instructor Note
You might get questions about why srun should be used. In many cases it is not important, but srun helps Slurm collect CPU efficiency, memory usage, and I/O data about the command it is used to run, which is important for this purpose!
The most beneficial aspect of using srun inside sbatch is that if the job fails or is cancelled, the CPU efficiency, memory usage, and I/O data are saved, which keeps seff and sacct useful. If srun is not used, the performance data reported by seff and sacct is discarded if the job ends prematurely.
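If learners ask what this looks like in practice, a minimal sketch such as the following may help. The file name srun-demo.sh, the resource values, and the sleep command are placeholders chosen for illustration, not part of the lesson material. The sacct fields shown are standard ones, but the exact output will depend on how accounting is configured on your cluster.

```bash
# Write a small batch script (srun-demo.sh is a hypothetical file name).
cat > srun-demo.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=srun-demo
#SBATCH --cpus-per-task=1
#SBATCH --mem=100M
#SBATCH --time=00:05:00

# Launching the command through srun makes it a job step,
# so Slurm keeps a separate accounting record for it.
srun sleep 30
EOF

# Submit only where Slurm is available.
if command -v sbatch >/dev/null 2>&1; then
    # --parsable prints the job ID, optionally followed by ";cluster".
    jobid=$(sbatch --parsable srun-demo.sh)
    jobid=${jobid%%;*}
    echo "Submitted job ${jobid}. Once it has ended (or been cancelled), try:"
    echo "  seff ${jobid}"
    echo "  sacct -j ${jobid} --format=JobID,JobName,State,Elapsed,TotalCPU,MaxRSS"
fi
```

Cancelling the job with scancel partway through and then running the two suggested commands is one way to demonstrate the point about prematurely ended jobs.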
Instructor Note
You may wish to explain how to distinguish processes of interest from system processes. Usually, the process of interest will be identifiable by the command that is being run.
Job Arrays
Instructor Note
Intro to Linux Command Line is specified as a prerequisite, so learners should know how to work with environment variables. However, because these workshops are infrequent, this isn't guaranteed, so you might want to remind people what environment variables are.
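A short refresher along these lines could be typed live. It is only a sketch: GREETING and the sample_N.txt naming are made up for illustration, and the default value for SLURM_ARRAY_TASK_ID is supplied only so the snippet also runs outside a Slurm job.

```bash
# An environment variable is a named value that child processes inherit.
export GREETING="hello"
bash -c 'echo "child process sees: ${GREETING}"'

# Inside an array task, Slurm sets SLURM_ARRAY_TASK_ID for the script.
# Outside a job it is unset, so fall back to 1 here (illustration only).
task_id="${SLURM_ARRAY_TASK_ID:-1}"
echo "this task would read: sample_${task_id}.txt"
```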
Instructor Note
You may wish to highlight that you cannot change the resource request between job array tasks. Everything controlled by an sbatch option is fixed across all the array tasks!
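To make this concrete, a sketch like the one below can be shown. The job name and resource values are assumptions picked for the example. Every task receives the same CPUs, memory, and time limit, and only SLURM_ARRAY_TASK_ID differs. The fallbacks to "unset" are there only so the script also runs outside Slurm.

```bash
#!/bin/bash
#SBATCH --job-name=array-demo
#SBATCH --array=1-3
#SBATCH --cpus-per-task=2
#SBATCH --mem=1G
#SBATCH --time=00:05:00

# All three tasks get 2 CPUs, 1G of memory and a 5-minute limit.
# The task ID is the only thing that changes from task to task.
task_id="${SLURM_ARRAY_TASK_ID:-unset}"
cpus="${SLURM_CPUS_PER_TASK:-unset}"
report="task=${task_id} cpus=${cpus}"
echo "${report}"
```

If some tasks genuinely need different resources, one option is to split them into separate array submissions, each with its own sbatch options.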
Organising dependent Slurm jobs
Instructor Note
As these jobs are quite small, it is quite hard to capture the full effect of using the aftercorr condition. This is because Slurm on Milton only processes the queue roughly every 30 s, while the pi-cpu program is expected to run for at most about 20 s.
If you have time and wish to show the learners the full effect, the following scripts should work:
BASH
#!/bin/bash
# this job is submitted first. It sleeps for ID * 60s
# e.g., ID 2: sleeps for 120s.
# it then prints the date.
#SBATCH --job-name=1st
#SBATCH --output=%x-%a.out
#SBATCH --array=1-5
t=$((60 * SLURM_ARRAY_TASK_ID))
sleep "$t"
date
BASH
#!/bin/bash
# this job is submitted second and just prints the date.
#SBATCH --job-name=2nd
#SBATCH --output=%x-%a.out
#SBATCH --array=1-5
date
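One way to submit the two scripts above so that each task of the second job waits for the matching task of the first is sketched below. The file names first.sh and second.sh and the helper submit_pair are hypothetical. The sketch assumes the scripts have been saved under those names in the current directory.

```bash
# Submit both array jobs; task N of the second job starts only after
# task N of the first job has completed successfully (aftercorr).
submit_pair() {
    local first_id second_id
    # --parsable prints the job ID, optionally followed by ";cluster".
    first_id=$(sbatch --parsable first.sh) || return 1
    first_id=${first_id%%;*}
    second_id=$(sbatch --parsable --dependency=aftercorr:"${first_id}" second.sh) || return 1
    echo "first=${first_id} second=${second_id%%;*}"
}

# Only submit where Slurm is available.
if command -v sbatch >/dev/null 2>&1; then
    submit_pair
    squeue -u "$USER"   # pending tasks of the second job should show a dependency reason
fi
```

If the dependency works as intended, the dates written to the 2nd-N.out files should be spaced roughly 60 s apart, matching the sleep times of the first job's tasks.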