What is an HPC?


  • Using High Performance Computing (HPC) typically involves connecting to very large computing systems that provide high computational power.
  • These systems can be used to do work that would either be impossible or much slower on smaller systems.
  • HPC resources are shared by multiple users.
  • The resources found on independent compute nodes can vary in volume and type (amount of RAM, processor architecture, availability of shared filesystems, etc.).
  • The standard method of interacting with HPC systems is via a command line interface.

Accessing Milton


  • HPC systems typically provide login nodes and a set of compute nodes.
  • Files saved on one node are available on all nodes.
  • Milton has multiple different file systems that have different policies and characteristics.
  • Throughout a research project, research data may move between file systems according to backup and retention requirements, and to improve performance.

Environment Modules


  • Load software with module load softwareName.
  • Unload software with module unload or module purge.
  • The module system handles software versioning and package conflicts for you automatically.
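The module commands above might be used together as in the following sketch. The name softwareName is the placeholder from the bullets, not a real module, so replace it with a name reported by module avail on your system. The guard only exists so the snippet does nothing harmful on a machine without Environment Modules.

```shell
# Placeholder module name (an assumption): replace with a real one from `module avail`.
software_name="softwareName"

if command -v module >/dev/null 2>&1; then
    module avail "$software_name"    # list matching modules and versions
    module load "$software_name"     # load the default version
    module list                      # show what is currently loaded
    module unload "$software_name"   # unload just this module
    module purge                     # unload every loaded module
else
    echo "module command not found: run this on a system with Environment Modules"
fi
```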

Lunch Break


Introducing Slurm


  • The scheduler handles how compute resources are shared between users.
  • A job is just a shell script.
  • Request slightly more resources than you expect to need, so that your job has some headroom.
  • Backfilling improves system utilisation and maximises job throughput. You can take advantage of backfilling by not requesting much more than you need.
  • Milton's Slurm has multiple partitions with different specifications to suit different types of jobs.
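One way to compare partitions is to ask sinfo for a compact summary. The sketch below assumes you are on a Slurm login node. The choice of columns is illustrative, and the partition names you see will depend on the system.

```shell
# Columns: %P partition, %l time limit, %D node count,
#          %c CPUs per node, %m memory per node (MB)
sinfo_format="%P %l %D %c %m"

if command -v sinfo >/dev/null 2>&1; then
    sinfo --format="$sinfo_format"
else
    echo "sinfo not found: run this on a Slurm login node"
fi
```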

Submitting a Job


  • sbatch is used to submit a job.
  • squeue is used to list jobs in the Slurm queue.
    • Passing the -u <username> option shows jobs for just that user.
  • sacct is used to show job details.
  • #SBATCH lines are used in submission scripts to set Slurm options.
  • Setting up job resources is a challenge, and you might not get it right the first time.
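The points above can be sketched as a small end-to-end example: write a submission script, submit it, then inspect it. The file name example_job.sh and every resource value are assumptions chosen for illustration, not recommendations for Milton.

```shell
# Write a minimal submission script. All values below are illustrative.
cat > example_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --cpus-per-task=1
#SBATCH --mem=1G
#SBATCH --time=00:10:00
#SBATCH --output=example_%j.out

echo "Running on $(hostname)"
EOF

if command -v sbatch >/dev/null 2>&1; then
    # --parsable prints the job ID, optionally followed by ";cluster"
    job_id=$(sbatch --parsable example_job.sh)
    job_id=${job_id%%;*}
    squeue -u "$USER"     # jobs in the queue for the current user
    sacct -j "$job_id"    # accounting details for this job
else
    echo "sbatch not found: submit example_job.sh from a Slurm login node"
fi
```

In the --output file name, %j is replaced by the job ID, so each run writes to its own file.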

Evaluating Jobs


  • Use seff to evaluate completed jobs.
  • Slurm environment variables are handy to use in your scripts.
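As a rough illustration of both points, the first part below shows lines that could sit inside a submission script, and the second part shows a seff call made after the job has finished. The fallback values after ":-" are an assumption added so the lines also run outside a job. SLURM_CPUS_PER_TASK is only set when --cpus-per-task was requested.

```shell
# Part 1: inside a submission script.
# Slurm sets SLURM_JOB_ID and, if requested, SLURM_CPUS_PER_TASK.
threads="${SLURM_CPUS_PER_TASK:-1}"
echo "Job ${SLURM_JOB_ID:-not-in-a-job} is using $threads thread(s)"

# Part 2: on a login node, after the job has completed.
# Fill in a real job ID taken from squeue or sacct; it is left empty here.
finished_job_id=""
if [ -n "$finished_job_id" ] && command -v seff >/dev/null 2>&1; then
    seff "$finished_job_id"    # reports CPU and memory efficiency
fi
```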

Break


Slurm Commands


  • Slurm commands are handy for viewing information about queued jobs, nodes and partitions.
  • You will commonly use sbatch, squeue, salloc, sinfo and sacct.

Interactive Slurm Jobs


  • Use salloc to start a new interactive Slurm job on Milton.
  • Use --x11 with salloc to run remote graphics in your interactive job.
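An interactive session might be requested as in the sketch below. The resource values are assumptions for illustration, and the snippet is meant to be run from a login node, where salloc waits until the allocation is granted.

```shell
# Illustrative resource request for an interactive session.
cpus=2
mem="4G"
walltime="01:00:00"

if command -v salloc >/dev/null 2>&1; then
    salloc --cpus-per-task="$cpus" --mem="$mem" --time="$walltime"
    # For remote graphics, add --x11 to the same command:
    #   salloc --x11 --cpus-per-task="$cpus" --mem="$mem" --time="$walltime"
else
    echo "salloc not found: run this on a Slurm login node"
fi
```

For --x11 to work, the connection to the login node generally needs X11 forwarding enabled as well, for example with ssh -X.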