How to use and get the best out of Slurm and Slurm scripts, and how to use the cluster efficiently.
LINKS TO BE UPDATED STILL

Slurm

Jobs on eRI are submitted in the form of a batch script containing the code you want to run and a header of information needed by our job scheduler, Slurm.
For full documentation on Slurm and its usage, see the Slurm website.

Creating a batch script

Create a new file and open it in a text editor, e.g. nano myjob.sl. The following should be considered the minimum required for a job to start.

Code Block
#!/bin/bash -e
#SBATCH --job-name=SerialJob # job name (shows up in the queue)
#SBATCH --account=2024-mjb-sandbox # project to record usage against
#SBATCH --time=00:01:00      # Walltime (days-HH:MM:SS)
#SBATCH --mem=512MB          # Memory in MB or GB

pwd # Prints working directory

Copy in the above text, then save and exit the text editor with 'ctrl + x'.

Note: #!/bin/bash is expected by Slurm.
Note: if you are a member of multiple accounts/projects, you should add the line #SBATCH --account=<projectcode>, using the relevant project code for the work, so that fairshare is applied correctly.
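As an alternative to typing the script into an editor, the same file can be written straight from the shell. The following is a minimal sketch, assuming you are in a directory where you are allowed to create myjob.sl; the final grep is only an illustrative sanity check, not something Slurm requires.

```shell
# Write the example batch script to myjob.sl using a here-document.
# The quoted 'EOF' stops the shell from expanding anything inside the script.
script="myjob.sl"
cat > "$script" <<'EOF'
#!/bin/bash -e
#SBATCH --job-name=SerialJob       # job name (shows up in the queue)
#SBATCH --account=2024-mjb-sandbox # project to record usage against
#SBATCH --time=00:01:00            # Walltime (days-HH:MM:SS)
#SBATCH --mem=512MB                # Memory in MB or GB

pwd # Prints working directory
EOF

# Illustrative check: count the #SBATCH header lines that were written.
directive_count=$(grep -c '^#SBATCH ' "$script")
echo "Wrote $script with $directive_count #SBATCH directives"
```

If the count is lower than expected, one of the header lines has probably been mistyped, for example with a space between # and SBATCH.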

Submitting

Jobs are submitted to the scheduler using:

...

You should receive output such as: Submitted batch job 1748836

sbatch can take command line arguments similar to those used in the shell script through #SBATCH pragmas.

You can find more details on its use in the Slurm documentation.
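To illustrate command line arguments and the confirmation message together, here is a hedged sketch. The override values (5 minutes, 1G) are arbitrary, and extract_job_id is a made-up helper name, not a Slurm command. When sbatch is not installed, the sketch falls back to the sample message quoted above so that it still runs.

```shell
# Extract the numeric job ID from sbatch's confirmation message,
# which looks like "Submitted batch job 1748836".
extract_job_id() {
    printf '%s\n' "$1" | awk '/^Submitted batch job/ { print $4 }'
}

if command -v sbatch >/dev/null 2>&1; then
    # Options given here take precedence over the #SBATCH lines in myjob.sl.
    submit_msg=$(sbatch --time=00:05:00 --mem=1G myjob.sl)
else
    # No scheduler on this machine: reuse the sample message from the text.
    submit_msg="Submitted batch job 1748836"
fi

job_id=$(extract_job_id "$submit_msg")
echo "Job ID: $job_id"
```

Keeping the job ID in a variable makes it easy to pass to later commands that take a job ID.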

Job Queue

The currently queued jobs can be checked using:

...

Code Block
squeue -u <userid>@agresearch.co.nz
squeue -u matt.bixley@agresearch.co.nz

You can also filter to just your jobs using:

Code Block
squeue --me

You can find more details on its use in the Slurm documentation.

You can check all jobs submitted by you today (since midnight) using:

Code Block
sacct

Or since a specified date using:

Code Block
sacct -S YYYY-MM-DD   # all your jobs since a specified date

Each job will show as multiple lines, one line for the parent job and then additional lines for each job step.
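To make the parent/step distinction concrete, here is a small sketch that filters pipe-separated sacct output down to the parent lines. It assumes job steps can be recognised by a dot in the JobID (for example 123456.batch); parent_jobs_only is a made-up helper name for illustration.

```shell
# Keep the header row and any row whose JobID (first field) has no dot.
# Job steps such as 123456.batch or 123456.0 are dropped.
parent_jobs_only() {
    awk -F'|' 'NR == 1 || $1 !~ /\./'
}

if command -v sacct >/dev/null 2>&1; then
    # -P prints pipe-separated fields, which are easy to filter with awk.
    sacct -P --format=JobID,JobName,State,Elapsed | parent_jobs_only
fi
```

In day-to-day use, sacct -X gives the same parent-only view without any extra filtering.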

Tips


...

sacct -X                # Only show parent jobs, not job steps.
sacct --state=<STATE>   # Filter jobs by state: PENDING, RUNNING, FAILED, CANCELLED or TIMEOUT.
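These options can be combined. The sketch below lists parent jobs from the last seven days that failed or timed out. It assumes GNU date (standard on Linux) for the date arithmetic, and the seven-day window and the two states are arbitrary choices. An explicit -S is included because the sacct manual page describes a different default start time when --state is used on its own; this is worth checking with man sacct.

```shell
# Start of the reporting window: seven days ago, formatted as YYYY-MM-DD.
# Assumes GNU date; the 7-day window is arbitrary.
start=$(date -d '7 days ago' +%F)

# Parent jobs only (-X), since $start (-S), limited to two states.
# Multiple states are given as a comma-separated list.
sacct_args="-X -S $start --state=FAILED,TIMEOUT"
echo "sacct $sacct_args"

if command -v sacct >/dev/null 2>&1; then
    # Word splitting of $sacct_args is intentional here.
    sacct $sacct_args
fi
```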

Interactive Jobs

You can create an interactive session on the compute nodes (specifying CPUs, memory and time) for testing code and resource usage, rather than using the login node, which can result in system slowdowns and blockages.
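One common way to request such a session is srun with the --pty option. The sketch below only builds and prints the command; it assumes the cluster permits interactive srun sessions, and the CPU, memory and time values are illustrative. The command recommended for eRI may differ.

```shell
# Illustrative resource request; adjust to what your test actually needs.
cpus=2
mem="4G"
walltime="01:00:00"
account="2024-mjb-sandbox"   # project code from the batch script example

# --pty attaches your terminal to a shell running on the compute node.
interactive_cmd="srun --account=$account --cpus-per-task=$cpus --mem=$mem --time=$walltime --pty bash"
echo "$interactive_cmd"

# On the cluster, run the printed command directly, for example:
#   srun --account=2024-mjb-sandbox --cpus-per-task=2 --mem=4G --time=01:00:00 --pty bash
```

Type exit in the interactive shell when you are done so the resources are released.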

...

Job Efficiency

How did my job run? Did I waste resources, and what resources were actually used? Requesting more than a job needs can block other users and/or lower your priority. To check a finished job, run seff <JOBID>.
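The arithmetic behind a CPU efficiency figure can be sketched as follows. This assumes efficiency is defined as CPU time used divided by elapsed time multiplied by the number of allocated CPUs; the function name cpu_efficiency and the numbers are hypothetical, not taken from a real job.

```shell
# CPU efficiency as a percentage:
#   100 * cpu_seconds_used / (elapsed_seconds * allocated_cpus)
cpu_efficiency() {
    awk -v used="$1" -v elapsed="$2" -v cpus="$3" \
        'BEGIN { printf "%.1f\n", 100 * used / (elapsed * cpus) }'
}

# Hypothetical job: 2 CPUs allocated for 1 hour, 30 minutes of CPU time used.
cpu_efficiency 1800 3600 2   # prints 25.0
```

A low figure like this suggests the job could have requested fewer CPUs, which would leave more of the cluster available to others.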

...

A complete list of Slurm commands can be found here, or by entering man slurm into a terminal

sbatch

    sbatch submit.sl       Submits the Slurm script submit.sl.

squeue

    squeue                 Displays the entire queue.
    squeue --me            Displays your queued jobs.
    squeue -p compute      Displays queued jobs on the compute partition.

sacct

    sacct                  Displays all the jobs run by you that day.
    sacct -S 2024-01-01    Displays all the jobs run by you since 1 January 2024.
    sacct -j 123456        Displays job 123456.

scancel

    scancel 123456         Cancels job 123456.
    scancel --me           Cancels all your jobs.

sshare

    sshare -U              Shows the Fair Share scores for all projects of which you are a member.

sinfo

    sinfo                  Shows the current state of the Slurm partitions.

 

...

 

...

sbatch options

A complete list of sbatch options can be found here, or by running “man sbatch”

Options can be provided on the command line or in the batch file as an #SBATCH directive. The option name and value can be separated using an '=' sign, e.g. #SBATCH --account=2024-mjb-sandbox, or a space, e.g. #SBATCH --account 2024-mjb-sandbox, but not both!
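The "not both" rule is easy to trip over when editing a header by hand. As an illustration, the sketch below flags #SBATCH lines that mix an '=' sign with whitespace around it. has_mixed_separator is a made-up helper, not part of Slurm, and the pattern only covers long options of the form --name.

```shell
# Return success (0) if an #SBATCH line mixes '=' with whitespace,
# e.g. "--account= value" or "--account =value".
has_mixed_separator() {
    printf '%s\n' "$1" |
        grep -Eq '^#SBATCH[[:space:]]+--[A-Za-z-]+([[:space:]]+=|=[[:space:]])'
}

for line in \
    '#SBATCH --account=2024-mjb-sandbox' \
    '#SBATCH --account 2024-mjb-sandbox' \
    '#SBATCH --account= 2024-mjb-sandbox'
do
    if has_mixed_separator "$line"; then
        echo "mixed: $line"
    else
        echo "ok:    $line"
    fi
done
```

The first two lines are reported as ok and the third as mixed.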

General options

...
