
Release Update - 07 March 2024

New and Improved

  • A valid Slurm account is now required for all jobs because fairshare has been enabled (see Slurm: Reference Guide)

  • A default memory request of 3500 MB per CPU now applies to jobs that do not specify their own memory-per-CPU

Fixes

N/A

Explanation

Last week we made some configuration changes to the HPC cluster workload manager (Slurm), aiming to improve the eRI HPC experience and bring it closer to how the other Slurm clusters that NeSI operates (e.g. Mahuika) work.

In particular, fairshare has now been enabled. This means that Slurm balances the priority of queued jobs from different projects according to the recent usage of those projects. For example, a project that has used a lot of cluster resources recently will have a lower fairshare score, so its jobs will sit lower in the queue than those of a project that has not used the cluster at all recently. This helps ensure that no single project can monopolise cluster resources by putting lots of work into the queue.

More information on how fairshare works on NeSI, and how to interpret it, is available at https://support.nesi.org.nz/hc/en-gb/articles/360000743536-Fair-Share (see in particular the section "How does Fair Share work?"). The AgResearch eRI cluster has no allocation sizes, so fairshare there reflects each project's use relative to others on the cluster.
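The direction of the fairshare effect can be illustrated with a toy ranking. This is only a sketch: the project names and usage figures are hypothetical, and Slurm's real fairshare calculation is more involved than a simple sort on recent usage.

```shell
#!/bin/bash
# Hypothetical recent usage (CPU-hours) per project; not real data.
usage="proj_a 1200
proj_b 0
proj_c 300"

# Toy ranking: least recent usage first. This only shows the direction of
# the effect and is NOT the formula Slurm uses.
ranking=$(printf '%s\n' "$usage" | sort -k2,2n | awk '{print $1}' | paste -sd' ' -)
echo "$ranking"
```

In this toy model the project with no recent usage is listed first and the heaviest recent user last, mirroring how jobs from a heavily used project fall lower in the queue.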

Enabling fairshare also means that every job must now be associated with a Slurm account, e.g.
#SBATCH --account=2024-mjb-sandbox
For more information on Slurm commands for eRI, see Slurm: Reference Guide.
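As a minimal sketch of where the account directive sits in a batch script, the example below reuses the account name from above. The job name is hypothetical, and the account should be replaced with your own project's Slurm account.

```shell
#!/bin/bash
#SBATCH --account=2024-mjb-sandbox   # replace with your own project's Slurm account
#SBATCH --job-name=account-demo      # hypothetical job name

# Slurm sets these variables inside a running job; the fallbacks only
# apply when the script is run outside Slurm.
account="${SLURM_JOB_ACCOUNT:-unset}"
job_id="${SLURM_JOB_ID:-none}"
echo "Job ${job_id} is charged to account ${account}"
```

The account can also be given on the command line instead of in the script, e.g. `sbatch --account=2024-mjb-sandbox job.sh`, where `job.sh` is a placeholder file name.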

We have implemented a default per-job memory request of 3.5 GB per CPU, and also plan to implement default time limits. These defaults ensure that jobs which do not specify their required memory and time will not accidentally block all available memory on a compute node or run forever. We also plan to implement a maximum time limit, but would like to gather input from users before making this change.
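To avoid relying on the defaults, a job can state its memory and time explicitly. The sketch below assumes illustrative resource values (4 CPUs, 2 GB per CPU, 2 hours); `default_mem_mb` is a hypothetical helper that only works out what the 3500 MB/CPU default would amount to for a given CPU count.

```shell
#!/bin/bash
#SBATCH --account=2024-mjb-sandbox   # replace with your own Slurm account
#SBATCH --cpus-per-task=4            # illustrative value
#SBATCH --mem-per-cpu=2G             # explicit request, used instead of the default
#SBATCH --time=02:00:00              # explicit wall-time limit (HH:MM:SS)

# Memory (in MB) a job would receive if it relied on the default instead.
default_mem_mb() {
  local cpus="$1"
  local mem_per_cpu_mb=3500   # default stated in this release note
  echo $(( cpus * mem_per_cpu_mb ))
}

echo "Default for 4 CPUs: $(default_mem_mb 4) MB"
```

Requesting only what a job needs leaves more memory free for other jobs on the same node, which is the purpose of the defaults described above.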
