14. HPC resource manager: Slurm#
Slurm (https://slurm.schedmd.com/documentation.html) is a resource manager and job scheduler that organizes and shares resources in an HPC environment to achieve optimal usage. It allows you to specify usage policies, set limits on memory or CPU usage, gather metrics about cluster usage, and so on. You can watch a good introduction at https://www.youtube.com/watch?v=NH_Fb7X6Db0 and related videos.

In particular, Slurm follows the following architecture:

Slurm common workflows (see the sketch after this list):

- Single-core jobs
- Multi-core (OpenMP): needs a cpus-per-task specification
- Multi-node (MPI): needs a nodes specification
- Array jobs for parameter sweeps
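As a rough sketch (the program names are placeholders), these workflows typically translate into resource requests like the following:

```bash
# Single-core job: one task, one CPU
srun -n 1 ./my_program

# Multi-core (OpenMP): one task with several CPUs on the same node
srun -n 1 --cpus-per-task=4 ./my_openmp_program

# Multi-node (MPI): several tasks spread over several nodes
srun -N 2 --ntasks-per-node=12 ./my_mpi_program

# Array job for a parameter sweep (inside an sbatch script):
#   #SBATCH --array=1-10
#   ./my_program $SLURM_ARRAY_TASK_ID
```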
Here we will use a very simple installation in the computer room. Our goal is to learn some of the basic commands to list the available resources, specify a partition, set limits, request resources, and so on.

In general, you should run all cluster jobs through Slurm rather than directly on each node.

For the computer room: you need to connect to the public server IP.
First of all, log into a client and use the command `sinfo` to get information about partitions:
sinfo --all
| PARTITION | AVAIL | TIMELIMIT | NODES | STATE | NODELIST |
|---|---|---|---|---|---|
| 4threads | up | infinite | 5 | idle | sala[16-20] |
| 6threads | up | infinite | 3 | idle | sala[13-15] |
| 8threads | up | infinite | 3 | idle | sala[11-12,21] |
| 12threads* | up | infinite | 8 | idle | sala[7-10,26-29] |
| 16threads | up | infinite | 9 | idle | sala[2-6,22-25] |
| GPU | up | infinite | 1 | idle | sala2 |
As you can see in this example, there are several partitions available to be used. The 12threads partition is the default (marked with *). Some nodes might not be working and will be shown as down. There is no time limit, except for the login node (which actually should not be used for any job). Use the manual and get some other info about the resources.
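For example, `sinfo` accepts output-format options if you want to dig deeper; the format string below is just one possible choice:

```bash
# Long, per-node listing
sinfo -N -l

# Custom columns: partition, CPUs per node, memory, node list
sinfo -o "%P %c %m %N"
```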
A more powerful cluster can be seen here:
To see the state of your running processes, use `squeue`.
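Some typical invocations (the job id is a placeholder):

```bash
squeue              # all jobs currently in the queue
squeue -u $USER     # only your own jobs
squeue -j <jobid>   # a specific job
```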
Now let’s run some simple commands in the cluster. To do so, we will use the simple command `srun` (check the manual):
srun hostname
As you can see here, the command actually ran in the 12threads partition, since we did not specify a partition and 12threads is the default.
Run 18 instances of the same command in a non-default partition, using the parameters -p (to select the partition) and -n (to set the number of tasks). You should get something like the following (this example output is from the 12threads partition):
| SERVER |
|---|
| sala7.salafis.net |
| sala7.salafis.net |
| sala7.salafis.net |
| sala7.salafis.net |
| sala7.salafis.net |
| sala7.salafis.net |
| sala7.salafis.net |
| sala7.salafis.net |
| sala7.salafis.net |
| sala7.salafis.net |
| sala8.salafis.net |
| sala8.salafis.net |
| sala8.salafis.net |
| sala8.salafis.net |
| sala8.salafis.net |
| sala8.salafis.net |
| sala8.salafis.net |
| sala8.salafis.net |
As you can see, the jobs were magically distributed among the two nodes necessary to run 18 processes (each node allows for 12 processes, so two nodes correspond to 24 processes in total). If you want to see this better, use the stress command with a timeout of 10 seconds and check that, as soon as you launch the process, two nodes will be using their CPUs at full capacity (have some htop command running on both nodes):
srun -p 12threads -n 18 stress -t 10 -c 1
You can also cancel a Slurm job using `scancel`. For a quick Slurm command overview, check https://www.carc.usc.edu/user-guides/hpc-systems/using-our-hpc-systems/slurm-cheatsheet and https://docs.unity.uri.edu/documentation/jobs/slurm/
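A couple of typical `scancel` invocations (the job id is a placeholder):

```bash
scancel <jobid>     # cancel a specific job
scancel -u $USER    # cancel all of your jobs
```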
14.1. Creating slurm scripts for batch processing#
In general, it is advisable to put your slurm commands in a bash script so you can improve them over time, keep a historical record, and so on. You can employ a special syntax in your script (`#SBATCH` directives) to give all the info to slurm, then use the command `sbatch` to launch your script and `squeue` to check its state.
For our example, we will generate and adapt a script until we finally get something like:
#!/bin/bash -l
#SBATCH --job-name="testmulti"
# #SBATCH --account="HerrComp" # not used
#SBATCH --mail-type=ALL
#SBATCH --mail-user=wfoquendop@unal.edu.co
#SBATCH --time=01:00:00
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=12
#SBATCH --cpus-per-task=1
#SBATCH --partition=12threads
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun hostname
and then run it as
sbatch run.sh
You can get info about the jobs (whether they are running, pending, cancelled, etc.) using the command `squeue`.

By default you will get the standard output and error written to some *.out and *.err files, respectively.
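If you prefer explicit names, `sbatch` also accepts `--output` and `--error` directives, where `%x` expands to the job name and `%j` to the job id:

```bash
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err
```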
For example, after submitting with `sbatch run.sh`, the working directory would contain the script `run.sh` and an output file such as `slurm-70.out`.
Using a slurm script allows for a very general way to both run commands and specify, for instance, what modules to load, for example with `ml somehpclib` or `spack load something`.
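For instance, the environment setup can go right before the srun line; the module and package names below are placeholders, so adapt them to whatever your cluster provides:

```bash
#!/bin/bash -l
#SBATCH --job-name="with_env"
#SBATCH --partition=12threads
#SBATCH --ntasks=1

# Load whatever the program needs (hypothetical names)
ml somehpclib
spack load something

srun ./my_program
```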
There are many env vars that you can access with slurm: check https://slurm.schedmd.com/sbatch.html and https://docs.hpc.shef.ac.uk/en/latest/referenceinfo/scheduler/SLURM/SLURM-environment-variables.html#gsc.tab=0
You can use a script generator to make this task easier:
Adapt and run the following script to show some env vars automatically set by slurm
...
echo "Job ID: $SLURM_JOB_ID"
echo "Node: $(hostname)"
echo "CPUs: $SLURM_CPUS_ON_NODE"
echo "Dir: $SLURM_SUBMIT_DIR"
14.2. Exercises#
- Create a script to run the stress command in some partition, including several nodes. Log into those nodes and check the actual usage. Share with other students.
- Create a slurm script to run the openmp vector average, also including the scaling study (see the skeleton sketch after this list). Check whether sharing the nodes affects the measured times or not.
- Create a slurm script to run the eigen matrix matmul example with openmp. Also use another one for the scaling study.
- Create a slurm script to run some program that needs spack.
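As a starting point for the scaling-study exercises, a skeleton along these lines could work (the executable name and thread counts are placeholders):

```bash
#!/bin/bash -l
#SBATCH --job-name="omp_scaling"
#SBATCH --partition=12threads
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=12

# Run the same OpenMP program with an increasing number of threads
for nth in 1 2 4 8 12; do
    export OMP_NUM_THREADS=$nth
    echo "Threads: $nth"
    ./vector_average.x   # placeholder for your OpenMP executable
done
```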
14.3. Job array#
A job array allows slurm to run many commands parametrically.
- Write a script that prints the value of a parameter using $SLURM_ARRAY_TASK_ID (see the sketch below).
- Submit it with --array=1-5.
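A minimal sketch of such a script (the job name and range are illustrative):

```bash
#!/bin/bash -l
#SBATCH --job-name="array_test"
#SBATCH --partition=12threads
#SBATCH --ntasks=1
#SBATCH --array=1-5

# Each array element receives its own value of SLURM_ARRAY_TASK_ID
echo "Array task id: $SLURM_ARRAY_TASK_ID"
```

Each of the five array tasks runs independently and writes its own output file.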
Run a job array to compute the time metrics for the eigen C++ example solving a linear algebra problem. Notice that now you don't need to worry about using only the available number of CPUs.