Terminology

Here’s a quick reference list of terms to get you up to speed. If you’re unsure what a term means, please ask us!

Slurm Commands (in order of importance)

This list is adapted from the Slurm quick start documentation and regrouped by task.

Asking for Resources

srun is used to submit a job for execution or initiate job steps in real time. srun has a wide variety of options to specify resource requirements, including: minimum and maximum node count, processor count, specific nodes to use or not use, and specific node characteristics (so much memory, disk space, certain required features, etc.). A job can contain multiple job steps executing sequentially or in parallel on independent or shared resources within the job’s node allocation.
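As a rough illustration of the resource options described above, here is a minimal sketch of an srun call. The resource values are assumptions chosen for the example, not exacloud recommendations. The command is printed first and only run if srun is on your PATH.

```shell
# Sketch: request 1 node, 1 task, 2 CPUs, 4 GB of memory, and a 10-minute limit.
# All resource values are illustrative assumptions; adjust them for your job.
srun_cmd="srun --nodes=1 --ntasks=1 --cpus-per-task=2 --mem=4G --time=00:10:00 hostname"
echo "$srun_cmd"

# Run it only where Slurm is installed (for example, on a login node).
if command -v srun >/dev/null 2>&1; then
    $srun_cmd
fi
```

Here hostname stands in for your real program; it simply prints the name of the node the task landed on.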

sbatch is used to submit a job script for later execution. The script will typically contain one or more srun commands to launch parallel tasks.
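A job script might look something like the sketch below. The file name example_job.sh, the job name, and the resource values are all hypothetical. The script is written to disk and submitted only if sbatch is available.

```shell
# Sketch: write a small job script, then submit it if sbatch is available.
# The file name, job name, and resource values are hypothetical.
cat > example_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --nodes=1
#SBATCH --ntasks=2
#SBATCH --time=00:05:00
#SBATCH --output=example_%j.out

# Each srun line is one job step inside the allocation.
srun hostname
EOF

if command -v sbatch >/dev/null 2>&1; then
    sbatch example_job.sh
fi
```

The %j in the output file name is replaced by the job ID, so each submission writes to its own file.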

salloc is used to allocate resources for a job in real time. Typically this is used to allocate resources and spawn a shell. The shell is then used to execute srun commands to launch parallel tasks.
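An interactive session along these lines could be started as sketched below. The node count, task count, and time limit are assumptions for the example.

```shell
# Sketch: ask for 1 node and 2 tasks for 30 minutes (illustrative values).
salloc_cmd="salloc --nodes=1 --ntasks=2 --time=00:30:00"
echo "$salloc_cmd"

# salloc spawns a shell once the allocation is granted. Inside that shell,
# commands such as "srun hostname" run on the allocated resources, and
# typing "exit" releases the allocation.
if command -v salloc >/dev/null 2>&1; then
    $salloc_cmd
fi
```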

How am I doing? / How busy is exacloud?

sinfo reports the state of partitions and nodes managed by Slurm. It has a wide variety of filtering, sorting, and formatting options.
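Two common ways to look at the cluster are sketched here: a one-line-per-partition summary and a detailed per-node listing. Both commands are printed and then run only if sinfo is installed.

```shell
# Sketch: two views of cluster state.
sinfo_summary="sinfo --summarize"    # one line per partition
sinfo_nodes="sinfo --Node --long"    # one line per node, with extra detail
echo "$sinfo_summary"
echo "$sinfo_nodes"

if command -v sinfo >/dev/null 2>&1; then
    $sinfo_summary
    $sinfo_nodes
fi
```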

squeue reports the state of jobs or job steps. It has a wide variety of filtering, sorting, and formatting options. By default, it reports the running jobs in priority order and then the pending jobs in priority order.
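To check on your own jobs rather than the whole queue, a filter by user is usually the first step. The sketch below assumes your cluster username matches your login name.

```shell
# Sketch: list only your jobs, then only your pending jobs with estimated
# start times. Assumes the cluster username matches the local login name.
me="${USER:-$(id -un)}"
squeue_mine="squeue --user=$me"
squeue_pending="squeue --user=$me --states=PENDING --start"
echo "$squeue_mine"
echo "$squeue_pending"

if command -v squeue >/dev/null 2>&1; then
    $squeue_mine
    $squeue_pending
fi
```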

Oops!

scancel is used to cancel a pending or running job or job step. It can also be used to send an arbitrary signal to all processes associated with a running job or job step.
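The sketch below shows three typical forms. The job ID 12345 is a placeholder, so the commands are only printed, not executed; substitute a real job ID from squeue before running them yourself.

```shell
# Sketch: common scancel invocations. 12345 is a placeholder job ID, so these
# commands are printed rather than run.
job_id=12345
me="${USER:-$(id -un)}"

cancel_one="scancel $job_id"                          # cancel a single job
signal_one="scancel --signal=USR1 $job_id"            # send SIGUSR1 instead of cancelling
cancel_pending="scancel --user=$me --state=PENDING"   # cancel all of your pending jobs

echo "$cancel_one"
echo "$signal_one"
echo "$cancel_pending"
```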

How do I know how many resources to request?

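sacct is used to report job or job step accounting information about active or completed jobs. Comparing what a finished job actually used with what it requested is a good guide for sizing the next request.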
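One way to compare requested and used resources is sketched below. The job ID 12345 is a placeholder, and the chosen columns are just one reasonable selection of sacct format fields.

```shell
# Sketch: show requested vs. used memory and CPU time for one job.
# 12345 is a placeholder; replace it with a job ID of your own.
job_id=12345
sacct_cmd="sacct --jobs=$job_id --format=JobID,JobName,State,Elapsed,ReqMem,MaxRSS,AllocCPUS,TotalCPU"
echo "$sacct_cmd"

if command -v sacct >/dev/null 2>&1; then
    $sacct_cmd
fi
```

If MaxRSS is far below ReqMem, or TotalCPU is small relative to Elapsed multiplied by AllocCPUS, the next request can probably be smaller.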