Practical Examples

sinfo - general system state information

First we determine what partitions exist on the system, what nodes they include, and the general system state. This information is provided by the sinfo command.

The "*" in the partiton name indicates that this is the default partition for submitted jobs. We see that all partitions are in different states - idle (up), alloc (allocated by user) or down. The information about each partition may be split over more than one line so that nodes in different states can be identified.

A "*" appended to a node state (for example drain*) indicates that the node is not responding.

Example - different states of nodes, output of sinfo command
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
testing      up      30:00      2   idle login[01-02]
gpu          up 2-00:00:00      4    mix n[141-143,148]
gpu          up 2-00:00:00      1  alloc n144
gpu          up 2-00:00:00      3   idle n[145-147]
short*       up 1-00:00:00     22 drain* n[014-021,026-031,044-051]
short*       up 1-00:00:00     10    mix n[001-002,025,052,058,067,073,079,081,105]
short*       up 1-00:00:00     86  alloc n[003-008,012-013,022-024,032-033,036-043,053-057,059-066,068-072,074,077-078,080,082-094,097-099,102-104,106-116,119-127,131,135-136,140]
short*       up 1-00:00:00     22   idle n[009-011,034-035,075-076,095-096,100-101,117-118,128-130,132-134,137-139]
medium       up 2-00:00:00     22 drain* n[014-021,026-031,044-051]
medium       up 2-00:00:00     10    mix n[001-002,025,052,058,067,073,079,081,105]
medium       up 2-00:00:00     86  alloc n[003-008,012-013,022-024,032-033,036-043,053-057,059-066,068-072,074,077-078,080,082-094,097-099,102-104,106-116,119-127,131,135-136,140]
medium       up 2-00:00:00     22   idle n[009-011,034-035,075-076,095-096,100-101,117-118,128-130,132-134,137-139]
long         up 4-00:00:00     22 drain* n[014-021,026-031,044-051]
long         up 4-00:00:00     10    mix n[001-002,025,052,058,067,073,079,081,105]
long         up 4-00:00:00     86  alloc n[003-008,012-013,022-024,032-033,036-043,053-057,059-066,068-072,074,077-078,080,082-094,097-099,102-104,106-116,119-127,131,135-136,140]
long         up 4-00:00:00     22   idle n[009-011,034-035,075-076,095-096,100-101,117-118,128-130,132-134,137-139]

The sinfo command has many options to easily let you view the information of interest to you in whatever format you prefer.

See the man page or type sinfo --help for more information.
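
For example, two commonly useful invocations are shown below (the gpu partition name is taken from the example above; adjust to your site):

# List one line per node, with extended details
sinfo -N -l
# Restrict output to one partition and select the columns yourself
sinfo -p gpu --format="%P %t %D %N"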

squeue - information about submitted jobs

Next we determine what jobs exist on the system using the squeue command.

Example - output of squeue command
JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
16048     short xgboost_    user1 PD       0:00      1 (Nodes required for job are DOWN, DRAINED or reserved for jobs in higher priority partitions)
15739     short test3232    user2 PD       0:00      2 (Priority)
15365     short   DHAI-b    user1 PD       0:00      1 (Priority)
15349       gpu     gpu8     test  R       0:00      1 n141

Explanation of all fields

Info

The JOBID field shows the job ID assigned by SLURM. You can reference this number in your job scripts via the $SLURM_JOB_ID environment variable.
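
For instance, a minimal batch script might use the job ID to create a job-specific working directory. This is only a sketch - the /scratch/$USER path is an assumed, site-specific location:

#!/bin/bash
#SBATCH --time=1
# Per-job working directory; /scratch/$USER is an illustrative, site-specific path
WORKDIR=/scratch/$USER/$SLURM_JOB_ID
mkdir -p "$WORKDIR"
echo "Job $SLURM_JOB_ID running in $WORKDIR"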

Info

The PARTITION field shows the partition in which the job is running or queued.

Info

The NAME field shows the job name specified by the user.

Info

The USER field shows the username of the person who submitted the job.

Info

The ST field shows the job state. The most common states are:

  • R - running
  • PD - pending

Info

The TIME field shows how long the job has been running, in the format days-hours:minutes:seconds.

Info

The NODES field shows the number of nodes allocated to the job (or requested, for a pending job).

Info

The NODELIST(REASON) field indicates where the job is running or the reason it is still pending. Typical reasons for pending jobs are:

  • Resources (waiting for resources to become available)
  • Priority (queued behind a higher priority job)

The squeue command has many options to easily let you view the information of interest to you in whatever format you prefer.

See the man page for more information or type squeue --help.
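
For example, the listing can be restricted to your own jobs or to a single partition, or extended with the scheduler's estimated start times for pending jobs:

# Show only your own jobs
squeue -u $USER
# Show only jobs in the gpu partition
squeue -p gpu
# Report expected start times of pending jobs
squeue --start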

scontrol - more detailed information

The scontrol command can be used to report more detailed information about nodes, partitions, jobs, job steps, and configuration. It can also be used by system administrators to make configuration changes. A couple of examples are shown below.

See the man page for more information or type scontrol --help.

Example - scontrol show partition
PartitionName=testing
   AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
   AllocNodes=ALL Default=NO QoS=N/A
   DefaultTime=00:30:00 DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
   MaxNodes=1 MaxTime=00:30:00 MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED
   Nodes=login[01-02]
   PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
   OverTimeLimit=NONE PreemptMode=OFF
   State=UP TotalCPUs=64 TotalNodes=2 SelectTypeParameters=NONE
   JobDefaults=(null)
   DefMemPerCPU=4000 MaxMemPerNode=UNLIMITED
   TRES=cpu=64,mem=250G,node=2,gres/gpu=2
   TRESBillingWeights=CPU=0.0

PartitionName=gpu
   AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
   AllocNodes=ALL Default=NO QoS=N/A
   DefaultTime=2-00:00:00 DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
   MaxNodes=1 MaxTime=2-00:00:00 MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED
   Nodes=n[141-148]
   PriorityJobFactor=0 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
   OverTimeLimit=NONE PreemptMode=OFF
   State=UP TotalCPUs=512 TotalNodes=8 SelectTypeParameters=NONE
   JobDefaults=(null)
   DefMemPerCPU=4000 MaxMemPerNode=UNLIMITED
   TRES=cpu=512,mem=2000G,node=8,billing=512,gres/gpu=32
   TRESBillingWeights=CPU=1.0,Mem=0.256G,GRES/gpu=16.0

PartitionName=short
   AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
   AllocNodes=ALL Default=NO QoS=N/A
   DefaultTime=1-00:00:00 DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
   MaxNodes=8 MaxTime=1-00:00:00 MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED
   Nodes=n[001-140]
   PriorityJobFactor=2 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
   OverTimeLimit=NONE PreemptMode=OFF
   State=UP TotalCPUs=8960 TotalNodes=140 SelectTypeParameters=NONE
   JobDefaults=(null)
   DefMemPerCPU=4000 MaxMemPerNode=UNLIMITED
   TRES=cpu=8960,mem=35000G,node=140,billing=8960
   TRESBillingWeights=CPU=1.0,Mem=0.256G

PartitionName=medium
   AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
   AllocNodes=ALL Default=NO QoS=N/A
   DefaultTime=2-00:00:00 DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
   MaxNodes=4 MaxTime=2-00:00:00 MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED
   Nodes=n[001-140]
   PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
   OverTimeLimit=NONE PreemptMode=OFF
   State=UP TotalCPUs=8960 TotalNodes=140 SelectTypeParameters=NONE
   JobDefaults=(null)
   DefMemPerCPU=4000 MaxMemPerNode=UNLIMITED
   TRES=cpu=8960,mem=35000G,node=140,billing=8960
   TRESBillingWeights=CPU=1.0,Mem=0.256G

PartitionName=long
   AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
   AllocNodes=ALL Default=NO QoS=N/A
   DefaultTime=4-00:00:00 DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
   MaxNodes=1 MaxTime=4-00:00:00 MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED
   Nodes=n[001-140]
   PriorityJobFactor=0 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
   OverTimeLimit=NONE PreemptMode=OFF
   State=UP TotalCPUs=8960 TotalNodes=140 SelectTypeParameters=NONE
   JobDefaults=(null)
   DefMemPerCPU=4000 MaxMemPerNode=UNLIMITED
   TRES=cpu=8960,mem=35000G,node=140,billing=8960
   TRESBillingWeights=CPU=1.0,Mem=0.256G
Example - scontrol show node n148
NodeName=n148 Arch=x86_64 CoresPerSocket=32 
   CPUAlloc=1 CPUEfctv=64 CPUTot=64 CPULoad=1.04
   AvailableFeatures=(null)
   ActiveFeatures=(null)
   Gres=gpu:A100-SXM4-40GB:4
   NodeAddr=n148 NodeHostName=n148 Version=22.05.7
   OS=Linux 3.10.0-1160.71.1.el7.x86_64 #1 SMP Tue Jun 28 15:37:28 UTC 2022 
   RealMemory=256000 AllocMem=64000 FreeMem=67242 Sockets=2 Boards=1
   State=MIXED ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
   Partitions=gpu 
   BootTime=2023-09-06T10:29:48 SlurmdStartTime=2023-09-18T14:25:33
   LastBusyTime=2023-09-18T14:02:52
   CfgTRES=cpu=64,mem=250G,billing=64,gres/gpu=4
   AllocTRES=cpu=1,mem=62.50G,gres/gpu=1
   CapWatts=n/a
   CurrentWatts=0 AveWatts=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
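
In the same way, scontrol can report the details of a single job; for example, using the ID of the running job from the squeue example above:

scontrol show job 15349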

scancel - canceling jobs

scancel is used to signal or cancel jobs, job arrays or job steps.

An arbitrary number of jobs or job steps may be signaled using job specification filters or a space separated list of specific job and/or job step IDs. If the job ID of a job array is specified with an array ID value then only that job array element will be cancelled. If the job ID of a job array is specified without an array ID value then all job array elements will be cancelled. While a heterogeneous job is in a PENDING state, only the entire job can be cancelled rather than its individual components.

See the man page for more information or type scancel --help.
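
A few illustrative invocations are shown below (the plain job ID is taken from the squeue example above; the job array is hypothetical):

# Cancel a single job
scancel 15739
# Cancel only element 3 of a (hypothetical) job array 16048
scancel 16048_3
# Cancel all of your own pending jobs
scancel --user=$USER --state=PENDING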

srun - run parallel jobs

It is possible to create a resource allocation and launch the tasks for a job step in a single command line using the srun command. Depending upon the MPI implementation used, MPI jobs may also be launched in this manner. In this example we execute /bin/hostname on four nodes (-N 4); the default partition is used, with one task per node by default. Task numbers can be prepended to each output line with the -l option, as shown in the sbatch example below. Note that the srun command has many options available to control what resources are allocated and how tasks are distributed across those resources.

Example - output of srun command
srun -N 4 /bin/hostname
n058
n057
n059
n060

One common mode of operation is to submit a script for later execution. In this example the script name is script.sh and we explicitly request the nodes n066 and n067 (-w "n0[66-67]", note the use of a node range expression). We also explicitly state that the subsequent job steps will spawn four tasks each, which ensures that our allocation contains at least four processors (one processor per task to be launched). The output will appear in the file my.stdout ("-o my.stdout"). The script itself contains an embedded time limit for the job. Other options can be supplied as desired by using a "#SBATCH" prefix followed by the option at the beginning of the script (before any commands to be executed in it). Options supplied on the command line override any options specified within the script. Note that script.sh contains the command /bin/hostname, which is executed on the first node in the allocation (where the script runs), plus two job steps initiated with the srun command and executed sequentially.

Running srun within sbatch
user@login02 test > cat script.sh
#!/bin/bash
#SBATCH --time=1
#SBATCH --tasks-per-node=2
/bin/hostname
srun -l /bin/hostname
srun -l /bin/pwd

user@login02 test > sbatch -n 4 -w "n0[66-67]" -o my.stdout script.sh
Submitted batch job 38793

user@login02 test > cat my.stdout
n066
3: n067
2: n067
0: n066
1: n066
3: /home/user/test
1: /home/user/test
0: /home/user/test
2: /home/user/test
