Practical Examples
sinfo - general system state information¶
First we determine what partitions exist on the system, what nodes they include, and the general system state. This information is provided by the sinfo command.
The "*" in the partiton name indicates that this is the default partition for submitted jobs. We see that all partitions are in different states - idle (up), alloc (allocated by user) or down. The information about each partition may be split over more than one line so that nodes in different states can be identified.
The nodes in the down state marked by down* indicate the nodes are not responding.
Example - different states of nodes, output of sinfo command
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
testing up 30:00 2 idle login[01-02]
gpu up 2-00:00:00 4 mix n[141-143,148]
gpu up 2-00:00:00 1 alloc n144
gpu up 2-00:00:00 3 idle n[145-147]
short* up 1-00:00:00 22 drain* n[014-021,026-031,044-051]
short* up 1-00:00:00 10 mix n[001-002,025,052,058,067,073,079,081,105]
short* up 1-00:00:00 86 alloc n[003-008,012-013,022-024,032-033,036-043,053-057,059-066,068-072,074,077-078,080,082-094,097-099,102-104,106-116,119-127,131,135-136,140]
short* up 1-00:00:00 22 idle n[009-011,034-035,075-076,095-096,100-101,117-118,128-130,132-134,137-139]
medium up 2-00:00:00 22 drain* n[014-021,026-031,044-051]
medium up 2-00:00:00 10 mix n[001-002,025,052,058,067,073,079,081,105]
medium up 2-00:00:00 86 alloc n[003-008,012-013,022-024,032-033,036-043,053-057,059-066,068-072,074,077-078,080,082-094,097-099,102-104,106-116,119-127,131,135-136,140]
medium up 2-00:00:00 22 idle n[009-011,034-035,075-076,095-096,100-101,117-118,128-130,132-134,137-139]
long up 4-00:00:00 22 drain* n[014-021,026-031,044-051]
long up 4-00:00:00 10 mix n[001-002,025,052,058,067,073,079,081,105]
long up 4-00:00:00 86 alloc n[003-008,012-013,022-024,032-033,036-043,053-057,059-066,068-072,074,077-078,080,082-094,097-099,102-104,106-116,119-127,131,135-136,140]
long up 4-00:00:00 22 idle n[009-011,034-035,075-076,095-096,100-101,117-118,128-130,132-134,137-139]
The sinfo command has many options that let you view the information of interest in whatever format you prefer. See the man page or type sinfo --help for more information.
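For example, a few standard sinfo options (output will vary by cluster):
sinfo -N -l              # one line per node, long format
sinfo -p gpu             # report only the gpu partition
sinfo -t idle            # show only idle nodes
sinfo -o "%P %t %D %N"   # custom format: partition, state, node count, node list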
squeue - information about submitted jobs¶
Next we determine what jobs exist on the system using the squeue command.
Example - output of squeue command
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
16048 short xgboost_ user1 PD 0:00 1 (Nodes required for job are DOWN, DRAINED or reserved for jobs in higher priority partitions)
15739 short test3232 user2 PD 0:00 2 (Priority)
15365 short DHAI-b user1 PD 0:00 1 (Priority)
15349 gpu gpu8 test R 0:00 1 n141
Explanation of all fields
Info
The JOBID field shows the job ID assigned by SLURM. You can work with this number in your SLURM scripts via the $SLURM_JOB_ID variable.
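A minimal sketch of using $SLURM_JOB_ID in a batch script (the directory path is illustrative):
#!/bin/bash
#SBATCH --job-name=jobid-demo
#SBATCH --time=00:05:00
# Create a per-job working directory named after the job ID (illustrative path)
WORKDIR="$HOME/runs/job_${SLURM_JOB_ID}"
mkdir -p "$WORKDIR"
echo "Running as job ${SLURM_JOB_ID}" > "$WORKDIR/info.txt"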
Info
The PARTITION field shows the partition in which the job is running or queued.
Info
The NAME field shows the job name specified by the user.
Info
The USER field shows the username of the person who submitted the job.
Info
The ST field gives information about the job state. The most common job states are:
- R - running
- PD - pending
Info
The TIME field shows how long the job has been running, in the format days-hours:minutes:seconds.
Info
The NODES field shows the number of nodes allocated to the job.
Info
The NODELIST(REASON) field indicates where the job is running or the reason it is still pending. Typical reasons for pending jobs are:
- Resources (waiting for resources to become available)
- Priority (queued behind a higher-priority job)
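To see the full state and pending reason for a specific job (the job ID below is taken from the example output above):
squeue -j 16048 -o "%i %t %r"   # job ID, state, pending reason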
The squeue command has many options that let you view the information of interest in whatever format you prefer. See the man page for more information or type squeue --help.
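For example:
squeue -u $USER   # show only your own jobs
squeue -p gpu     # show only jobs in the gpu partition
squeue -t PD      # show only pending jobs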
scontrol - more detailed information¶
The scontrol command can be used to report more detailed information about nodes, partitions, jobs, job steps, and configuration. It can also be used by system administrators to make configuration changes. A couple of examples are shown below.
See the man page for more information or type scontrol --help.
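For example (the job ID and node name below are taken from the examples in this section):
scontrol show partition gpu   # details for a single partition
scontrol show node n148       # details for a single node
scontrol show job 15349       # details for a specific job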
Example - scontrol show partition
PartitionName=testing
AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
AllocNodes=ALL Default=NO QoS=N/A
DefaultTime=00:30:00 DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
MaxNodes=1 MaxTime=00:30:00 MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED
Nodes=login[01-02]
PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
OverTimeLimit=NONE PreemptMode=OFF
State=UP TotalCPUs=64 TotalNodes=2 SelectTypeParameters=NONE
JobDefaults=(null)
DefMemPerCPU=4000 MaxMemPerNode=UNLIMITED
TRES=cpu=64,mem=250G,node=2,gres/gpu=2
TRESBillingWeights=CPU=0.0
PartitionName=gpu
AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
AllocNodes=ALL Default=NO QoS=N/A
DefaultTime=2-00:00:00 DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
MaxNodes=1 MaxTime=2-00:00:00 MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED
Nodes=n[141-148]
PriorityJobFactor=0 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
OverTimeLimit=NONE PreemptMode=OFF
State=UP TotalCPUs=512 TotalNodes=8 SelectTypeParameters=NONE
JobDefaults=(null)
DefMemPerCPU=4000 MaxMemPerNode=UNLIMITED
TRES=cpu=512,mem=2000G,node=8,billing=512,gres/gpu=32
TRESBillingWeights=CPU=1.0,Mem=0.256G,GRES/gpu=16.0
PartitionName=short
AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
AllocNodes=ALL Default=NO QoS=N/A
DefaultTime=1-00:00:00 DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
MaxNodes=8 MaxTime=1-00:00:00 MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED
Nodes=n[001-140]
PriorityJobFactor=2 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
OverTimeLimit=NONE PreemptMode=OFF
State=UP TotalCPUs=8960 TotalNodes=140 SelectTypeParameters=NONE
JobDefaults=(null)
DefMemPerCPU=4000 MaxMemPerNode=UNLIMITED
TRES=cpu=8960,mem=35000G,node=140,billing=8960
TRESBillingWeights=CPU=1.0,Mem=0.256G
PartitionName=medium
AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
AllocNodes=ALL Default=NO QoS=N/A
DefaultTime=2-00:00:00 DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
MaxNodes=4 MaxTime=2-00:00:00 MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED
Nodes=n[001-140]
PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
OverTimeLimit=NONE PreemptMode=OFF
State=UP TotalCPUs=8960 TotalNodes=140 SelectTypeParameters=NONE
JobDefaults=(null)
DefMemPerCPU=4000 MaxMemPerNode=UNLIMITED
TRES=cpu=8960,mem=35000G,node=140,billing=8960
TRESBillingWeights=CPU=1.0,Mem=0.256G
PartitionName=long
AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
AllocNodes=ALL Default=NO QoS=N/A
DefaultTime=4-00:00:00 DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
MaxNodes=1 MaxTime=4-00:00:00 MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED
Nodes=n[001-140]
PriorityJobFactor=0 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
OverTimeLimit=NONE PreemptMode=OFF
State=UP TotalCPUs=8960 TotalNodes=140 SelectTypeParameters=NONE
JobDefaults=(null)
DefMemPerCPU=4000 MaxMemPerNode=UNLIMITED
TRES=cpu=8960,mem=35000G,node=140,billing=8960
TRESBillingWeights=CPU=1.0,Mem=0.256G
Example - scontrol show node n148
NodeName=n148 Arch=x86_64 CoresPerSocket=32
CPUAlloc=1 CPUEfctv=64 CPUTot=64 CPULoad=1.04
AvailableFeatures=(null)
ActiveFeatures=(null)
Gres=gpu:A100-SXM4-40GB:4
NodeAddr=n148 NodeHostName=n148 Version=22.05.7
OS=Linux 3.10.0-1160.71.1.el7.x86_64 #1 SMP Tue Jun 28 15:37:28 UTC 2022
RealMemory=256000 AllocMem=64000 FreeMem=67242 Sockets=2 Boards=1
State=MIXED ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
Partitions=gpu
BootTime=2023-09-06T10:29:48 SlurmdStartTime=2023-09-18T14:25:33
LastBusyTime=2023-09-18T14:02:52
CfgTRES=cpu=64,mem=250G,billing=64,gres/gpu=4
AllocTRES=cpu=1,mem=62.50G,gres/gpu=1
CapWatts=n/a
CurrentWatts=0 AveWatts=0
ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
scancel - canceling jobs¶
scancel is used to signal or cancel jobs, job arrays, or job steps.
An arbitrary number of jobs or job steps may be signaled using job specification filters or a space-separated list of specific job and/or job step IDs. If the job ID of a job array is specified with an array ID value, only that job array element will be cancelled. If the job ID of a job array is specified without an array ID value, all job array elements will be cancelled. While a heterogeneous job is in a PENDING state, only the entire job can be cancelled, not its individual components.
See the man page for more information or type scancel --help.
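For example (job IDs taken from the squeue output above; the array task ID is hypothetical):
scancel 16048                  # cancel a single job
scancel 16048 15739            # cancel several jobs at once
scancel -u $USER               # cancel all of your own jobs
scancel -t PENDING -u $USER    # cancel only your own pending jobs
scancel --signal=USR1 15349    # send SIGUSR1 to a job instead of terminating it
scancel 16048_3                # cancel one element of a job array (if 16048 were an array job)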
srun - run parallel jobs¶
It is possible to create a resource allocation and launch the tasks for a job step in a single command line using the srun command. Depending upon the MPI implementation used, MPI jobs may also be launched in this manner. In this example we execute /bin/hostname on four nodes (-N 4). The default partition will be used, and one task per node will be launched by default. (Adding the -l option would prefix each output line with its task number, as in the sbatch example further below.) Note that the srun command has many options available to control what resources are allocated and how tasks are distributed across those resources.
srun -N 4 /bin/hostname
n058
n057
n059
n060
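A few further variations (the GPU request assumes the gres name shown in the scontrol output above):
srun -N 4 -l /bin/hostname           # as above, but prefix each line with its task number
srun -n 8 /bin/hostname              # eight tasks, letting Slurm place them on nodes
srun -p gpu --gres=gpu:1 nvidia-smi  # request one GPU on the gpu partition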
One common mode of operation is to submit a script for later execution with sbatch. In this example the script name is script.sh and we explicitly request the nodes n066 and n067 (-w "n0[66-67]"; note the use of a node range expression). We also explicitly state that the subsequent job steps will spawn four tasks each (-n 4), which ensures that our allocation contains at least four processors (one processor per task to be launched). The output will appear in the file my.stdout (-o my.stdout).
The script itself carries an embedded time limit for the job. Other options can be supplied the same way, by prefixing them with "#SBATCH" at the beginning of the script (before any commands to be executed); options supplied on the command line override any options specified within the script. Note that script.sh contains the command /bin/hostname, which executes on the first node in the allocation (where the script runs), plus two job steps initiated with srun and executed sequentially.
Running srun within sbatch
user@login02 test > cat script.sh
#!/bin/bash
#SBATCH --time=1
#SBATCH --tasks-per-node=2
/bin/hostname
srun -l /bin/hostname
srun -l /bin/pwd
user@login02 test > sbatch -n 4 -w "n0[66-67]" -o my.stdout script.sh
Submitted batch job 38793
user@login02 test > cat my.stdout
n066
3: n067
2: n067
0: n066
1: n066
3: /home/user/test
1: /home/user/test
0: /home/user/test
2: /home/user/test