Job States

Job Status and Reason Codes

The squeue command provides detailed information about an active job’s status, including job state codes and job reason codes.

Job state codes describe a job’s current state in the queue (e.g., pending, completed, running).
Job reason codes explain why a job is in a specific state.

The following tables outline a variety of job state and reason codes you may encounter when using squeue to check on your jobs.

Status	Code	Explanation
CANCELLED	`CA`	The job was explicitly cancelled by the user or system administrator.
COMPLETED	`CD`	The job has completed successfully.
COMPLETING	`CG`	The job is finishing but some processes are still active.
DEADLINE	`DL`	The job terminated because it reached its deadline.
FAILED	`F`	The job terminated with a non-zero exit code or another failure condition.
NODE_FAIL	`NF`	The job terminated due to the failure of one or more allocated nodes.
OUT_OF_MEMORY	`OOM`	The job experienced an out-of-memory error.
PENDING	`PD`	The job is waiting for resource allocation. It will eventually run.
PREEMPTED	`PR`	The job was terminated because of preemption by another job.
RUNNING	`R`	The job is currently allocated to one or more nodes and is running.
SUSPENDED	`S`	A running job has been stopped with its cores released to other jobs.
STOPPED	`ST`	A running job has been stopped with its cores retained.
TIMEOUT	`TO`	The job terminated upon reaching its time limit.

For more information, see the squeue and sacct documentation, or run squeue --help and sacct --help.
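
As a quick reference, the compact codes in the table above can be expanded with a small shell helper. This is a minimal sketch and not part of Slurm: the function name `state_name` is hypothetical, and the mapping covers only the codes listed on this page. The usage at the end assumes `squeue` is installed and relies on its `-u` (user), `-h` (no header) and `-o` (format) options with the `%i` (job ID) and `%t` (compact state) fields.

```shell
#!/bin/bash
# state_name: hypothetical helper that expands a compact squeue state code
# into the full state name, using only the codes from the table above.
state_name() {
    case "$1" in
        CA)  echo "CANCELLED" ;;
        CD)  echo "COMPLETED" ;;
        CG)  echo "COMPLETING" ;;
        DL)  echo "DEADLINE" ;;
        F)   echo "FAILED" ;;
        NF)  echo "NODE_FAIL" ;;
        OOM) echo "OUT_OF_MEMORY" ;;
        PD)  echo "PENDING" ;;
        PR)  echo "PREEMPTED" ;;
        R)   echo "RUNNING" ;;
        S)   echo "SUSPENDED" ;;
        ST)  echo "STOPPED" ;;
        TO)  echo "TIMEOUT" ;;
        *)   echo "UNKNOWN" ;;
    esac
}

# Only query the scheduler when squeue is actually installed.
if command -v squeue >/dev/null 2>&1; then
    # -h drops the header; %i is the job ID, %t the compact state code.
    squeue -u "$USER" -h -o "%i %t" | while read -r job_id code; do
        echo "$job_id $(state_name "$code")"
    done
fi
```

Note that squeue can also print the full state name directly through the `%T` format field; the helper is mainly useful when reading output that already contains the compact codes.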

Job debugging

If a job terminated with a non-zero exit code, it can help to add the -x flag to the shebang line of the batch script. Bash then prints each command before executing it, which makes it easier to see where the script failed.

#!/bin/bash -x
#SBATCH ...
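
To illustrate what the `-x` flag adds, the following sketch runs a tiny inline script with tracing enabled. Bash writes each command to standard error, prefixed with `+`, before executing it. The function name `trace_demo` and the inline commands are made up for illustration only.

```shell
#!/bin/bash
# trace_demo: hypothetical helper that runs a two-command inline script under
# `bash -x` and merges the trace (stderr) into stdout so both are visible.
trace_demo() {
    bash -x -c 'name="world"; echo "hello $name"' 2>&1
}

trace_demo
```

In a batch job the trace lands wherever the job's standard error is written, which by default is the same slurm-<jobid>.out file as standard output. Placing `set -x` after the `#SBATCH` lines instead of using the shebang flag has the same effect from that point in the script onward.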

Reason Code	Explanation
`Priority`	One or more higher-priority jobs are in the queue ahead of yours. Your job will eventually run.
`Dependency`	This job is waiting for a dependent job to complete and will run afterwards.
`Resources`	The job is waiting for resources to become available and will eventually run.
`InvalidAccount`	The job’s account is invalid. Cancel the job and resubmit it with the correct account.
`InvalidQOS`	The job’s QoS is invalid. Cancel the job and resubmit it with the correct QoS.
`QOSGrpCpuLimit`	All CPUs assigned to your job’s specified QoS are in use; job will run eventually.
`QOSGrpMaxJobsLimit`	The maximum number of jobs for your job’s QoS has been reached; job will run eventually.
`QOSGrpNodeLimit`	All nodes assigned to your job’s specified QoS are in use; job will run eventually.
`PartitionCpuLimit`	All CPUs assigned to your job’s specified partition are in use; job will run eventually.
`PartitionMaxJobsLimit`	The maximum number of jobs for your job’s partition has been reached; job will run eventually.
`PartitionNodeLimit`	All nodes assigned to your job’s specified partition are in use; job will run eventually.
`AssociationCpuLimit`	All CPUs assigned to your job’s specified association are in use; job will run eventually.
`AssociationMaxJobsLimit`	The maximum number of jobs for your job’s association has been reached; job will run eventually.
`AssociationNodeLimit`	All nodes assigned to your job’s specified association are in use; job will run eventually.
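
The reason codes above fall into two groups: those that resolve on their own once resources or limits free up, and those that require the job to be cancelled and resubmitted. A minimal sketch of that distinction follows. The function name `reason_action` and its output wording are hypothetical, and it knows only the reasons listed on this page (with the invalid-QoS reason spelled `InvalidQOS`, as Slurm reports it). The usage at the end assumes `squeue` is installed and uses its `-t` (states) option and the `%r` (reason) format field.

```shell
#!/bin/bash
# reason_action: hypothetical helper that suggests what to do for a pending
# job, based only on the reason codes listed in the table above.
reason_action() {
    case "$1" in
        InvalidAccount|InvalidQOS)
            echo "cancel and resubmit" ;;
        Priority|Dependency|Resources)
            echo "wait" ;;
        QOSGrpCpuLimit|QOSGrpMaxJobsLimit|QOSGrpNodeLimit)
            echo "wait" ;;
        PartitionCpuLimit|PartitionMaxJobsLimit|PartitionNodeLimit)
            echo "wait" ;;
        AssociationCpuLimit|AssociationMaxJobsLimit|AssociationNodeLimit)
            echo "wait" ;;
        *)
            # Anything not covered by the table is left for the user to look up.
            echo "check the Slurm documentation" ;;
    esac
}

# Only query the scheduler when squeue is actually installed.
if command -v squeue >/dev/null 2>&1; then
    # -t PD restricts output to pending jobs; %i is the job ID, %r the reason.
    squeue -u "$USER" -h -t PD -o "%i %r" | while read -r job_id reason; do
        echo "$job_id $reason: $(reason_action "$reason")"
    done
fi
```

The reasons are listed explicitly rather than matched with wildcards so that codes not described on this page are never reported as "wait" by accident.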

For a full list of job state and reason codes, refer to the Slurm documentation.

Created by: Marek Štekláč