Job States

Job Status and Reason Codes

The squeue command provides detailed information about an active job’s status, including job state codes and job reason codes.

  • Job state codes describe a job’s current state in the queue (e.g., pending, completed, running).
  • Job reason codes explain why a job is in a specific state.

The following tables outline a variety of job state and reason codes you may encounter when using squeue to check on your jobs.
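As a minimal sketch of how to see both codes at once, the snippet below asks squeue for the state and reason columns of your own jobs. The format string and the `show_my_jobs` helper name are illustrative choices, not part of this site's tooling; the `%` fields are standard squeue format specifiers.

```shell
#!/bin/bash
# Columns: %i = job ID, %P = partition, %j = job name,
# %T = job state (long form), %M = time used, %r = reason.
SQUEUE_FORMAT='%.18i %.9P %.20j %.10T %.10M %r'

# Hypothetical helper: list only the jobs owned by the current user.
show_my_jobs() {
    squeue -u "$USER" -o "$SQUEUE_FORMAT"
}

# Only call squeue where Slurm is actually installed.
if command -v squeue >/dev/null 2>&1; then
    show_my_jobs
else
    echo "squeue not found; run this on a cluster login node" >&2
fi
```

Replacing `%T` with `%t` prints the compact code (for example PD) instead of the long state name.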

Job State Codes

Status         Code  Explanation
CANCELLED      CA    The job was explicitly cancelled by the user or a system administrator.
COMPLETED      CD    The job has completed successfully.
COMPLETING     CG    The job is finishing, but some processes are still active.
DEADLINE       DL    The job was terminated because it reached its deadline.
FAILED         F     The job terminated with a non-zero exit code or another failure condition.
NODE_FAIL      NF    The job terminated due to the failure of one or more allocated nodes.
OUT_OF_MEMORY  OOM   The job experienced an out-of-memory error.
PENDING        PD    The job is waiting for resource allocation. It will eventually run.
PREEMPTED      PR    The job was terminated because it was preempted by another job.
RUNNING        R     The job currently has an allocation and is running.
SUSPENDED      S     A running job has been stopped, with its cores released to other jobs.
STOPPED        ST    A running job has been stopped, with its cores retained.
TIMEOUT        TO    The job was terminated upon reaching its time limit.
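When many jobs are queued, a per-state tally can be easier to read than the full listing. The following sketch assumes nothing beyond standard squeue options (`-h` suppresses the header, `%T` prints the long state name); the `count_states` helper is a hypothetical name introduced here.

```shell
#!/bin/bash
# Hypothetical helper: read one state name per line on stdin
# and print "STATE COUNT" pairs, sorted by state name.
count_states() {
    sort | uniq -c | awk '{print $2, $1}'
}

# Tally the current user's jobs by state when Slurm is available.
if command -v squeue >/dev/null 2>&1; then
    squeue -u "$USER" -h -o '%T' | count_states
fi
```

Because `count_states` only reads standard input, it can be tried without a cluster, for example with `printf 'RUNNING\nPENDING\nRUNNING\n' | count_states`.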

For more information, see the squeue and sacct documentation, or run squeue --help and sacct --help.
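squeue only lists jobs that are still in the queue or have finished very recently, so the final state of an older job is usually looked up with sacct. The sketch below shows one possible query; the field selection and the helper names `job_summary` and `split_exit_code` are assumptions made for illustration. sacct reports ExitCode as two numbers separated by a colon: the exit code, then the signal that terminated the job (0 if none).

```shell
#!/bin/bash
# Hypothetical helper: split sacct's ExitCode field ("exit_code:signal").
split_exit_code() {
    local field="$1"
    echo "exit=${field%%:*} signal=${field##*:}"
}

# Hypothetical helper: one summary line per job with its final state.
# -X limits output to the job allocation itself, omitting individual steps.
job_summary() {
    sacct -j "$1" -X --format=JobID,JobName,State,ExitCode,Elapsed
}

# Usage: pass a job ID as the first argument (requires Slurm).
if [ -n "${1:-}" ] && command -v sacct >/dev/null 2>&1; then
    job_summary "$1"
fi
```

For example, `split_exit_code 0:9` prints `exit=0 signal=9`, which indicates a job that was killed by signal 9 rather than one that exited on its own.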

Job debugging

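If the job terminated with a non-zero exit code, it can help to add the -x flag to the first line (the shebang) of the job script. With -x, bash prints each command to standard error before executing it, so the job's output shows which command failed.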

#!/bin/bash -x
#SBATCH ...
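If tracing the whole script is too noisy, the same effect can be limited to a suspect region with `set -x` and `set +x`. This is a minimal sketch of a job script body: `suspect_step` and `sample.dat` are hypothetical placeholders for whatever part of your job is failing.

```shell
#!/bin/bash
# Hypothetical stand-in for the part of the job you suspect is failing.
suspect_step() {
    local input="$1"
    echo "processing ${input}"
}

set -x                      # start tracing: each command is printed to stderr with a "+" prefix
suspect_step "sample.dat"
set +x                      # stop tracing
echo "done"
```

The trace lines are written to standard error, so they appear in whichever file your job's error output is directed to.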

Job Reason Codes

Reason Code              Explanation
Priority                 One or more higher-priority jobs are ahead of yours in the queue. Your job will eventually run.
Dependency               The job is waiting for a job it depends on to complete and will run afterwards.
Resources                The job is waiting for resources to become available and will eventually run.
InvalidAccount           The job’s account is invalid. Cancel the job and resubmit it with the correct account.
InvalidQOS               The job’s QoS is invalid. Cancel the job and resubmit it with the correct QoS.
QOSGrpCpuLimit           All CPUs assigned to your job’s specified QoS are in use; the job will run eventually.
QOSGrpMaxJobsLimit       The maximum number of jobs for your job’s QoS has been reached; the job will run eventually.
QOSGrpNodeLimit          All nodes assigned to your job’s specified QoS are in use; the job will run eventually.
PartitionCpuLimit        All CPUs assigned to your job’s specified partition are in use; the job will run eventually.
PartitionMaxJobsLimit    The maximum number of jobs for your job’s partition has been reached; the job will run eventually.
PartitionNodeLimit       All nodes assigned to your job’s specified partition are in use; the job will run eventually.
AssociationCpuLimit      All CPUs assigned to your job’s specified association are in use; the job will run eventually.
AssociationMaxJobsLimit  The maximum number of jobs for your job’s association has been reached; the job will run eventually.
AssociationNodeLimit     All nodes assigned to your job’s specified association are in use; the job will run eventually.

For a full list of job reason codes, refer to the Slurm documentation.
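The reason codes above fall into two groups: those where the job simply has to wait, and those where it can never start as submitted. As a rough sketch, the script below lists your pending jobs with their reasons and labels each accordingly. The `reason_action` helper and its labels are hypothetical, and the classification covers only the codes listed on this page.

```shell
#!/bin/bash
# Hypothetical helper: suggest an action for a pending job's reason code.
reason_action() {
    case "$1" in
        InvalidAccount|InvalidQOS)
            echo "resubmit" ;;      # the job cannot start as submitted
        Priority|Dependency|Resources|QOSGrp*Limit|Partition*Limit|Association*Limit)
            echo "wait" ;;          # the job will run eventually
        *)
            echo "check-docs" ;;    # reason not covered by the table above
    esac
}

# List the current user's pending jobs as "JOBID REASON -> ACTION".
# -t PENDING filters by state, -h drops the header, %r prints the reason.
if command -v squeue >/dev/null 2>&1; then
    squeue -u "$USER" -h -t PENDING -o '%i %r' |
    while read -r jobid reason; do
        echo "$jobid $reason -> $(reason_action "$reason")"
    done
fi
```

For a single job, `scontrol show job` followed by the job ID prints the same reason alongside the rest of the job's settings.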