Array jobs¶
Job arrays are a convenient way to submit and manage large numbers of similar, independent jobs quickly and easily. They are particularly useful when the same analysis has to be run with different inputs or parameters.
There are several ways to define job arrays, such as specifying a range of indices or providing an explicit list of indices. Slurm also offers features to manage and track job arrays, such as options to suspend, resume, or cancel all jobs in the array at once.
The following is an example batch script for an array job:
#!/bin/bash
#SBATCH --job-name="array test"
#SBATCH --partition=short
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=2
#SBATCH --output=out_array_%A_%a.out
#SBATCH --error=err_array_%A_%a.err
#SBATCH --array=1-8
# Print the task index.
echo "My SLURM_ARRAY_TASK_ID: ${SLURM_ARRAY_TASK_ID}"
srun ./myapp --input input_data_${SLURM_ARRAY_TASK_ID}.inp
In this example, 8 array jobs will be launched (--array=1-8), with 4 tasks (--ntasks=4) per array job and 2 CPUs per task (--cpus-per-task=2).
The SLURM_ARRAY_TASK_ID variable identifies each array task uniquely, allowing the user to pass a different input file to each array job and/or to address individual array jobs with Slurm control commands.
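When the input files do not follow a numbered naming scheme, one common pattern is to let the array index select a line from a list of file names. The following is a minimal sketch of that idea, not part of the example script above; the list file and the names sample_a.inp, sample_b.inp and sample_c.inp are hypothetical, and the fallback index is only there so the snippet can be tried outside of Slurm.

```shell
#!/bin/bash
# Hypothetical list of input files, one per line. In a real job this file
# would already exist; it is created here only to keep the sketch runnable.
list_file=$(mktemp)
printf '%s\n' "sample_a.inp" "sample_b.inp" "sample_c.inp" > "$list_file"

# Outside of Slurm the variable is unset, so fall back to 2 for a dry run.
task_id=${SLURM_ARRAY_TASK_ID:-2}

# Select the line whose number equals the array index
# (this assumes the array indices start at 1).
input_file=$(sed -n "${task_id}p" "$list_file")

echo "Array task ${task_id} would process: ${input_file}"
rm -f "$list_file"
```

In a batch script, the echo line would be replaced by the actual srun call, with the array range chosen to match the number of lines in the list.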
If you want to reuse the same batch script for different array ranges, you can omit the --array parameter in the batch script and specify the range on the command line:
login01:~$ sbatch --array=1-8 array.job.sh
Defining the array range¶
There are several ways to define the range of index values for a job array:
# Job array with task index values from 0 to 15
#SBATCH --array=0-15
# Job array with task index values 1, 2, 9, 22 and 31
#SBATCH --array=1,2,9,22,31
# Job array with task index values 1, 3, 5 and 7
#SBATCH --array=1-7:2
# Job array with task index values 1, 3, 5, 7 and 20
#SBATCH --array=1-7:2,20
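To check which indices a given specification selects before submitting, the rules above can be reproduced in a few lines of bash. The function expand_array_spec below is a hypothetical helper written for illustration, not a Slurm command and not Slurm's own parser. It handles only the three forms shown here (range, comma-separated list, range with step) and assumes well-formed input with a positive step.

```shell
#!/bin/bash
# expand_array_spec is a hypothetical helper, not part of Slurm.
# It prints the indices selected by a specification such as "1-7:2,20".
expand_array_spec() {
    local spec=$1 part range step first last i
    local -a parts out=()

    # Split the specification at commas into its individual parts.
    IFS=',' read -r -a parts <<< "$spec"

    for part in "${parts[@]}"; do
        # An optional ":<step>" suffix sets the stride; the default is 1.
        step=1
        range=$part
        if [[ $part == *:* ]]; then
            step=${part##*:}
            range=${part%%:*}
        fi

        # A part is either "<first>-<last>" or a single index.
        if [[ $range == *-* ]]; then
            first=${range%%-*}
            last=${range##*-}
        else
            first=$range
            last=$range
        fi

        for ((i = first; i <= last; i += step)); do
            out+=("$i")
        done
    done

    echo "${out[*]}"
}

# The four specifications from the examples above.
expand_array_spec "0-15"
expand_array_spec "1,2,9,22,31"
expand_array_spec "1-7:2"
expand_array_spec "1-7:2,20"
```

Each call prints the selected indices on one line, separated by spaces, so the result can be compared directly with the comment that accompanies each --array line.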
Managing array jobs¶
The squeue command can be used to view the state of your array jobs. Array tasks that are still pending are shown as one entry, while running tasks are shown as individual entries with job IDs of the form <jobid>_<arrayindex>.
login01:~$ squeue -u username
JOBID         PARTITION  NAME     USER   ST  TIME  NODES  NODELIST(REASON)
123456_[4-8]  small      example  user1  PD  0:00  1      (Resources)
123456_1      small      example  user1  R   0:17  1      n024
123456_2      small      example  user1  R   0:23  1      n025
123456_3      small      example  user1  R   0:29  1      n025
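If a script needs the two parts of such an entry separately, they can be split with plain shell parameter expansion. This is only a small sketch; the variable names are chosen for illustration, and the entry is hard-coded to one of the running tasks from the listing above.

```shell
#!/bin/bash
# Example entry as it appears in the JOBID column for a running array task.
entry="123456_2"

# Everything before the first underscore is the job ID of the array ...
array_job_id=${entry%%_*}
# ... and everything after it is the index of the array task.
array_task_index=${entry#*_}

echo "job array: ${array_job_id}, task index: ${array_task_index}"
```

The same split does not apply to the combined entry for pending tasks, whose index part is a bracketed range rather than a single number.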
If you wish to cancel some of the array tasks of a job array, you can use the scancel command as with any other job.
For example, to cancel the array tasks with indexes 1 to 3 of job array 123456, use the following command:
login01:~$ scancel 123456_[1-3]
On the other hand, if you want to cancel the whole job array, specifying only the job ID suffices:
login01:~$ scancel 123456
Environment variables¶
In addition to the SLURM_ARRAY_TASK_ID variable, Slurm will set the following environment variables that describe the job array:
Variable | Description |
---|---|
SLURM_ARRAY_TASK_ID | Job array index value |
SLURM_ARRAY_TASK_COUNT | Number of array tasks in the job array |
SLURM_ARRAY_TASK_MIN | Value of the lowest job array index |
SLURM_ARRAY_TASK_MAX | Value of the highest job array index |
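As a rough illustration of how these variables might be used inside a batch script, the sketch below prints a one-line summary for the current array task. The fallback values are an assumption made so the snippet also runs outside of Slurm; they mirror a submission with --array=1-8 and are not set by Slurm itself.

```shell
#!/bin/bash
# Fallback values are used only when the script runs outside of Slurm
# (for example during a dry run); they mirror --array=1-8.
task_id=${SLURM_ARRAY_TASK_ID:-1}
task_count=${SLURM_ARRAY_TASK_COUNT:-8}
task_min=${SLURM_ARRAY_TASK_MIN:-1}
task_max=${SLURM_ARRAY_TASK_MAX:-8}

summary="Array task ${task_id} (one of ${task_count}, index range ${task_min}-${task_max})"
echo "$summary"

# Example of branching on the position within the array:
# only the task with the lowest index prints this extra line.
if [ "$task_id" -eq "$task_min" ]; then
    echo "This is the task with the lowest index."
fi
```

Note that array tasks run independently, so the index says nothing about the order in which tasks start or finish.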