Your Slurm job can be submitted to a specific partition. The partition determines the hardware available to the job (such as GPU nodes) and also constrains other job parameters (maximum job size, time limit and priority).

For example, the default partition is called "short", and jobs submitted to it can use up to 8 generic nodes (512 cores) for 24 hours. If you need production access to the GPU nodes, assign your job to the gpu partition, where you can use up to 64 cores and 4 NVIDIA A100 cards for two days. The testing partition provides short-time access to resources for development and testing. This is helpful for developers when the cluster is fully utilized.

If your job requirements don't match the limits set for the available partitions, contact us via our helpdesk.

To select a given partition with a Slurm command, use the -p <partition> option:

sbatch|srun|salloc|sinfo|squeue... -p <partition> [...]
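
As a minimal sketch, the partition can also be selected inside a batch script with an #SBATCH directive instead of on the command line. The node count, the time limit and the helper name partition_name are illustrative assumptions, not site recommendations. SLURM_JOB_PARTITION is the environment variable that Slurm sets inside a running job.

```shell
#!/bin/bash
#SBATCH --partition=short   # long form of "-p short"
#SBATCH --nodes=1           # illustrative value
#SBATCH --time=0-01:00      # illustrative value, days-hours:minutes

# Hypothetical helper: report the partition the job was placed in.
# Slurm exports SLURM_JOB_PARTITION inside a job; outside a job it is unset,
# so the helper falls back to "unknown".
partition_name() {
    echo "${SLURM_JOB_PARTITION:-unknown}"
}

echo "Job partition: $(partition_name)"
```

Saved under a hypothetical name such as job.sh, the script would be submitted with "sbatch job.sh". A -p option given on the sbatch command line takes precedence over the directive in the script.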

List of Partitions and Their Parameters

Partition  Nodes            Time limit  Job size limit (nodes/cores)  GPUs  Priority factor
testing    login01,login02  0-00:30     1/16                          1     0
gpu        n141-n148        2-00:00     1/64                          4     0
short      n001-n140        1-00:00     8/512                         0     2
medium     n001-n140        2-00:00     4/256                         0     1
long       n001-n140        4-00:00     1/64                          0     0
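
The CPU partition limits listed above can be turned into a small selection helper. The following is only a sketch: the function name suggest_partition is hypothetical, it covers only the short, medium and long partitions, and it checks node count and walltime in whole hours while ignoring core counts. Partitions are tried in order of decreasing priority factor.

```shell
#!/bin/sh
# Hypothetical helper: suggest a CPU partition for a request.
# Usage: suggest_partition <nodes> <hours>
# Limits are copied from the partition list: short 8 nodes / 24 h,
# medium 4 nodes / 48 h, long 1 node / 96 h.
suggest_partition() {
    if [ "$1" -le 8 ] && [ "$2" -le 24 ]; then
        echo short
    elif [ "$1" -le 4 ] && [ "$2" -le 48 ]; then
        echo medium
    elif [ "$1" -le 1 ] && [ "$2" -le 96 ]; then
        echo long
    else
        # No CPU partition fits the request.
        echo none
        return 1
    fi
}

suggest_partition 2 36   # prints "medium"
```

A request that prints "none", for example 8 nodes for 36 hours, does not fit any of the three partitions.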

Partition State Information

For detailed information about all available partitions and their definitions and limits, run:

login01:~$ scontrol show partitions <name>

Long partition information
login01:~$ scontrol show partitions long
    AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
    AllocNodes=ALL Default=NO QoS=N/A
    DefaultTime=4-00:00:00 DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
    MaxNodes=1 MaxTime=4-00:00:00 MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED
    PriorityJobFactor=0 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
    OverTimeLimit=NONE PreemptMode=OFF
    State=UP TotalCPUs=8960 TotalNodes=140 SelectTypeParameters=NONE
    DefMemPerCPU=4000 MaxMemPerNode=UNLIMITED
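
A single value can be pulled out of this Key=Value output with standard text tools. The sketch below assumes the output format shown above. The function name partition_field is hypothetical, and the approach does not handle values that contain spaces.

```shell
#!/bin/sh
# Hypothetical helper: read "scontrol show partitions" output on stdin
# and print the value of one Key=Value field.
# Usage: scontrol show partitions long | partition_field MaxTime
partition_field() {
    # Put every Key=Value pair on its own line, then strip "Key=" from
    # the matching line and print what remains (first match only).
    tr -s ' ' '\n' | sed -n "s/^$1=//p" | head -n 1
}

# Demonstration on one line copied from the example output above.
printf '%s\n' '    MaxNodes=1 MaxTime=4-00:00:00 MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED' |
    partition_field MaxTime   # prints "4-00:00:00"
```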

Partition Limits

At the partition level, only the following limits can be enforced:

  • DefaultTime: Default time limit
  • MaxNodes: Maximum number of nodes per job
  • MinNodes: Minimum number of nodes per job
  • MaxCPUsPerNode: Maximum number of CPUs a job can be allocated on any node
  • MaxMemPerCPU/Node: Maximum memory a job can be allocated per CPU or per node
  • MaxTime: Maximum length of time a user's job can run
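
The time limits appear in days-hours:minutes[:seconds] form, so comparing a planned run time against MaxTime is easier after converting both to minutes. The following sketch uses a hypothetical function name, slurm_time_to_minutes. It supports only the forms with a day prefix that are used on this page, and it ignores the seconds field.

```shell
#!/bin/sh
# Hypothetical helper: convert a limit such as "4-00:00:00" or "0-00:30"
# (days-hours:minutes[:seconds]) to whole minutes. Seconds are ignored.
slurm_time_to_minutes() {
    case $1 in
        *-*) ;;  # day prefix present, supported
        *) echo "unsupported time format: $1" >&2; return 1 ;;
    esac
    # Split on "-" and ":" so that $1=days, $2=hours, $3=minutes.
    echo "$1" | awk -F'[-:]' '{ print $1 * 1440 + $2 * 60 + $3 }'
}

slurm_time_to_minutes 4-00:00:00   # prints 5760
slurm_time_to_minutes 0-00:30      # prints 30
```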