Partitions
Your Slurm job can be submitted to a specific partition, which defines not only the hardware available to the job (such as GPU nodes) but also constrains other job parameters (maximum job size, time limit, and priority).
For example, the default partition is called "short", and jobs submitted to it
can consume up to 8 generic nodes (or 512 cores) for 24 hours.
If you need production access to the GPU nodes, you need to assign your job to the gpu
partition, where you can use up to 64 cores and 4 NVIDIA A100 cards for two days.
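For example, a GPU job could be submitted along these lines (a minimal sketch; job.sh is a hypothetical batch script, and the --gres specification assumes the cards are exposed under the generic resource name gpu):

login01:~$ sbatch -p gpu --gres=gpu:2 --cpus-per-task=16 -t 1-00:00:00 job.sh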
The purpose of the testing partition is to allow short-term access to the resources for development and testing.
This is especially helpful for developers when the cluster is fully utilized.
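A typical use is a short interactive session, for example (a sketch; the resource request is only an illustration within the partition's 16-core, 30-minute limits):

login01:~$ srun -p testing -c 4 -t 00:15:00 --pty bash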
If your job requirements don't match the limits set for the available partitions, contact us via our helpdesk.
To select a given partition with a Slurm command, use the -p <partition>
option:
sbatch|srun|salloc|sinfo|squeue... -p <partition> [...]
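For example, to run a 128-task job on two nodes in the medium partition (a sketch; ./my_app stands in for your own executable):

login01:~$ srun -p medium -N 2 -n 128 -t 1-12:00:00 ./my_app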
List of Partitions and Their Parameters
| Partition | Nodes | Time limit (d-hh:mm) | Job size limit (nodes/cores) | GPUs | Priority factor |
|-----------|-------|----------------------|------------------------------|------|-----------------|
| testing | login01,login02 | 0-00:30 | 1/16 | 1 | 0 |
| gpu | n141-n148 | 2-00:00 | 1/64 | 4 | 0 |
| short | n001-n140 | 1-00:00 | 8/512 | 0 | 2 |
| medium | n001-n140 | 2-00:00 | 4/256 | 0 | 1 |
| long | n001-n140 | 4-00:00 | 1/64 | 0 | 0 |
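You can also query these parameters directly from Slurm. For example, sinfo's output format options will print each partition's time limit, node count, and CPUs per node (the format string below is just one possible selection):

login01:~$ sinfo -o "%10P %12l %6D %4c"

Here %P is the partition name, %l the time limit, %D the node count, and %c the CPUs per node.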
Partition State Information
For detailed information about all available partitions and their definitions/limits, use:
login01:~$ scontrol show partitions <name>
Long partition information
login01:~$ scontrol show partitions long
PartitionName=long
AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
AllocNodes=ALL Default=NO QoS=N/A
DefaultTime=4-00:00:00 DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
MaxNodes=1 MaxTime=4-00:00:00 MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED
Nodes=n[001-140]
PriorityJobFactor=0 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
OverTimeLimit=NONE PreemptMode=OFF
State=UP TotalCPUs=8960 TotalNodes=140 SelectTypeParameters=NONE
JobDefaults=(null)
DefMemPerCPU=4000 MaxMemPerNode=UNLIMITED
TRES=cpu=8960,mem=35000G,node=140,billing=8960
TRESBillingWeights=CPU=1.0,Mem=0.256G
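Note how the billing weights are balanced: the memory weight is applied per GB, so the default allocation of 4000 MB per CPU (≈ 3.9 GB) bills 3.9 × 0.256 ≈ 1.0, the same as one CPU. (This reading assumes Slurm's default behavior of summing the weighted TRES values; with PriorityFlags=MAX_TRES only the largest component would count.)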
Partition Limits
At the partition level, only the following limits can be enforced:
DefaultTime
: Default time limit

MaxNodes
: Maximum number of nodes per job

MinNodes
: Minimum number of nodes per job

MaxCPUsPerNode
: Maximum number of CPUs a job can be allocated on any node

MaxMemPerCPU/Node
: Maximum memory a job can be allocated on any CPU or node

MaxTime
: Maximum length of time a user's job can run
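As a concrete consequence of these limits, the long partition shown above has MaxNodes=1, so a two-node submission to it is rejected, while a single-node job within MaxTime is accepted (a sketch; job.sh is a hypothetical batch script):

login01:~$ sbatch -p long -N 1 -n 64 -t 4-00:00:00 job.sh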