Partitions and priorities
Partitions¶
Your SLURM job can be submitted to a specific partition, which determines not only the hardware available to the job (such as GPU nodes) but also constrains other job parameters (maximum job size, time limit and priority).
For example, the default partition is called "short" and jobs submitted to it can consume up to 8 generic nodes (or 512 cores) for 24 hours. If you need production access to the GPU nodes, you have to assign your job to the "gpu" partition, where you can use up to 64 cores and 4 NVIDIA A100 cards for two days. The "testing" partition provides short-term access to the resources for development and testing purposes, which is especially helpful for developers when the cluster is fully utilized.
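For instance, a batch script requesting the full "gpu" partition allocation could begin with the following directives (a minimal sketch using standard SLURM options; my_gpu_app is a placeholder for your application):

#!/bin/bash
#SBATCH --partition=gpu          # production GPU partition
#SBATCH --ntasks-per-node=64     # up to 64 cores
#SBATCH --gres=gpu:4             # up to 4 NVIDIA A100 cards
#SBATCH --time=2-00:00:00        # partition time limit: two days
srun ./my_gpu_app                # placeholder for your application

Submit the script with the sbatch command.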
If your job requirements don't match the limits set for the available partitions, contact us via our helpdesk.
List of partitions and their parameters¶
| Partition | Nodes | Time limit (d-hh:mm) | Job size limit (nodes/cores) | GPUs | Priority factor |
|---|---|---|---|---|---|
| testing | login01,login02 | 0-00:30 | 1/16 | 1 | 0 |
| gpu | n141-n148 | 2-00:00 | 1/64 | 4 | 0 |
| short | n001-n140 | 1-00:00 | 8/512 | 0 | 2 |
| medium | n001-n140 | 2-00:00 | 4/256 | 0 | 1 |
| long | n001-n140 | 4-00:00 | 1/64 | 0 | 0 |
Job priorities¶
When the cluster is occupied, submitted jobs wait in the queue for execution. The waiting jobs are ordered according to their priority attribute (the higher the number, the sooner the job will be launched). Job priorities are calculated as follows:
PRIO = 1000000*USER_FS + JOB_AGE + 345600*JOB_PARTITION, where (a worked example follows this list):
- USER_FS is the fairshare factor, which penalizes users according to their past cluster usage, see the sshare command below
- JOB_AGE is the time (in seconds) the job has been waiting in the queue
- JOB_PARTITION is the priority factor assigned to the job by partition selection, see the partitions list above
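For example, with a fairshare factor of 0.263158 (the value from the sshare example below), a job in the "short" partition (priority factor 2) that has been waiting for one hour gets:

PRIO = 1000000*0.263158 + 3600 + 345600*2 = 263158 + 3600 + 691200 = 957958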
sprio command¶
This command shows the priorities (and their components) of waiting jobs.
Example (sort all waiting jobs by their priority):
demovic@login02 ~ > sprio -S -y
JOBID PARTITION PRIORITY SITE AGE FAIRSHARE JOBSIZE PARTITION
36386 ncpu 3777 0 1 2679 99 1000
36387 ncpu 3777 0 0 2679 99 1000
36339 ncpu 2910 0 25 1786 99 1000
36388 ncpu 2885 0 0 1786 99 1000
36389 ncpu 2885 0 0 1786 99 1000
36390 ncpu 2885 0 0 1786 99 1000
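You can also restrict sprio to particular jobs with its standard -j/--jobs option (the job ID below is one from the listing above):

demovic@login02 ~ > sprio -j 36386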
sshare command¶
This command displays fairshare information based on the hierarchical account structure. In our case we use it to determine the fairshare factor used in the job priority calculation. Since the fairshare factor also depends on the account (i.e. the user project), we have to specify that account as well.
In this case we know that our user (demovic) has access to the project called "p70-23-t". We can therefore display the fairshare factor (shown here in the last column) as follows:
demovic@login01 ~ > sshare -A p70-23-t
Account User RawShares NormShares RawUsage EffectvUsage FairShare
-------------------- ---------- ---------- ----------- ----------- ------------- ----------
p70-23-t 1 0.333333 122541631 0.364839
p70-23-t demovic 1 0.111111 4798585 0.039159 0.263158
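To display only your own row, you can additionally filter by user name with the standard -u/--users option of sshare:

demovic@login01 ~ > sshare -A p70-23-t -u demovic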
You can display all project accounts available to you using the sprojects command.