
Job priorities

Demand for HPC resources typically exceeds supply, so the scheduler needs a method for deciding the order in which jobs can run. By default, the scheduler allocates resources on a simple "first-in, first-out" (FIFO) basis. When the cluster is fully occupied, submitted jobs wait in the queue for execution, ordered by their priority attribute (the higher the number, the sooner the job will be launched). However, rules and policies can change a job's priority, which is expressed to the scheduler as a single number. Some of the main factors are:

Priority factors

  • Job size: The number of nodes or cores, or the amount of memory, that a job requests. Larger jobs are given a higher priority.
  • Wait time: The priority of a job increases the longer it has been in the queue.
  • Fairshare: Fairshare takes into account the resources used by a project's jobs in the last 14 days. The more resources used by a project's jobs in the last 14 days, the lower the priority of the new jobs for that project.
  • Backfilling: This allows lower-priority jobs to run as long as the batch system knows they will finish before a higher-priority job needs the resources. It is therefore very important that users specify their CPU, memory and walltime requirements accurately, so that the backfilling system can be used to full effect.
  • Partition and QoS: A factor associated with each node partition.
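Backfilling can be illustrated with a deliberately simplified model. The sketch below assumes that the highest-priority waiting job already has a guaranteed start time (`reserved_start_s`) and will need the currently idle nodes at that moment. A lower-priority job may then start early only if it fits on the idle nodes and its requested walltime ends before that start time. The `Job` class and `can_backfill` function are hypothetical names, and Slurm's real backfill scheduler takes many more details into account.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """A waiting job in a simplified, hypothetical scheduler model."""
    job_id: int
    nodes: int       # number of nodes requested
    walltime_s: int  # requested walltime limit, in seconds


def can_backfill(job: Job, now_s: int, idle_nodes: int, reserved_start_s: int) -> bool:
    """Return True if `job` can start now without delaying the top-priority job.

    Assumes the top-priority job is guaranteed to start at `reserved_start_s`
    and needs the currently idle nodes then, so a backfilled job must fit on
    those nodes and, judging by its requested walltime, finish before that time.
    """
    fits_on_idle_nodes = job.nodes <= idle_nodes
    finishes_in_time = now_s + job.walltime_s <= reserved_start_s
    return fits_on_idle_nodes and finishes_in_time


if __name__ == "__main__":
    # 4 idle nodes; the top-priority job is guaranteed to start in 2 hours.
    now, reserved = 0, 2 * 3600
    accurate = Job(job_id=1, nodes=2, walltime_s=1 * 3600)  # realistic 1 h request
    padded = Job(job_id=2, nodes=2, walltime_s=24 * 3600)   # same work, padded to 24 h
    print(can_backfill(accurate, now, 4, reserved))  # True
    print(can_backfill(padded, now, 4, reserved))    # False
```

The two jobs may do exactly the same work; only the requested walltime differs, and the padded request loses the chance to start early.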

Job priorities are calculated as follows:

PRIO = 1000000 * USER_FS + JOB_AGE + 345600 * JOB_PARTITION, where

  • USER_FS is the fairshare factor, which penalizes users according to their past cluster usage (see the sshare command)
  • JOB_AGE is the time (in seconds) the job has been waiting in the queue
  • JOB_PARTITION is the factor assigned to the job by its partition selection (see the partitions list)
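The formula can be evaluated directly to compare the weight of its terms. In this sketch the function names are hypothetical, USER_FS = 0.263158 is the fairshare value of user1 from the sshare example further down this page, and the JOB_PARTITION value of 1.0 is an assumption chosen only for illustration.

```python
SECONDS_PER_DAY = 86_400

# Weights taken from the formula PRIO = 1000000*USER_FS + JOB_AGE + 345600*JOB_PARTITION
FAIRSHARE_WEIGHT = 1_000_000
PARTITION_WEIGHT = 345_600


def job_priority(user_fs: float, job_age_s: float, job_partition: float) -> float:
    """Evaluate PRIO for one job; JOB_AGE is the time waited in the queue, in seconds."""
    return FAIRSHARE_WEIGHT * user_fs + job_age_s + PARTITION_WEIGHT * job_partition


def equivalent_wait_s(delta_user_fs: float = 0.0, delta_partition: float = 0.0) -> float:
    """Waiting time (s) that adds as much priority as the given factor differences."""
    return FAIRSHARE_WEIGHT * delta_user_fs + PARTITION_WEIGHT * delta_partition


if __name__ == "__main__":
    # user1's fairshare from the sshare example, 1 hour in the queue,
    # and an assumed JOB_PARTITION factor of 1.0 (illustrative only).
    print(round(job_priority(0.263158, 3600, 1.0)))                  # 612358
    print(equivalent_wait_s(delta_partition=1.0) / SECONDS_PER_DAY)  # 4.0
    print(round(equivalent_wait_s(delta_user_fs=0.1) / 3600, 1))     # 27.8
```

In other words, a difference of 1 in JOB_PARTITION is worth four days of waiting in the queue, and a difference of 0.1 in USER_FS is worth about 27.8 hours.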

Managing priorities

Several commands allow users to view and manage the priorities of submitted jobs, chief among them sprio and sshare. The sprio command shows the priorities (and their components) of waiting jobs, while sshare can be used to determine the fairshare factor that is used in the job priority calculation.

sprio - jobs scheduling priority information

The sprio command shows the scheduling priority of each waiting job, together with the individual components that make it up.

Sorting all waiting jobs by priority (highest first):

login01:~$ sprio -S -y
   36386 ncpu            3777          0          1       2679         99       1000
   36387 ncpu            3777          0          0       2679         99       1000
   36339 ncpu            2910          0         25       1786         99       1000
   36388 ncpu            2885          0          0       1786         99       1000
   36389 ncpu            2885          0          0       1786         99       1000
   36390 ncpu            2885          0          0       1786         99       1000
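The listing above is shown without its header line. The numbers suggest that the third column is the total priority and that the remaining columns are the weighted components it is built from: for job 36339, 0 + 25 + 1786 + 99 + 1000 = 2910. This sketch, which assumes that column layout (`parse_sprio` is a hypothetical helper), parses the listing and compares each total with the sum of its components; the small differences (at most 2 here) are presumably due to rounding of the displayed values.

```python
# The sprio listing above (its header line is not shown on this page).
SPRIO_OUTPUT = """\
   36386 ncpu            3777          0          1       2679         99       1000
   36387 ncpu            3777          0          0       2679         99       1000
   36339 ncpu            2910          0         25       1786         99       1000
   36388 ncpu            2885          0          0       1786         99       1000
   36389 ncpu            2885          0          0       1786         99       1000
   36390 ncpu            2885          0          0       1786         99       1000
"""


def parse_sprio(text: str) -> list[tuple[int, str, int, list[int]]]:
    """Split each line into (job ID, partition, total priority, components)."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        job_id, partition, total, *components = fields
        rows.append((int(job_id), partition, int(total), [int(c) for c in components]))
    return rows


if __name__ == "__main__":
    for job_id, partition, total, components in parse_sprio(SPRIO_OUTPUT):
        # Components are printed rounded, so their sum may differ slightly from the total.
        print(job_id, partition, total, sum(components), sum(components) - total)
```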

See the Slurm documentation page for more information, or type sprio --help.

sshare - list shares of associations

This command displays fairshare information based on the hierarchical account structure. Here we use it to determine the fairshare factor used in the job priority calculation. Because the fairshare factor also depends on the account (i.e. the user's project), the account has to be specified as well.

In this example, user1 has access to the project "p70-23-t", so the fairshare factor (shown in the last column) can be displayed as follows:

login01:~ $ sshare -A p70-23-t 
   Account                    User  RawShares  NormShares    RawUsage  EffectvUsage  FairShare 
   -------------------- ---------- ---------- ----------- ----------- ------------- ---------- 
   p70-23-t                                 1    0.333333   122541631      0.364839            
   p70-23-t               user1             1    0.111111     4798585      0.039159   0.263158 
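The FairShare value can also be extracted from this output programmatically, for example in a monitoring script. The sketch below splits the text shown above on whitespace and relies on user rows having seven fields (the account row, whose User and FairShare fields are empty, has only five); `user_fairshare` is a hypothetical helper name. For real scripts, sshare's --parsable2 option, which prints '|'-delimited output, is likely a more robust input.

```python
# The sshare listing above (trailing spaces removed).
SSHARE_OUTPUT = """\
   Account                    User  RawShares  NormShares    RawUsage  EffectvUsage  FairShare
   -------------------- ---------- ---------- ----------- ----------- ------------- ----------
   p70-23-t                                 1    0.333333   122541631      0.364839
   p70-23-t               user1             1    0.111111     4798585      0.039159   0.263158
"""


def user_fairshare(text: str, account: str, user: str) -> float:
    """Return the FairShare value (last column) of `user` within `account`."""
    for line in text.splitlines():
        fields = line.split()
        # A user row has 7 fields: Account, User and five numeric columns.
        if len(fields) == 7 and fields[0] == account and fields[1] == user:
            return float(fields[-1])
    raise LookupError(f"no sshare row for user {user!r} in account {account!r}")


if __name__ == "__main__":
    fs = user_fairshare(SSHARE_OUTPUT, "p70-23-t", "user1")
    print(fs)                     # 0.263158
    # Contribution of this value to the fairshare term 1000000*USER_FS:
    print(round(1_000_000 * fs))  # 263158
```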

You can list all project accounts available to you with the sprojects command.

See the Slurm documentation for more information, or type sshare --help.

Created by: Martin Blaško