Hopper implements a shared-maximum scheduling model, which guarantees that each principal investigator and their lab group has access to the resources they have purchased, while also providing extra computational power by leveraging underutilized processors. This model relies heavily on Moab “reservations,” which are similar to traditional queues but are defined in terms of ownership. Like queues, reservations serve as the gateway to predefined groups of nodes and place basic constraints on their use. However, Moab reservations are far more configurable than a traditional queue, as reflected in the large number of available configuration options. While a reservation’s nodes are unused, they are open to the system’s global pool of researchers; once the owner requests them, Moab preempts the jobs running there.

If your job(s) consume more than your share (or your sponsor’s share) of available resources, they have a high chance of being preempted. Therefore, cluster researchers are encouraged to be mindful of their primary allocation of cores, the system load, and the current demand from fellow researchers when requesting resources from the Workload Manager using the commands described below.

Torque/Moab provides commands that report resource availability, which can help you obtain quicker job turnaround times and use the system more fully. System administrators have also built scripts to help you get information on your lab’s resources and their usage. Familiarity with these commands is essential for getting useful work from the machine.

Moab’s showres command provides basic information on the reservations to which you have been assigned. As demonstrated below, the user here has primary access to 2 nodes / 40 cores…

$ showres

ReservationID       Type S     Start        End       Duration     N/P       StartTime

physics_lab.48340   User    1:04:47:47   INFINITY    INFINITY     2/40   Thu Jul 7 09:08:13

Here we see that our current reservation has an ID of physics_lab.48340, and the N/P column shows that we have 2 nodes (40 cores) reserved for us.
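The node and core counts can also be pulled out of this output with a small shell function. This is a minimal sketch, assuming the column layout shown in the sample (reservation ID in the first field, and exactly one field of the form nodes/cores); the function name reservation_summary is hypothetical and not part of Moab.

```shell
# Print "<reservation ID> <nodes> <cores>" for each reservation row on stdin.
# Assumption: the ID is the first field and one later field looks like "N/P".
reservation_summary() {
    awk '
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ "^[0-9]+/[0-9]+$") {
                    split($i, np, "/")
                    print $1, np[1], np[2]
                    break
                }
            }
        }
    '
}

# Example using the sample output; on the cluster: showres | reservation_summary
reservation_summary <<'EOF'
ReservationID       Type S     Start        End       Duration     N/P       StartTime
physics_lab.48340   User    1:04:47:47   INFINITY    INFINITY     2/40   Thu Jul 7 09:08:13
EOF
```

For the sample row this prints physics_lab.48340 2 40. The header line is skipped because its N/P field does not contain digits.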

To see our current reservation usage, we can use the familiar showq command with the -R switch…

$ showq -R physics_lab.48340

active jobs------------------------

10938      user1      Running      20        06:23:13:24     Fri Jul 8 14:14:35
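To compare usage against the cores you own, the processor counts of the running jobs can be summed. The sketch below assumes rows shaped like the sample line above (job ID, user, state, processor count, then time fields); cores_in_use is a hypothetical helper name.

```shell
# Sum the processor counts of running jobs read from stdin.
# Assumption: field 3 is the job state and field 4 is the processor count.
cores_in_use() {
    awk '$3 == "Running" && $4 ~ /^[0-9]+$/ { total += $4 } END { print total + 0 }'
}

# Example using the sample output; on the cluster, pipe showq -R <reservation ID> into it.
cores_in_use <<'EOF'
active jobs------------------------
10938      user1      Running      20        06:23:13:24     Fri Jul 8 14:14:35
EOF
```

For the sample this prints 20, so half of the 40 reserved cores are in use.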

To maintain a guaranteed level of ownership while still providing global access, we use a special type of reservation (a “standing reservation”), which represents a persistent block of processing time, infinite in length, for a given group of nodes. As with a traditional reservation, a name (or, in our case, a group of users) is associated with each one, granting priority access to the specified resources.

In some cases, to avoid preemption or simply to make better use of their owned resources, a user may want to submit directly to their assigned reservations. This can be done with the ADVRES flag…

$ qsub -l nodes=1:ppn=20,walltime=1:00:00 -W x=FLAGS:ADVRES:physics_lab.48340 testjob.cmd

Because reservation IDs can change over time, you will want to make sure to find your current reservation ID with the showres command before submitting in this way. You can then use the output there to submit your job to your current primary reservation.
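The lookup and the submission can be combined so that the ID is never typed by hand. This is a minimal sketch, assuming the reservation ID is the first field of the showres output and begins with your lab name followed by a dot; current_reservation is a hypothetical helper, and the qsub command is only printed here so that the example runs without a scheduler.

```shell
# Print the ID of the first reservation whose name starts with "<prefix>.".
# Reads showres-style output on stdin; assumes the ID is the first field.
current_reservation() {
    awk -v prefix="$1" 'index($1, prefix ".") == 1 { print $1; exit }'
}

# Sample row standing in for live output; on the cluster use:
#   rsv=$(showres | current_reservation physics_lab)
sample='physics_lab.48340   User    1:04:47:47   INFINITY    INFINITY     2/40   Thu Jul 7 09:08:13'
rsv=$(printf '%s\n' "$sample" | current_reservation physics_lab)

if [ -n "$rsv" ]; then
    # Printed rather than executed; drop the echo to submit for real.
    echo qsub -l nodes=1:ppn=20,walltime=1:00:00 -W "x=FLAGS:ADVRES:$rsv" testjob.cmd
else
    echo "no reservation found for physics_lab" >&2
fi
```

Checking that the ID is non-empty before submitting avoids sending a job with a malformed ADVRES flag when no matching reservation is listed.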
