Reservations

Getting a reservation

Who?

If you are an IDLab member on a tight deadline and experiencing long waits, you can request a reservation for some dedicated resources on GPULab.

How?

Please email gpulab@ilabt.imec.be mentioning which resources you want to reserve and for which deadline (paper/PhD/project/…). Also mention the GPULab username (or, if you want to share the reservation, the project name) in which you want to use the reservation.

Using your Reservation

When a reservation is made for you, you will receive a reservation ID. This is a UUID, for example: 123e4567-e89b-12d3-a456-426614174000

You need to use this reservation ID when starting a GPULab job or a Jupyterhub notebook.

For a GPULab job, place the ID in the request.scheduling.reservationIds list. For example:

{
  "name": "myjob",
  "description": "Example job that uses a reservation",
  "request": {
    "docker": {
      "command": "/project/start_job.sh",
      "image": "debian:stable",
      "storage": [ { "hostPath": "/project_ghent", "containerPath": "/project" } ]
    },
    "resources": {
      "clusterId": 42,
      "gpuModel": [ "V100" ],
      "cpus": 2,
      "gpus": 1,
      "cpuMemoryGb": 2,
    },
    "scheduling": {
      "reservationIds": ["123e4567-e89b-12d3-a456-426614174000"],
    }
  }
}

For Jupyterhub, click the “Show Advanced Options” button. Then fill in the ID in the “Reservation ID” field that appears.

Attention

Between the start and end of your reservation, the reserved resources will be unavailable to other users.

If you have finished your work before the end of the reservation, please delete your reservation. This is much appreciated, as it will immediately release the resources for use by other users.

You can delete the reservation from the GPULab site: Click the reservation button in the top right, and look for your reservation.

Reservations do not limit where jobs run

Specifying a reservation in a job does not in itself limit that job to only use the reserved nodes! It only “unlocks” these resources for use by your job.

The scheduler first makes a list of all nodes it can assign your job to. In this step, the reservation(s) specified in the job are used to unlock the reserved nodes.

In a second step, the scheduler chooses which of these possible nodes your job will be started on. In this step, the reserved nodes are not guaranteed priority over other nodes (although they are often preferred).

Therefore, when using reservations, you still need to specify your job's requirements, such as the gpuModel or the clusterId.
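
For example, a job that uses the reservation and also requests a specific GPU model could look like the sketch below (the gpuModel value and the other numbers are placeholders; adapt them to your own reservation):

{
  "name": "reserved-gpu-job",
  "description": "Example job that uses a reservation and requests a specific GPU model",
  "request": {
    "docker": { ... },
    "resources": {
      "gpuModel": [ "V100" ],
      "cpus": 2,
      "gpus": 1,
      "cpuMemoryGb": 2
    },
    "scheduling": {
      "reservationIds": ["123e4567-e89b-12d3-a456-426614174000"]
    }
  }
}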

An example scenario:

  • Assume there are 2 clusters, each with 5 GPUs free.
  • Cluster A has less powerful GPUs than cluster B.
  • You want to use the powerful GPUs and have reserved 3 of them in cluster B.

Example 1:

  • You start 4 jobs that use the reservation, but do not specify which cluster or GPU type you want.
  • Result: Your 4 jobs might run on cluster A, cluster B, or a mix of both. You cannot assume they run on cluster B. (Though the scheduler will usually prefer cluster B in this case.)

Example 2:

  • You start 4 jobs that use the reservation and specify the GPU type of cluster B in the jobs.
  • Result: Your 4 jobs will run on cluster B. Even though you have only 3 reserved GPUs, the 4th job still starts because there are free GPUs outside of the reservation.

Example 3:

  • Assume there is a 3rd cluster C, with the same powerful GPUs as cluster B, and 1 free GPU.
  • You start 6 jobs that use the reservation and specify the GPU type of cluster B in the jobs.
  • Result: 5 jobs will run on cluster B, and 1 job on cluster C.

If you want to limit how many jobs will run “in your reservation” (for example, if you do not want to use resources outside of your reservation), use the maxSimultaneousJobs option.
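
As a sketch, and assuming maxSimultaneousJobs is placed under scheduling like the other scheduling options in this document, a job that never runs more instances at once than the 3 reserved GPUs from the scenario above could look like this:

{
  "name": "reservation-limited-job",
  "description": "Limit how many of these jobs run at the same time, matching the 3 reserved GPUs",
  "request": {
    "docker": { ... },
    "resources": { ... },
    "scheduling": {
      "reservationIds": ["123e4567-e89b-12d3-a456-426614174000"],
      "maxSimultaneousJobs": 3
    }
  }
}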

Running jobs before future reservations

Reservations for other people that start in the future prevent you from starting a job that would still be running when that reservation starts.

This is because a running job will never be stopped to enforce a reservation.

To ensure that your jobs can keep starting on clusters with future reservations, you need to explicitly specify a maxDuration that does not extend past the start of that reservation. You can set the maxDuration under scheduling. For example:

{
  "name": "Short job",
  "description": "Job with a max duration of 6 hours",
  "request": {
    "docker": { ... },
    "resources": { ... },
    "scheduling": {
      "maxDuration": "6 hours"
    }
  }
}