Usage Guide

Typical Workflow

  1. Run a Jupyter notebook or an interactive job to develop and test your job script. Develop using a limited number of GPUs; the sketch below shows how a script can adapt to however many GPUs the job has.
  2. Create a jobDefinition and submit your job(s). These jobs can use more GPUs, CPUs and memory.
  3. Check your results and repeat as needed.

Store your scripts, data and logs on the shared project space.
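
The following is a minimal sketch of how a job script can adapt to however many GPUs it was given, so the same script runs unchanged in a small development job and in a larger batch job. It assumes PyTorch is available in your environment and uses a dummy model and batch; replace those with your own code.

```python
# Minimal sketch (assumption: PyTorch is installed in the image/environment you use).
# The script only uses the GPUs that are actually visible to the job,
# so it behaves the same in a 1-GPU development job and a multi-GPU batch job.
import torch
import torch.nn as nn

def main():
    n_gpus = torch.cuda.device_count()
    device = torch.device("cuda" if n_gpus > 0 else "cpu")
    print(f"Visible GPUs: {n_gpus}, using device: {device}")

    model = nn.Linear(128, 10).to(device)     # dummy model, replace with your own
    if n_gpus > 1:
        model = nn.DataParallel(model)        # simple multi-GPU data parallelism

    x = torch.randn(64, 128, device=device)   # dummy batch, replace with your data
    y = model(x)
    print("Forward pass OK, output shape:", tuple(y.shape))

if __name__ == "__main__":
    main()
```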

Do’s

  1. As GPULab is a shared system, it is in everyone's best interest to use the GPU and CPU resources efficiently:
    • Make sure you do not have idle jobs running. Cancel jobs if you are not currently using the GPUs and CPUs.
    • Make sure you actually use the resources you requested. (For now, you need to check this manually inside your job, for example as in the first sketch after this list; GPULab will later collect these statistics itself.)
    • Be economical with the disk space on the shared project space. (At the moment the disk is too full to ignore disk space usage; this should improve in the future.)
  2. For long running jobs, we strongly advise you to use checkpoints so that your job is resumable (a minimal checkpointing sketch is shown after this list).
  3. GPULab only stores the first few MB of your job's output, so log to the shared storage if you need more extensive logging (see the logging example after this list).
  4. Contact the GPULab admins if there are not enough resources, if you want to reserve a large number of resources in advance, if you spot bugs, or if you have any questions.
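
One way to check manually that you are actually using the GPUs you requested (point 1 above) is to query nvidia-smi from inside your job, for example at regular points in your training loop. This is only a sketch; adapt it to your own setup.

```python
# Sketch: print GPU utilisation and memory use from inside a running job
# by calling nvidia-smi (assumed to be available on the GPU nodes).
import subprocess

def print_gpu_usage():
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=index,utilization.gpu,memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True)
    for line in result.stdout.strip().splitlines():
        idx, util, mem_used, mem_total = [v.strip() for v in line.split(",")]
        print(f"GPU {idx}: {util}% utilisation, {mem_used}/{mem_total} MiB memory")

if __name__ == "__main__":
    print_gpu_usage()
```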
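
For long running jobs (point 2), a minimal checkpointing pattern is to periodically write the model and optimizer state to the shared project space, and to look for such a checkpoint at start-up so a resubmitted job continues where the previous one stopped. The sketch below assumes PyTorch; the checkpoint path is only an example, use your own project's path.

```python
# Sketch: resumable training with periodic checkpoints on the shared project space.
# Assumptions: PyTorch; dummy model and data; example checkpoint path.
import os
import torch
import torch.nn as nn

CKPT = "/project/your-project/checkpoints/model.pt"   # example path, adapt to your project

model = nn.Linear(128, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
start_epoch = 0

# Resume if a checkpoint exists (e.g. after the previous job was cancelled or crashed).
if os.path.exists(CKPT):
    state = torch.load(CKPT)
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    start_epoch = state["epoch"] + 1

for epoch in range(start_epoch, 100):
    loss = model(torch.randn(64, 128)).sum()   # dummy training step
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Save a checkpoint every epoch so the job can be resumed at any time.
    os.makedirs(os.path.dirname(CKPT), exist_ok=True)
    torch.save({"model": model.state_dict(),
                "optimizer": optimizer.state_dict(),
                "epoch": epoch}, CKPT)
```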
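
For the logging advice (point 3), you can write your own log file directly to the shared project space instead of relying only on the size-limited output that GPULab captures, for example with Python's standard logging module. The path below is only an example.

```python
# Sketch: write full logs to a file on the shared project space.
import logging
import os

LOG_DIR = "/project/your-project/logs"   # example path, adapt to your project
os.makedirs(LOG_DIR, exist_ok=True)

logging.basicConfig(
    filename=os.path.join(LOG_DIR, "train.log"),
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)

logging.info("Job started")
# ... your training code logs here instead of (only) printing to stdout ...
logging.info("Job finished")
```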

Maximum Simultaneous Jobs

There is a limit on the number of simultaneous jobs that each user can run. By default, this limit is 10 jobs. This limit lets you submit many jobs without blocking all resources for other users.

This limit can be changed for each user on request. It can be lowered (if you want fewer simultaneous jobs to run) or increased (in which case you'll need to take care not to block other users). Just ask the GPULab admins.