The Job Definition

This page gives detailed information on some of the fields available in the Job Definition JSON.

You can find some examples in the tutorials section.

clusterId

The job definition includes a clusterId. This corresponds to a cluster of one or more worker nodes that GPULab uses to execute the jobs.

To request info about the available worker nodes and clusters, use the following command:

$ gpulab-cli clusters --short
+---------+---------+----------------------+---------+---------+---------+-----------------+
| Cluster | Version | Host                 | Workers | GPUs    | CPUs    | Memory (GB)     |
+---------+---------+----------------------+---------+---------+---------+-----------------+
| 1       | stable  | gpu2                 |   2/2   |   2/2   |  16/16  |   31.18/31.18   |
| 1       | stable  | gpu1                 |   2/2   |   2/2   |  16/16  |   31.18/31.18   |
+---------+---------+----------------------+---------+---------+---------+-----------------+

Omitting --short results in more detailed info, including the GPU model and other details.
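
In the job definition, you then select a cluster by the number in the first column. For example, to run on cluster 1 (pick a cluster from your own listing):

"clusterId": 1,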

resources

The job definition also includes a resources section, which allows you to specify the number of GPUs (“gpus”), the number of CPU cores (“cpuCores”) and the amount of system memory (“systemMemory”) needed.

If more than one GPU or CPU core is requested, the job will only run once at least the requested number is available.

The amount of systemMemory needs to be specified in MB (so systemMemory: 2000 means 2GB). If more memory is requested than is available, the job will not run.
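
For example, a job requesting one GPU, four CPU cores and 8 GB of memory could specify something like this (values are illustrative):

"resources": {
    "gpus": 1,
    "cpuCores": 4,
    "systemMemory": 8000
},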

dockerImage

You can specify the Docker image that needs to be executed here.

This image can be specified in one of 3 formats:

  1. For docker hub images: Use <image_name>:<tag> for example: ubuntu:bionic or osrf/ros:melodic-desktop-full-bionic

  2. For images in public docker registries: Use the format <reg_url>:<reg_port>/<image_name>:<tag> for example: gpulab.ilabt.imec.be:5000/jupyter/tensorflow-notebook:latest

  3. For images in private docker registries: Use the format <username>:<password>@<reg_url>:<reg_port>/<image_name>:<tag> for example: gitlab+deploy-token-3:XXXXXXXXXXXXX@gitlab.ilabt.imec.be:4567/ilabt/gpulab/sample:nvidia-smi

    Note: this 3rd format is not a standard docker format: it’s a GPULab extension. The 1st and 2nd formats are standard docker formats.
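
In the job definition itself, the image is passed via the dockerImage field, for example using the 1st format:

"dockerImage": "ubuntu:bionic",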

iLab.t offers 2 options to store your docker images:

  • The iLab.t gitlab provides a private docker registry for each project.

    You can find all instructions on how to use this in the “Registry” section, accessible from the left sidebar in gitlab. If this section is missing for your project, you first need to enable it: in your gitlab project, go to Settings - General - Permissions and enable “Container registry”.

    To use the images you push to this registry in GPULab, you’ll need to set up a read-only deploy token for the registry (in Settings - Repository - Deploy Tokens). Use the 3rd format described above to pass the image and the deploy token to GPULab, for example: gitlab+deploy-token-3:XXXXXXXXXXXXX@gitlab.ilabt.imec.be:4567/ilabt/gpulab/sample:nvidia-smi

    Example of building an image and pushing it to this repository:

    docker build -t gitlab.ilabt.imec.be:4567/ilabt/myproj/sample:v1 .
    docker login gitlab.ilabt.imec.be:4567
    docker push gitlab.ilabt.imec.be:4567/ilabt/myproj/sample:v1
    
  • GPULab has a shared docker registry at gpulab.ilabt.imec.be:5000. You can freely use it to store your custom docker images.

    Be aware that as this is a shared registry, anyone can access the images stored in it (full read and write access!). So do not store sensitive data inside images on this registry. Also note that there are no backups for this registry. You are responsible for keeping your own backups of your docker images.

    Example of building an image and pushing it to this repository:

    docker build -t gpulab.ilabt.imec.be:5000/myname/sample:v1 .
    docker push gpulab.ilabt.imec.be:5000/myname/sample:v1
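
Once pushed, you can reference such an image in the job definition’s dockerImage field, using the matching format from the list above. For example, with the tags from the build examples (the deploy token is a placeholder):

"dockerImage": "gpulab.ilabt.imec.be:5000/myname/sample:v1",

or, for the private gitlab registry:

"dockerImage": "gitlab+deploy-token-3:XXXXXXXXXXXXX@gitlab.ilabt.imec.be:4567/ilabt/myproj/sample:v1",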
    

jobDataLocations

jobDataLocations allows you to specify which volumes must be attached to the Docker container.

GPULab containers can access the same storage as Virtual Wall 2 projects. To mount the project folder at /project, you specify it as mountPoint:

"jobDataLocations": [
    {
       "mountPoint": "/project"
    }
],

This will cause a directory /project to be bind-mounted inside your docker container.

It will contain the same data as in /groups/wall2-ilabt-iminds-be/MyProject/. As the same NFS share is mounted behind the scenes, the data is instantly shared and never deleted.

You can also mount only subdirectories of the /project dir, like this:

"jobDataLocations": [
    {
       "sharePath": "/project/mycode/",
       "mountPoint": "/work/code/"
    }
],
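
Since jobDataLocations is an array, you should be able to list multiple volumes in a single job definition, for example combining both cases above (a sketch, assuming multiple entries are accepted):

"jobDataLocations": [
    {
       "mountPoint": "/project"
    },
    {
       "sharePath": "/project/mycode/",
       "mountPoint": "/work/code/"
    }
],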

portMappings

Sometimes you want to access network services running inside the docker container. For example, a webserver showing status info might be running. To access such services, the ports need to be “exposed” by docker. You need to specify this in the job definition.

You can specify zero, one or more port mappings. An example:

"portMappings": [ { "containerPort": 80 } ]

This will map port 80 to a port on the host machine. The output of the gpulab-cli jobs <job id> command will show which port.

You can also choose the port of the host machine, but this might cause the job to fail if the port is already in use:

"portMappings": [ { "containerPort": 80, "hostPort" : 8080 } ]