Tutorials

Using project storage

First, create a jobDefinition that uses the shared project storage:

cat > my-second-jobDefinition.json <<EOF
 {
     "jobDefinition": {
         "name": "my-2nd-gpulab-job",
         "description": "hello again world",
         "clusterId": 1,
         "dockerImage": "gpulab.ilabt.imec.be:5000/sample:nvidia-smi",
         "jobType": "BATCH",
         "command": "/project/start-my-gpulab-app.sh",
         "resources": {
             "gpus": 1,
             "systemMemory": 2000,
             "cpuCores": 1
         },
         "jobDataLocations": [
             {
                 "mountPoint": "/project"
             }
         ],
         "portMappings": []
     }
 }
EOF

Note that systemMemory is in MB, so systemMemory: 2000 means 2 GB.

Now, create start-my-gpulab-app.sh, which is the start command in the jobDefinition. This command overrides any start command defined in the docker container.

$ cat > start-my-gpulab-app.sh <<EOF
#!/bin/bash
echo 'This is my test gpulab app.'
sleep 30
echo 'Ok, all the hard work is done now'
EOF

Upload this script to your wall2 shared project dir (see the storage documentation).
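For example, from your own machine (a minimal sketch; the SSH host and project name in angle brackets are placeholders, see the storage documentation for the actual paths; the chmod is needed because the job executes the script directly):

chmod +x start-my-gpulab-app.sh
scp start-my-gpulab-app.sh <user>@<wall2-host>:/groups/wall2-ilabt-iminds-be/<projectname>/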

Submit the job:

$ gpulab-cli submit --project=MyProject < my-second-jobDefinition.json
db56c279f4d5499585856a76caf28ef2

Check its status and logs:

$ gpulab-cli jobs db56c279f4d5499585856a76caf28ef2

$ gpulab-cli log db56c279f4d5499585856a76caf28ef2

A real-world example

Suppose the following typical use case: we have software on our laptop to crunch data, and we want to run it on GPUlab.

As the software, we take gpu-burn (http://wili.cc/blog/gpu-burn.html): we can download the source, but it needs to be compiled, and for that you need a Linux machine with the CUDA SDK. An important point: the current docker containers only provide v8 of the CUDA libraries, not v9, so download CUDA 8 from https://developer.nvidia.com/cuda-80-ga2-download-archive. Then compile the software and move it to the Virtual Wall homedir /groups/wall2-ilabt-iminds-be/projectname. (Much easier: use a Virtual Wall machine to download CUDA and compile the software.)
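A minimal build sketch, on a Linux machine with the CUDA 8 SDK installed (the tarball name is an assumption; check the gpu-burn page for the current release, and replace projectname with your own project):

wget http://wili.cc/blog/entries/gpu-burn/gpu_burn-0.9.tar.gz
mkdir gpuburn && tar xzf gpu_burn-0.9.tar.gz -C gpuburn
cd gpuburn && make              # make picks up nvcc from the CUDA SDK
# copy the binary and its compare.ptx to the shared project dir
mkdir -p /groups/wall2-ilabt-iminds-be/projectname/gpuburn
cp gpu_burn compare.ptx /groups/wall2-ilabt-iminds-be/projectname/gpuburn/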

Second step: create the script to execute and also put it in the NFS homedir. Note the quoted 'EOF': it keeps the shell from expanding ${LD_LIBRARY_PATH}, `date +%s` and $RANDOM while writing the file, so they are evaluated when the job runs:

cat > start-my-gpulab-app.sh <<'EOF'
#!/bin/bash
ls /project
#ls /usr/local/cuda/
#ls /usr/local/cuda/lib64

date
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:${LD_LIBRARY_PATH}
cd /project/gpuburn/
./gpu_burn 3600 > /project/gpuburn/output_`date +%s`_$RANDOM.log 2>&1
sleep 30
echo 'ok'
date
EOF

A couple of important points here:

  • use #!/bin/bash on the first line, to make sure that commands like ls work
  • note the export of LD_LIBRARY_PATH, which makes the CUDA libraries findable at runtime
  • because gpu_burn also needs its file compare.ptx at runtime, you need to cd into that directory (this might be true for other software as well)
  • GPULab is a bit sensitive to excessive logging, so redirect your logging to a file. If you launch multiple instances of the same job/script, be careful that the logfiles are unique per job (the script uses the Unix timestamp and a random number; see the sketch after this list)
  • the date calls come in handy to know the start and stop times
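The unique-logfile pattern from the script, written out on its own (a sketch, using $( ) instead of backticks):

# one logfile per run: Unix timestamp plus a random number per invocation
LOGFILE=/project/gpuburn/output_$(date +%s)_$RANDOM.log
./gpu_burn 3600 > "$LOGFILE" 2>&1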

Then define the job description for this job (this file is not needed on the homedir; it can be on your local laptop or e.g. agnesi.intec.ugent.be):

cat > my-third-jobDefinition.json <<EOF
{
 "jobDefinition": {
     "name": "projectdir",
     "description": "hello again world",
     "clusterId": 1,
     "dockerImage": "gpulab.ilabt.imec.be:5000/sample:nvidia-smi",
     "jobType": "BATCH",
     "command": "/project/start-my-gpulab-app.sh",
     "resources": {
         "gpus": 1,
         "systemMemory": 2000,
         "cpuCores": 1
     },
     "jobDataLocations": [
         {
             "mountPoint": "/project"
         }
     ],
     "portMappings": []
 }

}
EOF

So, we can use the same nvidia-smi docker container with the CUDA 8 libs. Then launch this job, and verify that the output appears in a logfile in your homedir.

gpulab-cli --cert bvermeu2_decrypted.pem --dev submit --project=bvermeul9 < my-third-jobDefinition.json

You can launch this 12 times to fully load dev gpulab :-).
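For example, with a simple shell loop (a sketch, reusing the submit command from above):

for i in $(seq 1 12); do
    gpulab-cli --cert bvermeu2_decrypted.pem --dev submit --project=bvermeul9 < my-third-jobDefinition.json
done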

Using a custom docker image

This example features a custom docker image.

The docker registry is initially not secured and it is shared by all users of gpulab. This will be changed later.

First let’s look at the images available on the docker registry:

$ curl -X GET https://gpulab.ilabt.imec.be:5000/v2/_catalog
{"repositories":["alpine","sample"]}
$ curl -X GET https://gpulab.ilabt.imec.be:5000/v2/alpine/tags/list
{"name":"alpine","tags":["latest"]}
$ curl -X GET https://gpulab.ilabt.imec.be:5000/v2/sample/tags/list
{"name":"sample","tags":["nbody","matrixMulCUBLAS","bandwidthTest","deviceQuery","nvidia-smi","vectorAdd"]}

(TODO There might be a better way to query this info.)
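In the meantime, a small shell loop over the catalog gives an overview of all repositories and their tags (a sketch, assuming jq is installed):

for repo in $(curl -s https://gpulab.ilabt.imec.be:5000/v2/_catalog | jq -r '.repositories[]'); do
    curl -s https://gpulab.ilabt.imec.be:5000/v2/$repo/tags/list
done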

Fetch an image locally:

docker pull gpulab.ilabt.imec.be:5000/sample:nvidia-smi

You can investigate the image if you like, but for this tutorial, that is not required:

$ docker run -t -i --entrypoint bash gpulab.ilabt.imec.be:5000/sample:nvidia-smi
root@ea1bb2a5bae1:/# ls /usr/local/cuda/bin
bin2c        crt       cuda-gdbserver  cudafe    cuobjdump  gpu-library-advisor  nvcc.profile  nvlink  nvprune
computeprof  cuda-gdb  cuda-memcheck   cudafe++  fatbinary  nvcc                 nvdisasm      nvprof  ptxas
root@ea1bb2a5bae1:/# exit

Next, we’ll create a custom image to run our application.

mkdir my-first-gpulab-app
cd my-first-gpulab-app
cat > Dockerfile <<EOF
# Start from a sample image from nvidia
FROM gpulab.ilabt.imec.be:5000/sample:nvidia-smi

# Set the working directory to /project
#  (which will be linked to your project in the jobDefinition)
WORKDIR /project

# Install packages inside the container (if needed, openssl used as an example here)
RUN apt-get update
RUN apt-get install -y openssl

# Start your application's startup script in the /project dir
#  (which will be linked to your project in the jobDefinition)
CMD ["/project/start-my-gpulab-app.sh"]
EOF

Now build the image:

$ docker build -t gpulab.ilabt.imec.be:5000/my-first-gpulab-app .
Sending build context to Docker daemon  3.072kB
Step 1/5 : FROM gpulab.ilabt.imec.be:5000/sample:nvidia-smi
 ---> 3c8f5c1a3ca0
Step 2/5 : WORKDIR /project
 ---> Using cache
 ---> 2f16c8f05768
Step 3/5 : RUN apt-get update
 ---> Running in a6a7c73e4c60
Get:1 http://security.ubuntu.com/ubuntu xenial-security InRelease [102 kB]
Ign:2 http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64  InRelease

... etc ...

Processing triggers for libc-bin (2.23-0ubuntu9) ...
 ---> ebcb9595c0ae
Removing intermediate container 95b110fd232b
Step 5/5 : CMD /project/start-my-gpulab-app.sh
 ---> Running in 305d2e446862
 ---> 6aa40ec0a05e
Removing intermediate container 305d2e446862
Successfully built 6aa40ec0a05e
Successfully tagged gpulab.ilabt.imec.be:5000/my-first-gpulab-app:latest

And upload it to the repository:

docker push gpulab.ilabt.imec.be:5000/my-first-gpulab-app:latest
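You can verify that the image arrived by querying the registry as before:

$ curl -X GET https://gpulab.ilabt.imec.be:5000/v2/my-first-gpulab-app/tags/list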

Next, create a job that uses this image:

cat > my-third-jobDefinition.json <<EOF
 {
     "jobDefinition": {
         "name": "my-2nd-gpulab-job",
         "description": "hello again world",
         "clusterId": 1,
         "dockerImage": "gpulab.ilabt.imec.be:5000/my-first-gpulab-app:latest",
         "jobType": "BATCH",
         "command": "",
         "resources": {
             "gpus": 1,
             "systemMemory": 2000,
             "cpuCores": 1
         },
         "jobDataLocations": [
             {
                 "mountPoint": "/project"
             }
         ],
         "portMappings": []
     }
 }
EOF

Note that in this case, no command is specified in the jobDefinition, so the command specified in CMD in the Dockerfile will be used.

Now that the preparations are done, submit the job:

$ gpulab-cli submit --project=MyProject < my-third-jobDefinition.json
e4f5652f51794cfa924c0503f6ffa2dc

Check its status and logs:

$ gpulab-cli jobs e4f
         Job ID: e4f5652f51794cfa924c0503f6ffa2dc
        Project: MyProject
       Username: username
   Docker image: gpulab.ilabt.imec.be:5000/my-first-gpulab-app:latest
        Command:
         Status: FINISHED (last change: 2017-08-25T16:03:51.128169)
        Created: 1503676999
         Worker: gpu2.gpulab.wall2-ilabt-iminds-be.wall2.ilabt.iminds.be#1
        Started: 2017-08-25T16:03:20
       Duration: 30 seconds
       Finished: 2017-08-25T16:03:51
$ gpulab-cli log e4f
2017-08-25T10:03:20.900523045Z  This is my test gpulab app. It is executed by the docker image if no command is specified
2017-08-25T10:03:50.902423420Z  Ok, all the hard work is done now

Running Jupyter Notebook

You can use GPUlab to run an interactive Jupyter notebook server.

Docker image

Use this docker image: gpulab.ilabt.imec.be:5000/jupyter-example:v2

Or build a similar one. This is the Dockerfile of the image above:

FROM gpulab.ilabt.imec.be:5000/sample:nvidia-smi

RUN apt-get update && apt-get install -y build-essential libssl-dev libffi-dev python-dev && rm -rf /var/lib/apt/lists/*
RUN apt-get update && apt-get install -y python-pip python3-pip && rm -rf /var/lib/apt/lists/*
RUN pip install ipywidgets bcolz sympy ujson pandas matplotlib graphviz pydot jupyter
RUN pip3 install ipywidgets bcolz sympy ujson pandas matplotlib graphviz pydot jupyter

EXPOSE 8888
ENTRYPOINT ["/usr/local/bin/jupyter", "notebook", "--allow-root", "--no-browser", "--ip=0.0.0.0", "--port=8888", "--ContentsManager.root_dir='/project/'", "--NotebookApp.shutdown_no_activity_timeout=1800", "--MappingKernelManager.cull_idle_timeout=3600"]

These commands were used to build it and push it to the repository:

docker build -t gpulab.ilabt.imec.be:5000/jupyter-example:v2 .
docker push gpulab.ilabt.imec.be:5000/jupyter-example:v2
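If you want to try the image locally first, you can run it with a local directory mounted as /project, since the ENTRYPOINT uses /project as the notebook root dir (a sketch; without GPU support the CUDA-based kernels will not work):

docker run --rm -p 8888:8888 -v "$PWD":/project gpulab.ilabt.imec.be:5000/jupyter-example:v2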

Running the jupyter notebook as a job

This is the job definition:

{
    "jobDefinition": {
        "name": "Jupyter-ex",
        "description": "Jupyter Notebook Example",
        "clusterId": 1,
        "dockerImage": "gpulab.ilabt.imec.be:5000/jupyter-example:v2",
        "jobType": "BATCH",
        "command": "",
        "resources": {
             "gpus": 1,
             "systemMemory": 4000,
             "cpuCores": 1
        },
        "jobDataLocations": [
            {
                "mountPoint": "/project"
            }
        ],
        "portMappings": [ { "containerPort": 8888 } ]
    }
}

Save this to a file named jupyterEx-jobDefinition.json.

Then submit the job with gpulab-cli:

$ gpulab-cli submit --project YOURPROJECT --wait-run < jupyterEx-jobDefinition.json
eb8e6578-6251-11e8-b435-8b225f12d88d

Once done, get some info about the job:

$ gpulab-cli jobs eb8e6578

Look for the following lines to get the hostname and port:

Port Mappings: 8888/tcp -> 33043
  Worker Host: n085-01.wall2.ilabt.iminds.be

Also look in the job logs to find the Jupyter token. You can use grep to quickly find it:

$ gpulab-cli --dev log eb8e6578 | grep token=
2018-05-28T08:34:43.368617581Z [I 08:34:43.368 NotebookApp] http://e4807d176222:8888/?token=2469168f238518cc894a0c9349ff2091ffb0d3a123628269
2018-05-28T08:34:43.369187810Z         http://e4807d176222:8888/?token=2469168f238518cc894a0c9349ff2091ffb0d3a123628269

Open your browser and use the corrected URL: take the worker host and mapped port found above, and keep the token from the log. In the example, this is:

http://n085-01.wall2.ilabt.iminds.be:33043/?token=2469168f238518cc894a0c9349ff2091ffb0d3a123628269

Note that you need IPv6 to access some of the worker nodes.

Once inside the Jupyter notebook, you can open and save files. Note that the /project dir (the wall2 shared project dir) is used as the “root” dir inside the notebook. This means that your files are stored on the project share, so you can still use them when you exit and restart a notebook.

Attention

As long as the notebook is running, you’re reserving a GPU, so no one else can use that GPU. Please do not leave idle Jupyter notebooks running: shut them down when they are not used, and restart them when you need them again.

The default settings of this job shut the notebook down automatically: idle kernels are culled after 1 hour, and the server itself stops after another 30 minutes without activity, so roughly 1.5 hours in total.
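These timeouts come from the flags in the ENTRYPOINT of the Dockerfile above (values in seconds), so you can tune them in your own image:

--MappingKernelManager.cull_idle_timeout=3600     # cull idle kernels after 1 hour
--NotebookApp.shutdown_no_activity_timeout=1800   # stop the server after 30 more minutes without activity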

You can also manually use the “Quit” button at the top left of the window to close the Jupyter notebook. This will stop the gpulab job. You can also stop the job using the CLI:

$ gpulab-cli cancel eb8e6578-6251-11e8-b435-8b225f12d88d