GPULab Client CLI

Prerequisites

An Account

To use GPULab, you need an account on either:

After signing up, download your login certificate (PEM), which you can find at the bottom of the Profile page of your portal.

Python

To run the CLI, you need pip for Python 3 to install the gpulab-client. To install pip on Debian/Ubuntu, try:

sudo apt-get install python3-pip

Make sure you have at least Python 3.7, the minimum version GPULab requires. You can check with:

python3 --version
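
The same check can be scripted, for example at the top of a wrapper script. The sketch below is only an illustration: it assumes the 3.7 minimum stated in the Installation section, and the helper name is_supported is made up for this example.

```python
import sys

# Minimum version taken from the Installation section (python >= 3.7).
MINIMUM = (3, 7)


def is_supported(version_info=sys.version_info, minimum=MINIMUM):
    """Return True if the (major, minor, ...) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    print("Python %d.%d supported: %s" % (major, minor, is_supported()))
```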

Tip: Using pyenv to install GPULab in a separate environment

If your Linux distribution does not ship a recent enough Python, try pyenv, which works on (almost) any Linux:

curl -L https://raw.githubusercontent.com/pyenv/pyenv-installer/master/bin/pyenv-installer | bash
pyenv update
# Optional on Debian: apt-get install libbz2-dev libreadline-dev libsqlite3-dev
# GPULab requires Python >= 3.7, so install a 3.7 (or newer) release.
pyenv install 3.7.17
pyenv local 3.7.17
pyenv versions
python3 --version

The last command should show that you now have a recent enough Python version.

(Note: If you install Python locally using this method, you can run the installation command in the next section without sudo.)

Installation

To install, run:

pip3 install imecilabt-gpulab-cli

Python’s pip will take care of all the details. You will end up with a local install of gpulab-cli.

GPULab requires Python >= 3.7, which is the default on Ubuntu from 19.04 onwards.

Please report any bugs to: gpulab@ilabt.imec.be

Basic CLI usage

After installation, the gpulab-cli command is available:

$ gpulab-cli --help
Usage: gpulab-cli [OPTIONS] COMMAND [ARGS]...

   GPULab client version 2.1.0

   This is the general help. For help with a specific command, try:
          gpulab-cli <command> --help

   Send bugreports, questions and feedback to: gpulab@ilabt.imec.be

   Documentation: https://doc.ilabt.imec.be/ilabt/gpulab/
   Overview page: https://gpulab.ilabt.imec.be/

Options:
  --cert PATH          Login certificate  [required]
  -p, --password TEXT  Password associated with the login certificate
  --dev                Use the GPULab development environment
  --servercert PATH    The file containing the servers (self-signed)
                       certificate. Only required when the server uses a self
                       signed certificate.
  --version            Print the GPULab client version number and exit.
  -h, --help           Show this message and exit.

Commands:
  cancel    Cancel running job
  clusters  Retrieve info about the available clusters
  debug     Retrieve a job's debug info. (Do not rely on the presence or
            format of this info. It will never be stable between versions. If
            this has the only source o info you need, ask the developrs to
            expose that info in a different way!)
  hold      Hold queued job(s). Status will change from QUEUED to ONHOLD
  jobs      Get info about one or more jobs
  log       Retrieve a job's log
  release   Release held job(s). Status will change from ONHOLD to QUEUED
  rm        Remove job
  submit    Submit a job request to run
  wait      Wait for a job to change state

To get a list of currently running jobs:

$ gpulab-cli --cert /home/me/my_wall2_login.pem jobs
TASK ID                              NAME                 COMMAND              CREATED                   USER         PROJECT         STATUS
7eaec798-ac49-11e9-93a1-cfd87533270a JupyterHub-singleuse ***                  2019-07-22T08:25:26+02:00 pbonte       Orca            RUNNING
ca38f0c4-ac23-11e9-93a1-1757063cb6eb JupyterHub-singleuse ***                  2019-07-22T03:55:32+02:00 ykuno        DeepBeamforming RUNNING
da112f92-ac1f-11e9-93a1-cbc72e039c23 JupyterHub-singleuse ***                  2019-07-22T03:27:21+02:00 ykuno        DeepBeamforming CANCELLED

Error ee too small

When trying to use a login certificate from the legacy portal (authority.ilabt.iminds.be) on a recent Linux distribution, you can run into the error ee too small. It is caused by the key length of the legacy root certificate, which is considered too small, and thus unsafe, by today’s standards.

You can fix this by lowering the SECLEVEL which OpenSSL requires.

openssl.cnf.patch
--- openssl.cnf.bak	2020-08-28 10:39:51.130359500 +0200
+++ openssl.cnf	2020-08-28 10:43:38.950359500 +0200
@@ -22,6 +22,8 @@
 # (Alternatively, use a configuration file that has only
 # X.509v3 extensions in its main [= default] section.)
 
+openssl_conf = default_conf
+
 [ new_oids ]
 
 # We can add new OIDs in here for use by 'ca', 'req' and 'ts'.
@@ -348,3 +350,13 @@
 				# (optional, default: no)
 ess_cert_id_alg		= sha1	# algorithm to compute certificate
 				# identifier (optional, default: sha1)
+
+[default_conf]
+ssl_conf = ssl_sect
+
+[ssl_sect]
+system_default = system_default_sect
+
+[system_default_sect]
+MinProtocol = TLSv1.1
+CipherString = DEFAULT@SECLEVEL=1

To apply this patch, save it as openssl.cnf.patch in /etc/ssl and execute the following commands with root privileges:

cd /etc/ssl
patch < openssl.cnf.patch

Recommendation

The GPULab CLI can also read the location of your login certificate from an environment variable. This allows you to omit the --cert argument from all future commands.

export GPULAB_CERT='/home/me/my_wall2_login.pem'
export GPULAB_DEV='False'

If you append these exports to ~/.bashrc you’ll never have to type them again!

The same command to get a list of currently running jobs is now much shorter:

gpulab-cli jobs
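
If you call gpulab-cli from a script, the same convention can be mirrored there. The following is a minimal sketch, not part of the GPULab client: the helper base_command is hypothetical, and it only assumes what is described above, namely that --cert may be omitted when GPULAB_CERT is set.

```python
import os


def base_command(cert_path=None, environ=os.environ):
    """Build the start of a gpulab-cli command line as an argument list.

    --cert is only added when GPULAB_CERT is not set in the environment.
    """
    command = ["gpulab-cli"]
    if not environ.get("GPULAB_CERT"):
        if cert_path is None:
            raise ValueError("set GPULAB_CERT or pass cert_path")
        command += ["--cert", cert_path]
    return command


# Example: the argument list for listing jobs.
print(base_command("/home/me/my_wall2_login.pem", environ={}) + ["jobs"])
```

The resulting list can be passed to subprocess.run as-is.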

Note

Using the CLI without password

You can use the CLI without a password. Be aware that this lowers security.

You need to install openssl to execute the commands below. On Debian, try:

sudo apt-get install openssl

The password is “stored” in the PEM file, because it is used to encrypt the private RSA key inside the PEM file. You can decrypt the RSA key and store it, to remove the password. Below, we assume that your (password protected) wall2 PEM file is in my_wall2_login.pem. The commands will create the file my_wall2_login_decrypted.pem which will not be password protected.

Use these commands:

openssl rsa -in my_wall2_login.pem > my_wall2_login_decrypted.pem
sed -ne '/-----BEGIN CERTIFICATE-----/,/-----END CERTIFICATE-----/p' < my_wall2_login.pem >> my_wall2_login_decrypted.pem

(The first command will ask for your password; the second won’t.)
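
To check the result, you can inspect the PEM file as text. The sketch below is an illustration under assumptions: it assumes the private key uses one of the two common OpenSSL PEM encodings for encrypted keys, and the function names are made up for this example. extract_certificates selects the same blocks as the sed command above.

```python
import re

# Matches each certificate block, like the sed range expression above.
CERT_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----", re.DOTALL
)


def extract_certificates(pem_text):
    """Return every certificate block found in the PEM text."""
    return CERT_BLOCK.findall(pem_text)


def has_encrypted_key(pem_text):
    """Return True if the PEM text contains a password-protected private key."""
    return (
        "Proc-Type: 4,ENCRYPTED" in pem_text  # traditional OpenSSL key format
        or "-----BEGIN ENCRYPTED PRIVATE KEY-----" in pem_text  # PKCS#8
    )
```

For example, has_encrypted_key(open("my_wall2_login_decrypted.pem").read()) should return False once the password has been removed.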

Submitting a GPULab Job

A GPULab job is defined by a JSON job request, which looks as follows:

my-first-jobRequest.json
{
    "name": "helloworld",
    "description": "Hello World!",
    "request": {
        "resources": {
            "cpus": 2,
            "gpus": 1,
            "cpuMemoryGb": 2,
            "clusterId": 4
        },
        "docker": {
            "image": "debian:stable",
            "command": "nvidia-smi"
        },
        "scheduling": {
            "interactive": true
        }
    }
}
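
When you submit many similar jobs, it can be convenient to generate the job request instead of editing it by hand. The sketch below builds the same structure as the example above. The function make_job_request and its parameter names are hypothetical; only the JSON field names come from the example.

```python
import json


def make_job_request(name, description, image, command,
                     cpus=2, gpus=1, cpu_memory_gb=2, cluster_id=4,
                     interactive=True):
    """Return a job request dict with the same layout as the example above."""
    return {
        "name": name,
        "description": description,
        "request": {
            "resources": {
                "cpus": cpus,
                "gpus": gpus,
                "cpuMemoryGb": cpu_memory_gb,
                "clusterId": cluster_id,
            },
            "docker": {"image": image, "command": command},
            "scheduling": {"interactive": interactive},
        },
    }


job = make_job_request("helloworld", "Hello World!", "debian:stable", "nvidia-smi")
print(json.dumps(job, indent=4))
```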

To submit the job, you’ll have to specify the name of the project on the wall2 authority in which you want it to run. The command is:

$ gpulab-cli submit --project=myproject < my-first-jobRequest.json
78125766-0b45-11e8-be1c-0fbd357c0b05

The job ID, a UUID, is returned.
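
The submit step can also be driven from a script. This is a sketch under assumptions, not an official API: it assumes that gpulab-cli submit prints only the job ID on standard output, as in the example above, and that GPULAB_CERT is exported as described in the Recommendation section. The function names are made up for this example.

```python
import subprocess
import uuid


def parse_job_id(output):
    """Validate the text printed by 'gpulab-cli submit' and return the job ID."""
    # The job ID in the example has the form of a UUID; uuid.UUID raises
    # ValueError for anything else.
    return str(uuid.UUID(output.strip()))


def submit_job(request_path, project):
    """Run 'gpulab-cli submit' with the job request file on standard input."""
    with open(request_path, "rb") as request_file:
        result = subprocess.run(
            ["gpulab-cli", "submit", "--project=" + project],
            stdin=request_file,
            stdout=subprocess.PIPE,
            check=True,
        )
    return parse_job_id(result.stdout.decode())
```

A call such as submit_job("my-first-jobRequest.json", "myproject") would then return the job ID as a string.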

Getting information on a running job

You can query the status of this job using (a unique prefix of) the job ID:

$ gpulab-cli jobs 7812
          Job ID: 78125766-0b45-11e8-be1c-0fbd357c0b05
           Name: helloworld
        Project: fed4fire
       Username: wvdemeer
   Docker image: debian:stable
        Command: nvidia-smi
         Status: FINISHED
        Created: 2018-02-06T13:56:02-07:00
      Worker ID: -
Worker hostname: 192.168.0.1
        Started: 2018-02-06T14:09:44-07:00
       Duration: 1 second
       Finished: 2018-02-06T14:09:45-07:00
       Deadline: 2018-02-07T00:09:44-07:00
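
What "unique prefix" means can be shown with a few lines of code. This document does not describe how GPULab itself matches prefixes, so the sketch below is only a client-side illustration with a hypothetical helper resolve_prefix: a prefix is usable when exactly one known job ID starts with it.

```python
def resolve_prefix(prefix, job_ids):
    """Return the single job ID that starts with prefix, or raise ValueError."""
    matches = [job_id for job_id in job_ids if job_id.startswith(prefix)]
    if len(matches) != 1:
        raise ValueError("prefix %r matches %d jobs" % (prefix, len(matches)))
    return matches[0]


# Two job IDs taken from the examples in this document.
known_jobs = [
    "78125766-0b45-11e8-be1c-0fbd357c0b05",
    "7eaec798-ac49-11e9-93a1-cfd87533270a",
]
print(resolve_prefix("7812", known_jobs))
```

With these two jobs, the prefix 7 alone would be ambiguous, while 78 or 7812 identifies a single job.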

Getting logs of a job

You can view the command line output of the job using log:

$ gpulab-cli log 7812
2018-02-06T14:09:45.185400608Z
2018-02-06T14:09:45.185451167Z ==============NVSMI LOG==============
2018-02-06T14:09:45.185459009Z
2018-02-06T14:09:45.185466771Z Timestamp                           : Tue Feb  6 14:09:45 2018
2018-02-06T14:09:45.185471972Z Driver Version                      : 390.12
2018-02-06T14:09:45.185477068Z
2018-02-06T14:09:45.185490896Z Attached GPUs                       : 1
2018-02-06T14:09:45.185612743Z GPU 00000000:02:00.0
2018-02-06T14:09:45.185713708Z     Product Name                    : GeForce GTX 580
2018-02-06T14:09:45.186226030Z     Product Brand                   : GeForc
...
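
Each log line starts with a timestamp followed by the job's output. If you want to process the log in a script, a parser along these lines may help. It is a sketch that assumes the format shown above (a nanosecond-precision timestamp ending in Z, meaning UTC, then a space and the text); parse_log_line is a made-up name.

```python
from datetime import datetime, timezone


def parse_log_line(line):
    """Split a log line into a UTC datetime and the job's output text."""
    stamp, _, message = line.rstrip("\n").partition(" ")
    # The timestamps have nanosecond precision; datetime keeps microseconds.
    seconds, _, fraction = stamp.rstrip("Z").partition(".")
    moment = datetime.strptime(seconds, "%Y-%m-%dT%H:%M:%S")
    micro = int(fraction[:6].ljust(6, "0")) if fraction else 0
    return moment.replace(microsecond=micro, tzinfo=timezone.utc), message


when, text = parse_log_line(
    "2018-02-06T14:09:45.185471972Z Driver Version                      : 390.12"
)
print(when.isoformat(), "|", text)
```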

Getting the GPULab event log of a job

You can view the internal event log of GPULab. This is mostly useful for debugging purposes. It can contain error messages raised by the GPULab code, which help you find the error in your job request or submit a bug report.

$ gpulab-cli debug ffe249

2019-07-22T11:13:41+02:00 STATE CHANGED -> QUEUED
2019-07-22T11:13:45+02:00 STATE CHANGED -> STARTING
2019-07-22T11:13:45+02:00 LOG INFO: Running job on cluster 5 host slave5a (slave5a.wall2.ilabt.iminds.be) worker 0
2019-07-22T11:13:46+02:00 LOG INFO: Fetching latest version of image 'jupyter/minimal-notebook:latest' (no auth)
2019-07-22T11:13:48+02:00 LOG INFO: Fetched image 'jupyter/minimal-notebook:latest' with hash sha256:37c4f3f362331d1107fcd8ea8d16a8b6b8171ad437e8b933bd0d35bd98dea718
2019-07-22T11:13:48+02:00 LOG INFO: Launching JupyterHub-singleuser (image jupyter/minimal-notebook:latest with command ['start-notebook.sh', '--notebook-dir=/project/', '--NotebookApp.default_url=/lab']
2019-07-22T11:13:48+02:00 LOG DEBUG: OK: Project dir "/data/twalcari-test" exists on slave5a-f2333333333333333333@stable-cluster5
2019-07-22T11:13:48+02:00 LOG DEBUG: OK: Project dir "/data/twalcari-test" exists on slave5a-f2333333333333333333@stable-cluster5
2019-07-22T11:13:48+02:00 LOG DEBUG:    Mem_limit=2048m  (job requested 2048m)
2019-07-22T11:13:48+02:00 LOG DEBUG:    CPU ID restriction: None
2019-07-22T11:13:48+02:00 LOG DEBUG: Successfully setup SSH pubkey access step 1. ssh_username=D5MTPCUG
2019-07-22T11:13:49+02:00 LOG INFO: Started container 577de083df9fce36fef3a90573faa27b053641b39dc2421833986463831b385d
2019-07-22T11:13:49+02:00 LOG DEBUG: port_mapping_info={"8888/tcp": [{"HostIp": "0.0.0.0", "HostPort": "32851"}]}
2019-07-22T11:13:49+02:00 LOG DEBUG: ReadLogsThread for stdout stderr started
2019-07-22T11:13:49+02:00 LOG DEBUG: Successfully setup SSH pubkey access step 2. ssh_username=D5MTPCUG
2019-07-22T11:13:49+02:00 LOG DEBUG: Job run_details have been updated
2019-07-22T11:13:49+02:00 STATE CHANGED -> RUNNING
2019-07-22T11:14:08+02:00 LOG DEBUG: Docker container status is now "running". Job status is "JobState.STARTING".
2019-07-22T11:14:27+02:00 LOG DEBUG: Docker container status is now "running". Job status is "JobState.RUNNING".
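
For a quick overview of how long a job spent in each state, you can filter the STATE CHANGED lines. Keep in mind that, according to the CLI help, the format of the debug info is not stable between versions, so the sketch below assumes the layout shown above and should only be used for ad-hoc debugging. The function name state_changes is made up for this example.

```python
from datetime import datetime

MARKER = " STATE CHANGED -> "


def state_changes(event_log):
    """Return (timestamp, state) pairs for each STATE CHANGED line."""
    changes = []
    for line in event_log.splitlines():
        if MARKER in line:
            stamp, state = line.split(MARKER, 1)
            changes.append((datetime.fromisoformat(stamp.strip()), state.strip()))
    return changes


# A few lines copied from the event log above.
sample = """\
2019-07-22T11:13:41+02:00 STATE CHANGED -> QUEUED
2019-07-22T11:13:45+02:00 STATE CHANGED -> STARTING
2019-07-22T11:13:45+02:00 LOG INFO: Running job on cluster 5 host slave5a (slave5a.wall2.ilabt.iminds.be) worker 0
2019-07-22T11:13:49+02:00 STATE CHANGED -> RUNNING
"""
changes = state_changes(sample)
startup = changes[-1][0] - changes[0][0]
print([state for _, state in changes], startup.total_seconds())
```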

Getting console access to a GPULab job

You can gain console access to a job by using the ssh command of gpulab-cli. GPULab emulates SSH support on a Docker container by starting a Bash shell via docker exec and piping the input/output over the SSH connection. Because of this, GPULab SSH doesn’t support SFTP, SCP, tunnels, or other advanced SSH features.

Example:

$ gpulab-cli ssh ffe249

On connectivity

The GPULab-slaves have no public IPv4 addresses. To access your job over SSH, you need either IPv6 connectivity, or you must connect to the IDLab VPN to access them via their private IPv4 address.

GPULab will automatically try to use a proxy if you have no IPv6 connectivity.

Note

You should only use this function for debugging and developing your jobs. Generally speaking, jobs should be able to run without manual intervention. To achieve this, you can set up the environment for your job by creating a custom Docker image and/or by running a startup script.

Interactive jobs

When you’re preparing a job, you don’t always know the command you want to run yet, or the script you want to run is not yet ready. In these cases, you want to test things out inside a GPULab container, but don’t want the container to stop when the command stops. You can start a job with sleep 600 as the command and log in with gpulab-cli ssh, but GPULab also has a specific command for this: gpulab-cli interactive.

gpulab-cli interactive requires you to specify the docker image, the duration and your project:

gpulab-cli interactive --project my-project --duration-minutes 10 --docker-image debian:stable

This will start a job, which waits for 10 minutes before stopping. GPULab also automatically connects over ssh, so that you can run commands. Inside the job, you’ll find a file /running. If you delete it, the job will stop (you can also cancel the job using gpulab-cli cancel from another terminal). An example:

~ $ gpulab-cli --dev interactive --project my-project --duration-minutes 10 --docker-image debian:stable
30c1247c-b1f3-11e9-93a1-2f2d78d6eb89
2019-07-29 13:22:46 +0200 -            - Waiting for Job to start running...
2019-07-29 13:22:48 +0200 - 2 seconds  - Job is in state STARTING
2019-07-29 13:22:50 +0200 - 4 seconds  - Job is in state RUNNING
2019-07-29 13:22:50 +0200 - 4 seconds  - Job is now running
root@86b0f7321a8f:/# ls
bin   dev  home  lib64  mnt  proc     root  running  srv  tmp  var
boot  etc  lib   media  opt  project  run   sbin     sys  usr
root@86b0f7321a8f:/# rm /running
root@86b0f7321a8f:/# Connection to n085-02.wall2.ilabt.iminds.be closed.
wim@tolstoy ~ $

If you log out of the ssh session (without removing /running), the job does not stop, and you can reconnect to it using gpulab-cli ssh.

Check gpulab-cli interactive --help for options that let you specify the cluster ID, the number of CPUs and GPUs, the amount of memory, and more.