Logical CPUs and Hyperthreading

resources.cpu in the job request refers to “logical” CPU cores, as reported by /proc/cpuinfo on Linux. In the Linux kernel’s definition, a physical CPU core with hyperthreading exposes 2 logical CPUs. This means that hyperthreading doubles the number of logical CPUs.

For example, a system with 2 CPUs, each with 2 physical cores and hyperthreading, has 8 “logical processors”. In GPULab, this system therefore has 8 cpus available.

However, it is important to know that running 2 hyperthreads on the same core yields a performance gain ranging from -5% to 50% compared to running a single thread on that core. CPU-intensive computation tasks typically see low gains; I/O-intensive tasks see the highest. So if another user’s job were to use a logical CPU on the same physical CPU core as your job, you’d see big performance fluctuations, which is undesirable.

To prevent this, when you request multiple cpus in GPULab, the logical CPUs you receive will share physical cores as much as possible. That means that if you request cpus: 8, GPULab will try hard to give you 4 dedicated CPU cores, on each of which you’ll get the 2 (hyper)threads, totalling the requested 8 logical CPUs. GPULab passes the logical CPU IDs to docker when starting the containers for the job, using the docker --cpuset-cpus option. This uses the cpuset functionality of the Linux kernel.
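As an illustration, the docker invocation might look roughly like the sketch below. The CPU IDs 2,8 are a hypothetical pair of hyperthreads on one core, and the real command GPULab runs contains many more options; the command is echoed here rather than executed:

```shell
# Sketch only: GPULab restricts a job's container to its reserved
# logical CPUs via docker's --cpuset-cpus option. The IDs "2,8" are
# hypothetical, and the real invocation has many more options.
RESERVED="2,8"
CMD="docker run --cpuset-cpus=${RESERVED} debian:stable nproc"
echo "$CMD"
```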

Inside a container, you’ll see that /proc/cpuinfo and lscpu return all CPUs on the host, not just the CPUs assigned to your job. If you need to know the IDs of the CPUs your job is restricted to, check either /sys/fs/cgroup/cpuset/cpuset.cpus or $GPULAB_CPUS_RESERVED. An example:

$ gpulab-cli --dev interactive --project ilabt-dev --cluster-id 1 --duration-minutes 10 --docker-image debian:stable --cpus 2
cf137c7c-a89f-11e9-93a1-db841704c121
2019-07-17 16:33:14 +0200 -            - Waiting for Job to start running...
2019-07-17 16:33:16 +0200 - 2 seconds  - Job is in state RUNNING
2019-07-17 16:33:16 +0200 - 2 seconds  - Job is now running
root@6d902d7ac334:/# cat /sys/fs/cgroup/cpuset/cpuset.cpus
2,8
root@6d902d7ac334:/# echo $GPULAB_CPUS_RESERVED
2,8
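Both of these give a cpuset specification: a comma-separated list of CPU IDs that may also contain ranges (e.g. 0-3,8-11). A small shell sketch that expands such a spec into individual IDs; the value here is hypothetical, in a real job you would use $GPULAB_CPUS_RESERVED itself:

```shell
# Hypothetical cpuset spec; in a job, use "$GPULAB_CPUS_RESERVED"
SPEC="0-1,8-9"
# Expand each comma-separated item; "a-b" ranges become the IDs a..b
for item in $(echo "$SPEC" | tr ',' ' '); do
  case "$item" in
    *-*) seq "${item%-*}" "${item#*-}" ;;
    *)   echo "$item" ;;
  esac
done
```

With the spec 0-1,8-9 this prints the IDs 0, 1, 8 and 9, one per line.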

lscpu -e provides a human-readable overview of how logical CPU IDs are mapped to physical sockets/cores. An example:

root@6d902d7ac334:/# lscpu -e
CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE    MAXMHZ    MINMHZ
  0    0      0    0 0:0:0:0          yes 3200.0000 1550.0000
  1    0      0    1 1:1:1:0          yes 3200.0000 1550.0000
  2    0      0    2 2:2:2:0          yes 3200.0000 1550.0000
  3    0      0    3 3:3:3:1          yes 3200.0000 1550.0000
  4    0      0    4 4:4:4:1          yes 3200.0000 1550.0000
  5    0      0    5 5:5:5:1          yes 3200.0000 1550.0000
  6    0      0    0 0:0:0:0          yes 3200.0000 1550.0000
  7    0      0    1 1:1:1:0          yes 3200.0000 1550.0000
  8    0      0    2 2:2:2:0          yes 3200.0000 1550.0000
  9    0      0    3 3:3:3:1          yes 3200.0000 1550.0000
 10    0      0    4 4:4:4:1          yes 3200.0000 1550.0000
 11    0      0    5 5:5:5:1          yes 3200.0000 1550.0000
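In this output, logical CPUs 2 and 8 share physical core 2, which is exactly the pair the example job above was given. The grouping can also be derived with a short awk script; here it runs on the CPU and CORE columns of the example table, reproduced in a here-doc so it works without lscpu:

```shell
# Group logical CPU IDs (column 1) by physical core (column 2) to find
# hyperthread siblings; the input reproduces the example lscpu -e table.
awk 'NR > 1 { s[$2] = s[$2] ? s[$2] "," $1 : $1 }
     END { for (c in s) print "core " c ": CPUs " s[c] }' <<'EOF' | sort
CPU CORE
0 0
1 1
2 2
3 3
4 4
5 5
6 0
7 1
8 2
9 3
10 4
11 5
EOF
```

With the example table this prints 6 lines, among them core 2: CPUs 2,8.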

So practically, how many CPUs should you request?

  • For some jobs, CPU power is unimportant. (This is not the case for jobs that do GPU computations! See the tip “Required cpus for GPU computations”.) For such jobs, it’s fine to request cpus: 1. You’ll get at least half a physical CPU core’s worth of performance.

  • If processing power is important (you’re doing CPU- or GPU-based computations), always request a multiple of 2 CPUs. This ensures you don’t share a physical CPU core with another job.

  • If you want to run parallel CPU-intensive computation tasks, request twice as many logical CPUs as the number of parallel computation tasks.

    For example, for 4 parallel compute threads/processes you typically want to request cpus: 8. Note that if you have 8 logical CPUs available and run only 4 processes/threads, the kernel will always schedule these 4 on different physical CPU cores.
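This rule of thumb reduces to simple arithmetic; a trivial sketch, with an arbitrarily chosen task count:

```shell
# Rule of thumb: request twice as many logical CPUs as parallel
# compute tasks, so every task gets a dedicated physical core.
TASKS=4                       # hypothetical number of parallel tasks
echo "cpus: $((TASKS * 2))"   # prints "cpus: 8"
```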

Required cpus for GPU computations

Jobs that use one or more GPUs for their computations do in fact need enough CPU cores for optimal performance.

For jobs that do computations on a single GPU, using cpus: 2 instead of cpus: 1 typically gives a big performance boost. Using more than 2 cpus usually does not improve performance significantly.

For this reason, we recommend requesting twice the amount of cpus per GPU used.

So for jobs using 1 GPU, use cpus: 2. For jobs using 2 GPUs, use cpus: 4. For jobs using 3 GPUs, use cpus: 6, …
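In a job request this could look like the fragment below. The field names are an assumption based on the cpus: notation used above; check the job definition format of your GPULab version:

```json
"resources": {
    "gpus": 1,
    "cpus": 2
}
```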

Tip: Application Threads

How many threads/processes should the application in a job use, if it has X physical CPU cores available (= cpus: X*2)?

This depends on the application. Normally, if there is no I/O or synchronisation, but just pure calculation, X threads is optimal. You might see a small hyperthreading performance gain by doubling the number of parallel compute threads/processes, but you have to test to find out.

If there is I/O involved, you will gain by using more threads. If there’s only very light I/O, doubling the threads to use hyperthreading is probably a good guess; a heavy I/O bottleneck might require many more threads for optimal speed. Note that concepts like “asyncio” can be used instead of threads.
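To size the thread pool at job start, the reserved-CPU list can simply be counted. A sketch assuming $GPULAB_CPUS_RESERVED holds a plain comma-separated ID list; a fallback value is used for illustration, and OMP_NUM_THREADS stands in for whatever knob your application uses:

```shell
# Count the job's logical CPUs and start one thread per physical core
# (pure computation); I/O-heavy workloads may want more threads.
RESERVED="${GPULAB_CPUS_RESERVED:-2,8}"           # fallback for illustration
LOGICAL=$(echo "$RESERVED" | tr ',' '\n' | wc -l)
export OMP_NUM_THREADS=$((LOGICAL / 2))
echo "$OMP_NUM_THREADS"
```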

In the end, if you really need to know how many threads is right for your application, you’ll have to benchmark it.