Performance characteristics of the DelftBlue 'gpu' nodes

We focus on the two hardware characteristics that determine the performance of typical CS&E workflows: memory bandwidth and floating-point performance. More detailed benchmark results for all node types (including some application benchmarks) can be found in this report.

Memory bandwidth

The bandwidth (in GB/s) achieved by benchmark kernels with different load/store ratios was measured as follows:

benchmark   V100s (GB/s)   A100 (GB/s)
---------   ------------   -----------
load        570            1560
store       1120           1780
triad       1010           1690
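
The triad benchmark is the classic STREAM-style kernel a[i] = b[i] + s*c[i], which reads two values and writes one per element. Below is a minimal CUDA sketch of such a kernel with a simple bandwidth measurement; it is not the benchmark code behind the table above, and the array size, block size and number of repetitions are illustrative assumptions.

// Minimal CUDA "triad" bandwidth sketch: a[i] = b[i] + s * c[i]
// (illustrative only; array size, block size and repetitions are assumptions)
#include <cstdio>
#include <cuda_runtime.h>

__global__ void triad(double *a, const double *b, const double *c,
                      double s, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) a[i] = b[i] + s * c[i];
}

int main() {
    const size_t n = 1 << 27;                 // 128M doubles per array (~1 GiB each)
    const size_t bytes = n * sizeof(double);
    double *a, *b, *c;
    cudaMalloc((void **)&a, bytes);
    cudaMalloc((void **)&b, bytes);
    cudaMalloc((void **)&c, bytes);
    cudaMemset(b, 0, bytes);
    cudaMemset(c, 0, bytes);

    const int block = 256;
    const int grid  = (int)((n + block - 1) / block);

    triad<<<grid, block>>>(a, b, c, 2.0, n);  // warm-up launch
    cudaDeviceSynchronize();

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    const int reps = 20;
    cudaEventRecord(start);
    for (int r = 0; r < reps; ++r)
        triad<<<grid, block>>>(a, b, c, 2.0, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    // Triad moves three arrays per repetition: two loads and one store.
    double gbs = 3.0 * bytes * reps / (ms * 1e-3) / 1e9;
    printf("triad bandwidth: %.0f GB/s\n", gbs);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}

Compiled with nvcc -O3 and run inside a GPU job, a kernel like this should reach the same order of magnitude as the triad numbers reported above.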

Floating-point performance

Depending on their computational intensity (ratio of computation to data transfers), applications are either memory- or compute-bound. For floating-point operations in double, single and half precision, the achieved performance (in GFlop/s) is shown below.

[Figures: "mixbench" floating-point performance results for the V100s (Phase 1) and the A100 (Phase 2) GPUs.]
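
The mixbench benchmark sweeps this computational intensity by performing a varying number of fused multiply-adds per memory access. The CUDA sketch below illustrates the idea; it is a simplified stand-in for mixbench, not the code used to produce the plots above, and the array size, block size and intensity values are illustrative assumptions.

// Intensity sweep in the spirit of "mixbench": one load and one store per
// element, with F fused multiply-adds in between, so the arithmetic intensity
// is 2*F Flop per 16 bytes moved (all settings here are illustrative).
#include <cstdio>
#include <cuda_runtime.h>

template <int F>
__global__ void mix(double *x, size_t n, double s) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i >= n) return;
    double v = x[i];
    #pragma unroll
    for (int k = 0; k < F; ++k)
        v = v * s + 0.5;            // one FMA = 2 Flop
    x[i] = v;
}

template <int F>
void run(double *x, size_t n) {
    const int block = 256;
    const int grid  = (int)((n + block - 1) / block);
    cudaEvent_t t0, t1;
    cudaEventCreate(&t0);
    cudaEventCreate(&t1);
    mix<F><<<grid, block>>>(x, n, 1.000001);    // warm-up launch
    cudaEventRecord(t0);
    mix<F><<<grid, block>>>(x, n, 1.000001);
    cudaEventRecord(t1);
    cudaEventSynchronize(t1);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, t0, t1);
    double gflops = 2.0 * F * n / (ms * 1e-3) / 1e9;
    printf("intensity %6.2f Flop/byte: %8.0f GFlop/s\n", 2.0 * F / 16.0, gflops);
    cudaEventDestroy(t0);
    cudaEventDestroy(t1);
}

int main() {
    const size_t n = 1 << 26;                   // 64M doubles (~512 MiB)
    double *x;
    cudaMalloc((void **)&x, n * sizeof(double));
    cudaMemset(x, 0, n * sizeof(double));
    run<1>(x, n);       // low intensity: memory-bound
    run<8>(x, n);
    run<64>(x, n);
    run<512>(x, n);     // high intensity: compute-bound
    cudaFree(x);
    return 0;
}

At low intensity the measured GFlop/s should track the memory bandwidth of the card, while at high intensity it should approach the peak floating-point rate; the crossover between the two regimes is what separates memory-bound from compute-bound kernels.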

Measuring GPU usage of your application

On the GPU nodes, you can use the nvidia-smi command to monitor the GPU usage of your executable. Add the following to your submission scripts:

# Measure GPU usage of your job (initialization)
previous=$(nvidia-smi --query-accounted-apps='gpu_utilization,mem_utilization,max_memory_usage,time' --format='csv' | /usr/bin/tail -n '+2')

# Use this simple command to check that your sbatch settings are working (it should show the GPU that you requested)
nvidia-smi

# Your job commands go below here

# load modules you need...

# Computations should be started with 'srun'. For example:
#srun python my_program.py

# Your job commands go above here

# Measure GPU usage of your job (result)
nvidia-smi --query-accounted-apps='gpu_utilization,mem_utilization,max_memory_usage,time' --format='csv' | /usr/bin/grep -v -F "$previous"