# Performance characteristics of the DelftBlue 'gpu' nodes
We focus on the two hardware characteristics that determine the performance of typical CS&E workloads: memory bandwidth and floating-point performance. More detailed benchmark results for all node types (including some applications) can be found in this report.
## Memory bandwidth
The bandwidth (in GB/s) achieved by benchmark kernels with different load/store ratios was measured as follows:
| benchmark | V100S | A100 |
|-----------|-------|------|
| load      | 570   | 1560 |
| store     | 1120  | 1780 |
| triad     | 1010  | 1690 |
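In STREAM-style bandwidth benchmarks such as these, *load* typically denotes a read-only kernel, *store* a write-only kernel, and *triad* the operation `a[i] = b[i] + s*c[i]`, which performs two reads and one write per element.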
## Floating-point performance
Depending on its computational intensity (the ratio of computation to data transfers), an application is either memory-bound or compute-bound. The achieved performance (in GFlop/s) for floating-point operations in double, single and half precision is shown below.
*(Figure: achieved floating-point performance of the V100S (Phase 1) and the A100 (Phase 2) GPUs; detailed results are given in the report linked above.)*
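As a back-of-the-envelope illustration of the distinction: in double precision, the triad kernel above performs 2 floating-point operations per 24 bytes of memory traffic (two 8-byte reads and one 8-byte write), a computational intensity of roughly 0.08 Flop/byte. At the measured triad bandwidth of 1690 GB/s, this caps the achievable rate on the A100 at about 140 GFlop/s, far below the GPU's double-precision compute peak, so such a kernel is memory-bound. Only kernels that perform many operations per byte transferred can approach the compute-bound limit.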
## Measuring GPU usage of your application
On the GPU nodes, you can use the `nvidia-smi` command to monitor the GPU usage of your executable. Add the following to your submission scripts:
```bash
# Measure GPU usage of your job (initialization)
previous=$(nvidia-smi --query-accounted-apps='gpu_utilization,mem_utilization,max_memory_usage,time' --format='csv' | /usr/bin/tail -n '+2')

# Use this simple command to check that your sbatch settings are working (it should show the GPU that you requested)
nvidia-smi

# Your job commands go below here
# load modules you need...
# Computations should be started with 'srun'. For example:
#srun python my_program.py
# Your job commands go above here

# Measure GPU usage of your job (result)
nvidia-smi --query-accounted-apps='gpu_utilization,mem_utilization,max_memory_usage,time' --format='csv' | /usr/bin/grep -v -F "$previous"
```
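For reference, a complete submission script using this pattern might look as follows. This is a sketch only: the `#SBATCH` options shown (partition name, GPU/CPU counts, time and memory limits) are illustrative assumptions, and `my_program.py` stands for your own program; use the resource options that apply to your project as described in the DelftBlue job-submission documentation.

```bash
#!/bin/bash
#SBATCH --job-name=gpu_example       # illustrative job name
#SBATCH --partition=gpu              # assumed name of the GPU partition
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --gpus-per-task=1            # assumed syntax for requesting one GPU
#SBATCH --time=00:10:00
#SBATCH --mem-per-cpu=4G

# Measure GPU usage of your job (initialization)
previous=$(nvidia-smi --query-accounted-apps='gpu_utilization,mem_utilization,max_memory_usage,time' --format='csv' | /usr/bin/tail -n '+2')

# Show the GPU allocated to this job
nvidia-smi

# Load the modules your program needs, then start it with srun
# module load ...
srun python my_program.py

# Measure GPU usage of your job (result)
nvidia-smi --query-accounted-apps='gpu_utilization,mem_utilization,max_memory_usage,time' --format='csv' | /usr/bin/grep -v -F "$previous"
```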