Performance characteristics of the DelftBlue 'gpu' nodes
We focus on the two hardware characteristics that determine the performance of typical CS&E workflows: memory bandwidth and floating-point performance. More detailed benchmark results for all node types (including some application benchmarks) can be found in this report.
Memory bandwidth
The bandwidth (in GB/s) achieved by operations with different load/store ratios was measured as follows:
| benchmark | V100s (Phase 1) | A100 (Phase 2) |
|---|---|---|
| load | 570 | 1560 |
| store | 1120 | 1780 |
| triad | 1010 | 1690 |
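The load, store and triad operations are the classic STREAM-style memory kernels; triad computes `a[i] = b[i] + s*c[i]`, i.e. two loads and one store per element. As an illustration of what the triad figure measures, the sketch below times such a kernel and converts the elapsed time into GB/s. It is not the benchmark used for the table above, and the array size, launch configuration and repetition count are arbitrary choices.

```cuda
// Minimal STREAM-style "triad" bandwidth sketch: a[i] = b[i] + s * c[i].
// Compile with nvcc, e.g.: nvcc -O3 triad.cu -o triad
#include <cstdio>
#include <cuda_runtime.h>

__global__ void triad(double *a, const double *b, const double *c,
                      double s, size_t n) {
    size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] = b[i] + s * c[i];
}

int main() {
    const size_t n = 1ull << 27;            // ~134M doubles, ~1 GB per array (illustrative)
    const size_t bytes = n * sizeof(double);
    double *a, *b, *c;
    cudaMalloc(&a, bytes);
    cudaMalloc(&b, bytes);
    cudaMalloc(&c, bytes);
    cudaMemset(b, 0, bytes);
    cudaMemset(c, 0, bytes);

    const int block = 256;
    const int grid  = (int)((n + block - 1) / block);

    triad<<<grid, block>>>(a, b, c, 2.0, n);  // warm-up launch
    cudaDeviceSynchronize();

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    const int reps = 20;
    cudaEventRecord(start);
    for (int r = 0; r < reps; ++r)
        triad<<<grid, block>>>(a, b, c, 2.0, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    // Triad moves three arrays (two reads + one write) of 8-byte elements per repetition.
    double gbps = 3.0 * bytes * reps / (ms * 1e-3) / 1e9;
    printf("triad bandwidth: %.0f GB/s\n", gbps);

    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

Run on one of the GPU nodes, a kernel of this kind should land in the neighbourhood of the triad values listed above; large deviations usually point to an unfavourable access pattern or an array small enough to fit in cache.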
Floating-point performance
Depending on their computational intensity (ratio of computation to data transfers), applications are either memory- or compute-bound. For floating-point operations in double, single and half precision, the achieved performance (in GFlop/s) is shown below.
*Figures: achieved floating-point performance (in GFlop/s) in double, single and half precision on the V100s (Phase 1) and the A100 (Phase 2).*
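As a rough, roofline-style illustration of where the crossover between memory- and compute-bound lies, the measured triad bandwidth can be combined with a nominal peak rate. The estimate below assumes the A100's nominal FP64 peak of about 9.7 TFlop/s (a vendor specification, not a value measured here) together with the 1690 GB/s triad result from the table above; \(I_{\text{crit}}\), \(P_{\text{peak}}\) and \(B_{\text{triad}}\) are just labels for this calculation.

$$
I_{\text{crit}} \;=\; \frac{P_{\text{peak}}}{B_{\text{triad}}}
\;\approx\; \frac{9.7\times 10^{12}\ \text{Flop/s}}{1.69\times 10^{12}\ \text{B/s}}
\;\approx\; 5.7\ \text{Flop/Byte}
$$

A double-precision kernel performing fewer than roughly 6 floating-point operations per byte of data moved is therefore limited by memory bandwidth rather than by the floating-point units; triad itself, at 2 Flop per 24 bytes, sits firmly on the memory-bound side.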
Measuring GPU usage of your application
On the GPU nodes, you can use the `nvidia-smi` command to monitor the GPU usage of your executable. Add the following to your submission scripts:
```bash
# Measure GPU usage of your job (initialization)
previous=$(nvidia-smi --query-accounted-apps='gpu_utilization,mem_utilization,max_memory_usage,time' --format='csv' | /usr/bin/tail -n '+2')

# Use this simple command to check that your sbatch settings are working (it should show the GPU that you requested)
nvidia-smi

# Your job commands go below here

# load modules you need...

# Computations should be started with 'srun'. For example:
#srun python my_program.py

# Your job commands go above here

# Measure GPU usage of your job (result)
nvidia-smi --query-accounted-apps='gpu_utilization,mem_utilization,max_memory_usage,time' --format='csv' | /usr/bin/grep -v -F "$previous"
```

