Welcome to DelftBlue¶
Welcome to the DelftBlue documentation pages. We depend on your active participation to improve our services. Please interact with us and the entire community on Mattermost, where you can exchange experiences, ask questions, share your coolest results, and get in touch with other researchers, teachers and students. Here is some info on how to use Mattermost.
For technical issues, or if you want to contact the DHPC team and contact group of faculty experts directly, find DelftBlue in the TU Delft Self-Service Portal.
Important
Make sure you have looked through our Frequently Asked Questions before submitting a support request!
Status, News, and Maintenance Announcements¶
DelftBlue status

🟢 DelftBlue is up and running.
Recent Updates
Recent updates are published in the corresponding Mattermost channel.
Latest changes, effective from 1 September 2024

- `2024r1` software stack release. The next iteration of our software stack will be available for all node types. The stack features new packages and new versions of software, improved installations of Julia and R, and multiple versions of OpenFOAM. With the release of `2024r1`, the `2022r2` and `2024rc1` stacks will be removed. If you have been using these stacks, update your scripts to use `2024r1` after September 1st (and before September 15). If you have been using `2023r1` or `2023r1-gcc11`, you are advised to switch, but you can keep using those stacks for another year.
- Reduction of maximum job duration on GPU nodes. We have observed that the "fair share" idea sometimes fails on DelftBlue because the ratio of users to hardware (in particular GPU cards) is too high, and the allowed job duration too long. As buying more hardware or eliminating users is not an option, we opted to reduce the allowed runtime. From September 1st, jobs running on a GPU partition will be limited to a maximum of 48 hours for the research and project accounts (education will stay at 24 hours). This measure is one of several that aim at improving job throughput and the fair use of the system. Please apply the new limit to your job scripts immediately: any jobs that are still pending on September 1st and exceed the new limit will not be able to run. If your workloads take longer than 48 hours, you should write a checkpoint file to disk and restart the computation in a follow-up job (see the sketch after this list). If you need help implementing this kind of "checkpoint-restart" mechanism, contact DHPC via the usual channels, or come to our office hours (see below).
- Dedicated node for small GPU jobs. To allow more users to work with a GPU, in particular for experimental/development work, we will split the GPUs on one of the A100 nodes. This means that the partition `gpu-a100` will retain 9 nodes, and a new partition `gpu-a100-small` will be made available. The new partition has one node whose A100 GPUs are partitioned into 28 instances, each with 10 GB of device memory. These devices can be used for a runtime of up to four hours. Note that you should request only one or two CPU cores per GPU to avoid implicitly blocking access to the other GPU instances on the node.
- DHPC Office Hours. To help you with problems that are not suitable for a quick, public discussion on Mattermost or a TOPdesk ticket, DHPC staff will be available on-site on Fridays, from 10:00 to 11:00. The location is the EEMCS building, room HB03.270.
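As an illustration of the checkpoint-restart idea mentioned above, here is a minimal sketch of a self-resubmitting job script. Only the partition name and the 48-hour limit come from the announcement; the solver `my_solver`, its flags, and the files `state.chk` and `converged.flag` are hypothetical placeholders to replace with your own.

```bash
#!/bin/bash
# File: checkpoint_job.sh -- sketch of a checkpoint-restart GPU job
#SBATCH --job-name=checkpoint-demo
#SBATCH --partition=gpu-a100      # GPU partition from the announcement above
#SBATCH --time=48:00:00           # new maximum for research/project accounts
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2         # keep CPU requests modest on GPU nodes
#SBATCH --gpus=1

# Hypothetical solver: resumes from state.chk if present, writes
# checkpoints periodically, and creates converged.flag when finished.
if [ -f state.chk ]; then
    srun ./my_solver --restart state.chk
else
    srun ./my_solver
fi

# Not finished within this job's time limit? Queue a follow-up job
# that will pick up from the latest checkpoint.
if [ ! -f converged.flag ]; then
    sbatch checkpoint_job.sh
fi
```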
Upcoming maintenance

- Next university-wide maintenance weekend: from Friday, 11 October 2024, 18:00 to Monday, 14 October 2024, 09:00. During this weekend, DelftBlue will be unavailable due to maintenance.
What is DHPC?¶
Delft High Performance Computing Centre (DHPC) is the central TU Delft computing centre. Together with ICT, and in close collaboration with the TU Delft HPC community, we operate the DelftBlue supercomputer. Each faculty or institution that holds a share of the machine can nominate a member for the DHPC contact group, which is actively involved in providing training and advice to our users.
What is DelftBlue?¶
DelftBlue is a high-performance computing (HPC) cluster. Phase 1 consists of:
- 228 Intel Xeon compute nodes, each with 48 CPU cores and 185 GB RAM
- 10 GPU nodes, each with four NVIDIA Tesla V100S GPUs with 32 GB of video RAM
- 10 high-memory nodes (compute nodes with more RAM)
Phase 2 consists of:
- 90 additional Intel Xeon compute nodes, each with 64 CPU cores and 250 GB RAM
- 9 additional GPU nodes, each with four NVIDIA A100 GPUs with 80 GB of video RAM
- 1 additional GPU node with NVIDIA A100 GPUs partitioned into 28 instances with 10 GB of video RAM each
The nodes are interconnected with a high performance network that allows scalable parallel computing across the whole machine. For details, see the system description.
DelftBlue is not intended for sequential programs: your code or application should run on at least 1/4 of a node (12 cores). Jobs that efficiently use multiple nodes are given higher priority by the queueing system.
Accessing and using DelftBlue¶
Any employee with a NetID can log in to DelftBlue remotely using ssh, or via Open OnDemand in their browser. If you are outside the TU Delft network, you must first start an EduVPN connection.
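For example, an ssh login might look like the line below; the hostname shown is the commonly used DelftBlue login address, but please check the system description for the exact value:

```bash
# Replace <netid> with your own NetID
ssh <netid>@login.delftblue.tudelft.nl
```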
After logging in, you will be on a login node. These nodes are intended only for setting up and starting your jobs. On a supercomputer, you do not run your program directly. Instead, you write a job script containing the resources you need and the commands to execute, and submit it to a queue via the SLURM workload manager.
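As a minimal sketch, a CPU job script could look like this; the partition name `compute` is an assumption (check the system description), and `my_program` is a placeholder for your own executable:

```bash
#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --partition=compute       # assumed partition name; check the system description
#SBATCH --time=01:00:00           # wall-clock limit (hh:mm:ss)
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=12        # a quarter of a Phase 1 node, the recommended minimum
#SBATCH --mem-per-cpu=3G

module load 2024r1                # current software stack (see the release notes above)

srun ./my_program                 # placeholder for your own executable
```

You would submit this with `sbatch jobscript.sh` and follow its progress with `squeue -u <netid>`.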
If you are a Master's student, you have to request an account here. If you are a Bachelor's student, access is arranged through the course you are taking.
For guest researchers, the procedure is described here.
Accounting of computing time¶
By default, your jobs will run with relatively low priority, and if the machine is busy, it may take a long time until they get scheduled. To be able to run jobs with high priority, you can request access to your faculty's share. In order to do so, please use the TU Delft TOPdesk request form.
In the future, computing time will be increasingly assigned to projects that have requested dedicated funding for using HPC resources.
Data storage and transfer¶
Disk space on DelftBlue is comparatively limited. A small partition is available for home directories, but larger data sets for input to and output from your runs should in general be transferred to and from the machine's scratch disk (`/scratch/<netid>`) before and after a job (or series of jobs). Here you can find instructions on how to implement such a workflow. The scratch disk is regularly cleared of old files and must not be used as permanent storage!
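As an illustration, such a transfer could be done with rsync over ssh; the hostname is assumed as above, and the directory names are placeholders:

```bash
# Before the job: copy input data from your workstation to scratch
rsync -av ./input_data/ <netid>@login.delftblue.tudelft.nl:/scratch/<netid>/input_data/

# After the job: copy results back to your workstation
rsync -av <netid>@login.delftblue.tudelft.nl:/scratch/<netid>/results/ ./results/
```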
Software¶
DelftBlue's module system provides a wide variety of pre-installed software, including compilers, libraries, and specialized applications.
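A typical session with the module system might look like the sketch below; `2024r1` is the stack from the release notes above, while `openmpi` and `python` are examples whose exact names may differ on the system:

```bash
module load 2024r1     # activate the current software stack
module avail           # list the software available within the stack
module load openmpi    # example module names; check module avail for the exact names
module load python
module list            # show the modules that are currently loaded
```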
For more software guides and recipes, please refer to pages in the Howtos section on the left menu (this opens a submenu with more subsections).
DelftBlue: Basic Performance Assessment and Optimization¶
We performed some measurements and summarized the most important processor features to help you make efficient use of the processors, network and file system of DelftBlue.
Note
The system has been reconfigured since these experiments were run. If you find that your application does not perform as expected, please let us know.
How to cite us?¶
We would be very grateful if you cited DelftBlue in your scientific publications whenever you have made use of our computational resources. Please refer to the How to cite DHPC page for exact guidelines.
Courses and events¶
DelftBlue Crash-course for absolute beginners
Linux Command Line and DelftBlue Basics Course
Note
The DelftBlue part of the course (slides and exercises) can be found here.
Note
The recording of the last DelftBlue Online Introduction session can be found here.
DelftBlue Introduction for WI4049TU Students
Note
A shorter version of the presentation for WI4049TU students of EEMCS can be found here.