DelftBlue Policies
This page tries to explain some of the policies implemented on DelftBlue.
Disk quota and scratch space
Each user may use up to 30 GB of disk space in their /home directory and at most 5 TB on /scratch. Data in the home directory is backed up, whereas data in scratch should be cleaned up regularly. If the scratch disk becomes too full, ICT may announce an automatic clean-up a few days in advance so that you have time to save important data to your project drive.
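To see how close a directory tree is to the limits above, something like the following minimal sketch could be used. It is an illustration, not an official DelftBlue tool: the function names are hypothetical, the limits are assumed to be in decimal units (1 GB = 10**9 bytes), and the sizes summed are apparent file sizes, which may differ from what the file system actually charges against the quota. Walking a very large tree is slow and puts load on the file system, so this is best pointed at a subdirectory rather than all of /scratch.

```python
import os

# Limits from the policy above, assuming decimal units (an assumption).
HOME_QUOTA = 30 * 10**9      # 30 GB
SCRATCH_QUOTA = 5 * 10**12   # 5 TB


def tree_size(root):
    """Sum the apparent sizes (in bytes) of all files below root.

    Symbolic links are not followed; a link counts with its own size.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue
    return total


def quota_report(root, quota_bytes):
    """Return the usage of root in bytes and as a fraction of the quota."""
    used = tree_size(root)
    return {
        "used_bytes": used,
        "quota_bytes": quota_bytes,
        "fraction_used": used / quota_bytes,
    }
```

For example, `quota_report(os.path.expanduser("~"), HOME_QUOTA)` reports how much of the home quota the home directory appears to use.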
Rationale: Data-intensive workflows benefit greatly from fast scratch storage. All nodes have access to /scratch (and /home), so if you explicitly transfer data to DelftBlue (using the file transfer nodes), you can speed up the overall execution of the workflow significantly (by at least a factor of 10) compared to having every process access the central storage. Adjusting your workflow may take a one-time effort, but it pays off afterwards. Furthermore, running the data transfer to DelftBlue as a separate job allows you to run multiple applications on the same data (one after another or in parallel).
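The idea of transferring data once and reusing it from several jobs can be sketched as follows. This is only an illustration under stated assumptions: `stage_to_scratch` and the `.staged` marker file are hypothetical names, the source is assumed to be a directory reachable from wherever the script runs, and the actual paths on DelftBlue are not specified here.

```python
import shutil
from pathlib import Path


def stage_to_scratch(source, scratch_dir):
    """Copy the directory `source` into `scratch_dir` once.

    A marker file is written only after the copy has finished, so an
    interrupted copy is repeated, while a completed one is reused by
    every later call.
    """
    source = Path(source)
    target = Path(scratch_dir) / source.name
    marker = target / ".staged"  # hypothetical marker name
    if marker.exists():
        return target  # already staged by an earlier job
    # dirs_exist_ok (Python 3.8+) lets a partial copy be completed.
    shutil.copytree(source, target, dirs_exist_ok=True)
    marker.touch()
    return target
```

Each application can then call `stage_to_scratch` with the same arguments and read from the returned directory; only the first call pays for the transfer.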
Jobs are limited to 120 hours
The Slurm queue currently allows jobs a run time of at most 120 hours. In the future, we may introduce a 'long' queue that allows a smaller number of nodes to be used for longer.
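As an illustration of the limit, the sketch below converts a value in one of the time formats Slurm accepts for its `--time` option (minutes, minutes:seconds, hours:minutes:seconds, days-hours, days-hours:minutes, days-hours:minutes:seconds) into seconds and compares it with 120 hours. The function names are hypothetical, and the check is only a local convenience; the scheduler itself remains the authority on what is accepted.

```python
MAX_HOURS = 120  # run-time limit stated in the policy above


def slurm_time_to_seconds(text):
    """Convert a Slurm-style time string to a number of seconds."""
    days = 0
    if "-" in text:
        # days-hours[:minutes[:seconds]]
        day_part, rest = text.split("-", 1)
        days = int(day_part)
        parts = [int(p) for p in rest.split(":")]
        if len(parts) > 3:
            raise ValueError(f"unrecognised time format: {text!r}")
        parts += [0] * (3 - len(parts))
        hours, minutes, seconds = parts
    else:
        parts = [int(p) for p in text.split(":")]
        if len(parts) == 1:      # minutes
            hours, minutes, seconds = 0, parts[0], 0
        elif len(parts) == 2:    # minutes:seconds
            hours, minutes, seconds = 0, parts[0], parts[1]
        elif len(parts) == 3:    # hours:minutes:seconds
            hours, minutes, seconds = parts
        else:
            raise ValueError(f"unrecognised time format: {text!r}")
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def within_limit(text, max_hours=MAX_HOURS):
    """Return True if the requested run time does not exceed the limit."""
    return slurm_time_to_seconds(text) <= max_hours * 3600
```

For instance, `within_limit("5-00:00:00")` is true (exactly 120 hours), whereas `within_limit("5-00:00:01")` is false.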
Rationale:
- The machine is still unstable and ICT needs regular maintenance opportunities.
- We want to encourage users to optimize and parallelize their applications: that is the intended use of a supercomputer, after all.
- Regularly checkpointing an application that runs for more than a few hours is in your own interest, because there is always a risk of hardware or software failure. Once checkpointing is implemented or turned on, restarting a job that hit the time limit before finishing is no longer a problem.
- Long-running jobs prevent shorter but more parallel jobs from being scheduled, and as mentioned above, those are the kind of applications that a supercomputer is designed and optimized for.
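The checkpoint-and-restart pattern mentioned in the rationale can be sketched as follows. This is a minimal illustration, not a prescribed implementation: the state is a small dictionary stored as JSON, the "work" is a placeholder sum, and the `stop_after` parameter is a hypothetical stand-in for a job being killed at the time limit. Real applications will usually rely on the checkpointing facilities of their own framework or solver.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "total": 0}


def save_checkpoint(path, state):
    """Write the state atomically so a crash cannot leave a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)  # atomic rename within the same directory


def run(path, n_steps, checkpoint_every=100, stop_after=None):
    """Run up to n_steps, resuming from the checkpoint at `path` if present.

    `stop_after` simulates the job being killed after that many steps:
    the function returns without saving, as a real kill would.
    """
    state = load_checkpoint(path)
    steps_this_run = 0
    while state["step"] < n_steps:
        if stop_after is not None and steps_this_run >= stop_after:
            return state  # simulated kill: progress since last save is lost
        state["total"] += state["step"]  # placeholder for the real work
        state["step"] += 1
        steps_this_run += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state
```

Submitting the same job again then simply continues from the last saved step; at most `checkpoint_every` steps of work are repeated.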
Limited number of GPU nodes
DelftBlue serves the DCSE community as a whole and is not primarily an AI platform. Due to the limited high-bandwidth memory and the relatively complex programming of GPUs, their use is not mainstream in CS&E. Furthermore, the availability of many compute nodes makes it easier to accommodate many users at the same time.
The design of the system and the selection of hardware were based on input from all groups of DCSE, and decisions on future hardware investments will be made at the same level.