
Conda

Install your own packages locally

Warning

Please be aware: package managers such as conda are known to install a large number of tiny files locally. This matters for several reasons:

  1. These local installations might occupy a lot of space and often use your /home directory as their default destination. You might want to redirect them from /home to /scratch (see below for more info).

  2. These local installations might rely on the /tmp folder as intermediate storage for unpacking/compiling. Please be aware that the collectively used /tmp might get overfilled! You can redirect conda's temporary files to /scratch instead (see the sketch after this list).

  3. /home and /scratch rely on the parallel file system BeeGFS. While this file system provides high speed for truly parallel jobs (many processes reading/writing from/to one big file), it might struggle with processes generating a lot of tiny files. As such, installing packages via conda might take noticeably longer than you would expect. This is normal, and only manifests itself once, during the installation. Once installed, accessing these packages should be very fast.
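If the shared /tmp fills up during an installation, one option is to point temporary files to your scratch space instead. A minimal sketch, assuming /scratch/${USER} exists (TMPDIR is honoured by most tools, including conda and pip):

mkdir -p /scratch/${USER}/tmp
export TMPDIR=/scratch/${USER}/tmp    # redirect temporary unpacking/compiling away from /tmp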

Do you need to install a package that is not available as a module? You can, for example, use conda to install it locally in your own /home or /scratch directory.

As the storage space needed for these packages can increase rapidly, you might run out of space in your home directory. Typically, the following directories are used to store your packages:


installer | location     | purpose
----------|--------------|------------------------------
conda     | $HOME/.conda | installed and cached packages
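
To check how much space your conda directory currently occupies, you can, for example, run:

du -sh $HOME/.conda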

You can create these directories on the scratch storage and link to them in your home directory. For example when using conda:

mkdir -p /scratch/${USER}/.conda
ln -s /scratch/${USER}/.conda $HOME/.conda
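
If $HOME/.conda already exists as a regular directory, move it to scratch first instead of creating an empty one; a sketch:

mv $HOME/.conda /scratch/${USER}/.conda
ln -s /scratch/${USER}/.conda $HOME/.conda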

Warning

The scratch storage is shared among all users and can fill up over time. By design, this storage can be cleaned up regularly! Make sure you save the commands you used to install your packages so you can reinstall them after a clean-up, or back up your conda installation to a network drive (preferably to staff-umbrella, as staff-homes also has an 8 GB quota).

To back up your conda installation to a network drive, you can, for example, use the rsync command:

rsync -av /scratch/${USER}/.conda /tudelft.net/staff-umbrella/<project folder>/
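
To restore after a scratch clean-up, the same command works with source and destination swapped (assuming the backup created above):

rsync -av /tudelft.net/staff-umbrella/<project folder>/.conda /scratch/${USER}/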

Using conda

Loading the miniconda3 module:

module load miniconda3
Local conda environment on login nodes

First, load the miniconda module:

module load miniconda3

Then, create your own conda environment:

conda create -y -p [my-conda-env]
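
The -p flag creates the environment at the given path rather than in the default location. You can also request an interpreter and packages at creation time; a sketch, where the Python version and package are purely illustrative:

conda create -y -p [my-conda-env] python=3.10 numpy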

After this, you might need to log in again! Otherwise, you might encounter the following error: CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'.

After you log back in, you can activate your new environment:

conda activate [my-conda-env]

You should see that the environment is activated, indicated by the prefix added to your prompt:

(my-conda-env) [<netid>@login01 ~]$

Now you can install your own conda packages:

conda install [your-package-name]
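
You can also specify a channel explicitly, for example (the package name is purely illustrative):

conda install -c conda-forge scipy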

To deactivate your environment, simply issue the following:

conda deactivate

To remove your environment, issue the following:

conda env remove -p [my-conda-env]

To list all environments, issue the following:

conda env list

Warning

Even though conda activate [my-conda-env] works on the login node prompt, it might fail on the worker nodes.

The problem is that conda init hard-codes the path of the currently active conda installation into your .bashrc. This is probably not what you want, as the active conda installation might differ between compute and gpu nodes, and the hard-coded path might not actually work on worker nodes.

It may be best to avoid conda init altogether and instead source the conda.sh that ships with the loaded conda installation. This can be done with the following commands, which locate conda.sh via conda info --base:

unset CONDA_SHLVL
source "$(conda info --base)/etc/profile.d/conda.sh"
Why unset CONDA_SHLVL is necessary

If there are multiple versions/instances of conda on the system, the PATH may end up resolving to the wrong python executable when running conda activate. To avoid this, unsetting CONDA_SHLVL is required before activating your environment.


After running this command, conda activate works on all nodes.
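
To verify that the correct installation is picked up, you can check where python resolves to after activation (the path shown is illustrative):

conda activate [my-conda-env]
which python    # should point inside your environment, e.g. /scratch/<netid>/my-conda-env/bin/python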

Warning

miniconda3 might conflict with the vendor-preinstalled git! To avoid this conflict, load the openssh and git modules from the DelftBlue software stack:

module load miniconda3
module load openssh
module load git

Example of a conda submission script

A typical example of a conda submission script looks as follows:

#!/bin/bash
#SBATCH --job-name="jobname"
#SBATCH --time=01:00:00
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=1G
#SBATCH --partition=compute
#SBATCH --account=research-<faculty>-<department>

# Load modules:
module load 2022r2
module load openmpi
module load miniconda3

# Set conda env:
unset CONDA_SHLVL
source "$(conda info --base)/etc/profile.d/conda.sh"

# Activate conda, run job, deactivate conda
conda activate <name-of-my-conda-environment>
srun python myscript.py
conda deactivate
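
Assuming the script above is saved as, for example, conda_job.sh, submit it from the login node with:

sbatch conda_job.sh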

Warning

If you are running your conda environment on GPU nodes, you have to install/compile things on GPU nodes as well! Installing packages in your conda environment on the login node and then trying to run them on a GPU node will most likely not work, because the hardware is completely different. What you need to do is:

  1. Reserve a GPU node interactively (see the sketch after this list).

  2. Load whatever modules you need (e.g. 2022r2, openmpi, miniconda3).

  3. Create a new virtual environment and install whatever packages you want to have in it.

  4. Test that it generally works.

  5. Logout from the GPU node.

  6. Submit a job from the login node. In your submission script, again, load the necessary modules and activate the conda environment that you created on a GPU node. Do not forget to add #SBATCH --partition=gpu and #SBATCH --gpus-per-task=<number of GPUs>.
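
A sketch of step 1, reserving a GPU node interactively; the time, memory, and GPU counts are placeholders to adapt:

srun --partition=gpu --gpus-per-task=1 --ntasks=1 --cpus-per-task=1 --mem=4G --time=01:00:00 --account=research-<faculty>-<department> --pty bash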