# Conda

## Install your own packages locally
Warning

Please be aware: package managers such as `conda` are known to install a lot of tiny files locally. This is important for several reasons:

- These local installations might occupy a lot of space and often use your `/home` directory as their default destination. You might want to redirect them from `/home` to `/scratch` (see below for more info).
- These local installations might rely on the `/tmp` folder as intermediate storage for unpacking/compiling. Please be aware that the collectively used `/tmp` might get overfilled! More info here.
- `/home` and `/scratch` rely on the parallel file system BeeGFS. While this file system provides high speed for truly parallel jobs (many processes reading/writing from/to one big file), it might struggle with processes that generate a lot of tiny files. As such, installing packages via `conda` might take noticeably longer than you would expect. This is normal, and it only manifests itself once, during the installation. Once installed, accessing these packages should be very fast.
Do you need to install a package that is not available as a module? You can, for example, use `conda` to install it locally in your own `/home` or `/scratch` directory.

As the storage space needed for these packages can increase rapidly, you might run out of space in your home directory. Typically, the following directories are used to store your packages:
| installer | location | purpose |
| --- | --- | --- |
| conda | `$HOME/.conda` | installed and cached packages |
You can create these directories on the scratch storage and link to them in your home directory. For example, when using `conda`:
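A minimal sketch (the scratch path `/scratch/$USER` is an assumption; adjust it to your actual scratch location):

```bash
# If a conda directory already exists in your home directory, move it first:
# mv $HOME/.conda /scratch/$USER/.conda

# Create the directory on scratch and link it into your home directory
# (assumes your scratch space lives under /scratch/$USER):
mkdir -p /scratch/$USER/.conda
ln -s /scratch/$USER/.conda $HOME/.conda
```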
Warning

The scratch storage is shared among all users and can fill up over time. By design, this storage can be cleaned up regularly! Make sure you save the commands used to install your packages so that you can reinstall them after a clean-up, or back up your `conda` installation to a network drive (preferably to `staff-umbrella`, as `staff-homes` also has an 8 GB quota).
To back up your `conda` installation to a network drive, you can, for example, use the `rsync` command:
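A sketch, assuming the network drive is mounted under `/tudelft.net/staff-umbrella` and that `<project>` is a placeholder for your own project folder:

```bash
# Copy the local conda directory to a backup folder on the network drive.
# The destination path is an assumption; adjust it to your own mount point.
rsync -av --progress $HOME/.conda/ /tudelft.net/staff-umbrella/<project>/conda-backup/
```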
## Using conda

Loading the `miniconda3` module:
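```bash
module load miniconda3
```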
**Local conda environment on login nodes**
First, load the `miniconda3` module:
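```bash
module load miniconda3
```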
Then, create your own `conda` environment:
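Here `my-conda-env` is a placeholder name; choose your own:

```bash
conda create --name my-conda-env
```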
After this, you might need to re-login! Otherwise, you might encounter the `CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'` error.
After you re-login, you can activate your new environment:
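```bash
conda activate my-conda-env
```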
You should see the environment activated, which is indicated by the prefix to the login prompt:
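For example (the hostname, NetID, and environment name shown here are illustrative):

```
(my-conda-env) [<netid>@login01 ~]$
```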
Now you can install your own `conda` packages:
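For example, installing `numpy` (used here purely as an illustration):

```bash
conda install numpy
```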
To de-activate your environment, simply issue the following:
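```bash
conda deactivate
```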
To remove your environment, issue the following:
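```bash
conda remove --name my-conda-env --all
```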
To list all environments, issue the following:
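```bash
conda env list
```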
**Debugging `conda init` issues**
Even though `conda activate [my-conda-env]` works on the login node prompt, it might fail on the worker nodes.
The problem is that `conda init` adds the path of the currently active `conda` installation to your `.bashrc`, which is probably not what you want, as the active `conda` might change depending on whether you are in `compute` or `gpu` mode. It might also not work at all on the worker nodes.
It may be best to avoid `conda init` altogether and directly call the `conda.sh` that comes with the installed version. That can be done with the following command, which calls `conda.sh` directly by extracting the long string from `conda info`:
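A sketch of such a command, using `conda info --base` to print the installation prefix:

```bash
# Source conda.sh from the currently loaded conda installation;
# `conda info --base` prints the installation's root directory.
source "$(conda info --base)/etc/profile.d/conda.sh"
```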
If there are multiple versions/instances of `conda` on the system, the `PATH` may end up resolving to the wrong Python executable when running `conda activate`. To avoid this, unset `conda` before activating your environment, as sketched below.
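One possible sketch (the assumption here is that the stale `conda` is exposed as a shell function in your session):

```bash
# Clear a stale conda shell function left over from a different installation,
# then re-source conda.sh from the intended one and activate the environment.
unset -f conda 2>/dev/null
source "$(conda info --base)/etc/profile.d/conda.sh"
conda activate my-conda-env
```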
More details can be found here.
After sourcing `conda.sh` this way, `conda activate` works on all nodes.
**Debugging `git` issues**

`miniconda3` might conflict with the vendor-preinstalled `git`! To avoid this conflict, load the new `openssh` and `git` modules from the DelftBlue software stack:
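```bash
# Load openssh and git from the DelftBlue software stack
module load openssh
module load git
```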
## Example of a conda submission script

A typical example of a conda submission script looks as follows:
```bash
#!/bin/bash
#SBATCH --job-name="jobname"
#SBATCH --time=01:00:00
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=1G
#SBATCH --partition=compute
#SBATCH --account=research-<faculty>-<department>

# Load modules:
module load 2023r1
module load miniconda3

# Activate conda, run job, deactivate conda
conda activate <name-of-my-conda-environment>
srun python myscript.py
conda deactivate
```
Warning

Certain packages may not work if you install them on the login node and try to use them on a GPU node. Reasons for this include (i) that the host processors on the `gpu-v100` partition have a different architecture (AMD), and (ii) that the software may try to detect the available GPU(s) at installation time. If you encounter such issues, read the installation instructions of the respective software package carefully to see how to specify the target architecture and type of GPU to install for. If the software does not allow this kind of "cross-installation", either specify the exact package version to use, or download the package from conda-forge and install it from within an interactive job on a GPU node.