Python

Python 3.8

Python 3.8.12 is available in the DelftBlue software stack:

module load 2023r1
module load python/3.8.12

Then we have:

[<netid>@login01 ~]$ python
Python 3.8.12 (default, Mar 18 2022, 12:47:02)
[GCC 8.5.0 20210514 (Red Hat 8.5.0-4)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
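
If a script depends on the module-provided interpreter, it can help to check the version at start-up, for example in case the module was not loaded in a job script. The following is a minimal sketch; the function name `python_is_at_least` is made up for illustration and is not part of DelftBlue or Python.

```python
import sys


def python_is_at_least(minimum, version_info=None):
    """Return True if the (major, minor) version is at least `minimum`.

    `version_info` defaults to the running interpreter; passing it
    explicitly is only meant for testing.
    """
    version_info = sys.version_info if version_info is None else version_info
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    if python_is_at_least((3, 8)):
        print("Interpreter is recent enough:", sys.version.split()[0])
    else:
        print("Interpreter is older than 3.8; was the python module loaded?")
```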

Python modules

Many common Python packages are also available as modules, for instance:

[<netid>@login01 ~]$ module load py-
py-beniget                             py-ply/3.11-a4o4kwv
py-beniget/0.4.1-4wt7gvc               py-protobuf
py-certifi                             py-protobuf/3.17.3-to7ecva
py-certifi/2021.10.8-ots3y7m           py-pybind11
py-cppy                                py-pybind11/2.6.2-6cjef4d
py-cppy/1.1.0-u4mnqrh                  py-pyparsing
py-cycler                              py-pyparsing/2.4.7-oqywzod
py-cycler/0.10.0-pamvavy               py-pytest-runner
py-cython                              py-pytest-runner/5.1-ikpagbz
py-cython/0.29.24-yxy6jml              py-python-dateutil
py-future                              py-python-dateutil/2.8.2-ynffg3x
py-future/0.18.2-sgjtepw               py-pythran
py-gast                                py-pythran/0.9.12-d6gcxcx
py-gast/0.5.2-zogveax                  py-pyyaml
py-joblib                              py-pyyaml/5.3.1-uut6fra
py-joblib/1.0.1-mnurghi                py-scikit-learn
py-kiwisolver                          py-scikit-learn/1.0.1-5ocfay5
py-kiwisolver/1.3.2-wbazcq4            py-scipy
py-mako                                py-scipy/1.7.1-nzraa6n
py-mako/1.1.4-5g4gyqu                  py-setuptools
py-markupsafe                          py-setuptools/57.4.0-ob7ew4x
py-markupsafe/2.0.1-74nr6zv            py-setuptools-scm
py-matplotlib                          py-setuptools-scm/6.3.2-7h2pp44
py-matplotlib/3.4.3-g5dyffg            py-six
py-numpy                               py-six/1.16.0-xq6htn3
py-numpy/1.21.3-rxwjzzh                py-threadpoolctl
py-packaging                           py-threadpoolctl/2.0.0-2kc7zps
py-packaging/21.0-d4wb4w2              py-tomli
py-pillow                              py-tomli/1.2.1-jadn5vr
py-pillow/8.0.0-kbhb6ix                py-tqdm
py-pip                                 py-tqdm/4.62.3-txurj4a
py-pip/21.1.2-vejdkam                  py-typing-extensions
py-ply                                 py-typing-extensions/3.10.0.2-lrwnaxm

For example, to use numpy, scipy, and matplotlib, load the following modules:

module load py-numpy
module load py-scipy
module load py-matplotlib

Then we have:

[<netid>@login01 ~]$ python
Python 3.8.12 (default, Mar 18 2022, 12:47:02)
[GCC 8.5.0 20210514 (Red Hat 8.5.0-4)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> import scipy
>>> import matplotlib
>>>
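
Beyond checking that the imports succeed, you may want to see which versions are actually visible to the interpreter. The sketch below uses the standard-library `importlib.metadata` (available since Python 3.8). The helper name `installed_versions` is an assumption for illustration, and the sketch assumes the module-provided packages ship regular package metadata.

```python
from importlib import metadata  # standard library since Python 3.8


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    wanted = ["numpy", "scipy", "matplotlib"]
    for name, version in installed_versions(wanted).items():
        print(f"{name:12s} {version or 'not found'}")
```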

Install your own packages locally

Warning

Please be aware: package managers such as pip or conda are known to install a lot of tiny files locally. This is important for several reasons:

  1. These local installations might occupy a lot of space and often use your /home directory as their default destination. You might want to redirect them from /home to /scratch (see below for more info).

  2. These local installations might rely on the /tmp folder as intermediate storage for unpacking/compiling. Please be aware that the collectively used /tmp might fill up! More info here.

  3. /home and /scratch rely on the parallel file system BeeGFS. While this file system provides high speed for truly parallel jobs (many processes reading/writing from/to one big file), it might struggle with processes generating a lot of tiny files. As such, installing packages via pip or conda might take noticeably longer than you would expect. This is normal, and only manifests itself once, during the installation. Once installed, accessing these packages should be very fast.
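
To get a feeling for how many small files an installation has created, you can count the files and sum their sizes below a directory. This is a minimal sketch using only the standard library. The helper name `summarize_tree` is hypothetical, and `~/.local` is assumed as the location because it is the usual default for `pip install --user` on Linux.

```python
import os


def summarize_tree(root):
    """Return (file_count, total_bytes) for all regular files below `root`."""
    file_count = 0
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            if os.path.islink(path):
                continue  # skip symlinks so their targets are not counted
            try:
                total_bytes += os.path.getsize(path)
            except OSError:
                continue  # file vanished or is unreadable; ignore it
            file_count += 1
    return file_count, total_bytes


if __name__ == "__main__":
    root = os.path.expanduser("~/.local")  # assumed default for --user installs
    count, size = summarize_tree(root)
    print(f"{root}: {count} files, {size / 1e6:.1f} MB")
```

A large file count with a small average file size is the pattern that a parallel file system handles slowly.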

Do you need to install a Python package that is not available as a module? You can, for example, use pip or conda to install it locally in your home directory. pip is the preferred method, as it typically installs fewer dependencies and is therefore more economical.

pip

First, load pip:

module load py-pip

Then, use pip to install packages locally, e.g.:

python -m pip install --user [your-package-name]
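
To see where `--user` installations end up for the interpreter you are running, you can ask the standard-library `site` module. The sketch below is illustrative and the helper name `user_install_info` is made up. Python also honours the `PYTHONUSERBASE` environment variable, which changes the user base directory; whether pointing it at /scratch suits your workflow is something to check for yourself.

```python
import site
import sys


def user_install_info():
    """Collect the locations Python uses for `pip install --user`."""
    user_site = site.getusersitepackages()
    return {
        "user_base": site.getuserbase(),          # root of the user install tree
        "user_site": user_site,                   # where packages are placed
        "enabled": bool(site.ENABLE_USER_SITE),   # False e.g. inside a virtualenv
        "on_sys_path": user_site in sys.path,     # only True once the dir exists
    }


if __name__ == "__main__":
    for key, value in user_install_info().items():
        print(f"{key:12s} {value}")
```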

Warning

Mixing module-loaded packages and your locally installed packages might lead to dependency conflicts!

Avoiding version clash

For example, the standard py-numpy package is at the moment version 1.19.5. You can update this locally with the following command:

python -m pip install --upgrade --user numpy

And this will make a more recent (1.22.3) version available:

[<netid>@login03 ~]$ python -m pip list
Package       Version
------------- -------
...
numpy         1.22.3
...

However, if you then load the standard py-scipy package (currently, version 1.5.2), it will re-enable the default numpy version:

[<netid>@login03 ~]$ python -m pip list
Package       Version
------------- -------
...
numpy         1.19.5
scipy         1.5.2
...

In this case, you might want to install your own updated scipy version locally as well to avoid the version conflict:

python -m pip install --upgrade --user scipy

And then we have both more recent versions of numpy and scipy installed locally:

[<netid>@login03 ~]$ python -m pip list
Package       Version
------------- -------
...
numpy         1.22.3
scipy         1.8.0
...
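
When a package exists both in a loaded module and in your local installation, the copy that appears first on `sys.path` is normally the one that gets imported. The following sketch lists the `sys.path` entries that contain a given package, in priority order. The helper name `find_copies` is hypothetical, and the check is deliberately simple: it ignores zip archives and namespace packages.

```python
import os
import sys


def find_copies(package, search_path=None):
    """List search-path entries containing `package`, in import-priority order."""
    if search_path is None:
        search_path = sys.path
    hits = []
    for entry in search_path:
        base = entry or os.getcwd()  # an empty entry means the current directory
        as_package = os.path.isdir(os.path.join(base, package))
        as_module = os.path.isfile(os.path.join(base, package + ".py"))
        if as_package or as_module:
            hits.append(base)
    return hits


if __name__ == "__main__":
    for name in ("numpy", "scipy"):
        copies = find_copies(name)
        print(name, "->", copies if copies else "not found on sys.path")
```

If more than one location is printed for a package, the first one is the likely source of the version that `python -m pip list` reports.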

conda

A similar result can be achieved by loading the miniconda3 module:

module load miniconda3

Local conda environment on login nodes

First, load the miniconda module:

module load miniconda3

Then, create your own conda environment:

conda create -y -p [my-conda-env]

After this, you might need to log in again. Otherwise, conda activate may fail with the error CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'.

After you re-login, you can activate your new environment:

conda activate [my-conda-env]

You should see the environment activated, which is indicated by the prefix to the login prompt:

(my-conda-env) [<netid>@login01 ~]$

Now you can install your own conda packages:

conda install [your-package-name]

To de-activate your environment, simply issue the following:

conda deactivate

To remove your environment, issue the following (the environment was created with -p, so it is addressed by its path):

conda env remove -p [my-conda-env]

To list all environments, issue the following:

conda env list

Warning

Even though conda activate [my-conda-env] works on the login node prompt, it might fail on the worker nodes.

The problem is that conda init adds the path of the currently active conda installation to your .bashrc. This is probably not what you want, as the conda installation might change depending on whether you are in compute or gpu mode, and it might not work on worker nodes at all.

It may be best to avoid conda init altogether and directly source the conda.sh script that comes with the installed version. The following commands do this, using conda info --base to obtain the installation path:

unset CONDA_SHLVL
source "$(conda info --base)/etc/profile.d/conda.sh"

Why unset is necessary before conda init

If there are multiple versions or instances of conda on the system, PATH may resolve to the wrong python executable when running conda activate. To avoid this, unset CONDA_SHLVL before activating your environment.

More details can be found here.

After running this command, conda activate works on all nodes.
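
To confirm from inside a job that Python really runs from the activated environment, you can compare the interpreter prefix with the `CONDA_PREFIX` variable that `conda activate` sets. This is only a sketch. The helper name `conda_env_status` is made up for illustration, and the `environ` and `prefix` parameters exist purely to make the function easy to test.

```python
import os
import sys


def conda_env_status(environ=None, prefix=None):
    """Report whether the running interpreter belongs to the active conda env."""
    environ = os.environ if environ is None else environ
    prefix = sys.prefix if prefix is None else prefix
    conda_prefix = environ.get("CONDA_PREFIX")  # set by `conda activate`
    matches = (
        conda_prefix is not None
        and os.path.realpath(conda_prefix) == os.path.realpath(prefix)
    )
    return {
        "conda_prefix": conda_prefix,
        "python_prefix": prefix,
        "matches": matches,
    }


if __name__ == "__main__":
    status = conda_env_status()
    print("CONDA_PREFIX :", status["conda_prefix"])
    print("sys.prefix   :", status["python_prefix"])
    print("interpreter comes from the active env:", status["matches"])
```

Running this at the start of a batch job and finding `matches` to be False suggests that the activation did not take effect on the worker node.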

Warning

miniconda3 might conflict with the vendor-preinstalled git. To avoid this conflict, load the newer openssh and git modules from the DelftBlue software stack:

module load miniconda3
module load openssh
module load git

Python only prints STDOUT in a file after the job is finished

Example situation: I am running a Python program that contains print statements via Slurm. When I run it directly with python program.py, the output of the print statements appears in the terminal. When I run it via Slurm, the output is written either to the output file specified in the submission script or to slurm-XXX.out. However, sometimes the contents of slurm-XXX.out only appear after the job has finished, and not during the run, as I would expect.

This behaviour is caused by output buffering: when stdout is not a terminal, Python collects printed text in a buffer and writes it out in larger chunks. You can either pass flush=True to print to force the output to be written immediately:

print("hey", flush=True)

Or, if you know what you are doing, you can run python unbuffered:

python -u program.py
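
For longer programs, it can be convenient to wrap the flushing in a small helper, or to switch stdout to line buffering once at start-up. The sketch below shows both options. The helper name `log` is an assumption for illustration, and `sys.stdout.reconfigure` requires Python 3.7 or newer.

```python
import sys
import time


def log(message, stream=None):
    """Print `message` and flush immediately so it shows up during the run."""
    stream = sys.stdout if stream is None else stream
    print(message, file=stream, flush=True)


if __name__ == "__main__":
    # Alternative to flushing every call: make stdout line-buffered once.
    # The hasattr check covers cases where stdout was replaced by another object.
    if hasattr(sys.stdout, "reconfigure"):
        sys.stdout.reconfigure(line_buffering=True)

    for step in range(3):
        log(f"step {step} done")
        time.sleep(0.1)  # stands in for real work
```

Flushing after every line costs some performance, so for very chatty programs it may be better to flush only at progress milestones.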

More details can be found here.