Python
Python 3.8
Python version 3.8.12 is available in the DelftBlue software stack. Once it is loaded, we have:
[<netid>@login01 ~]$ python
Python 3.8.12 (default, Mar 18 2022, 12:47:02)
[GCC 8.5.0 20210514 (Red Hat 8.5.0-4)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
Python modules
Many standard Python packages can be loaded as modules as well, for instance:
[<netid>@login01 ~]$ module load py-
py-beniget py-ply/3.11-a4o4kwv
py-beniget/0.4.1-4wt7gvc py-protobuf
py-certifi py-protobuf/3.17.3-to7ecva
py-certifi/2021.10.8-ots3y7m py-pybind11
py-cppy py-pybind11/2.6.2-6cjef4d
py-cppy/1.1.0-u4mnqrh py-pyparsing
py-cycler py-pyparsing/2.4.7-oqywzod
py-cycler/0.10.0-pamvavy py-pytest-runner
py-cython py-pytest-runner/5.1-ikpagbz
py-cython/0.29.24-yxy6jml py-python-dateutil
py-future py-python-dateutil/2.8.2-ynffg3x
py-future/0.18.2-sgjtepw py-pythran
py-gast py-pythran/0.9.12-d6gcxcx
py-gast/0.5.2-zogveax py-pyyaml
py-joblib py-pyyaml/5.3.1-uut6fra
py-joblib/1.0.1-mnurghi py-scikit-learn
py-kiwisolver py-scikit-learn/1.0.1-5ocfay5
py-kiwisolver/1.3.2-wbazcq4 py-scipy
py-mako py-scipy/1.7.1-nzraa6n
py-mako/1.1.4-5g4gyqu py-setuptools
py-markupsafe py-setuptools/57.4.0-ob7ew4x
py-markupsafe/2.0.1-74nr6zv py-setuptools-scm
py-matplotlib py-setuptools-scm/6.3.2-7h2pp44
py-matplotlib/3.4.3-g5dyffg py-six
py-numpy py-six/1.16.0-xq6htn3
py-numpy/1.21.3-rxwjzzh py-threadpoolctl
py-packaging py-threadpoolctl/2.0.0-2kc7zps
py-packaging/21.0-d4wb4w2 py-tomli
py-pillow py-tomli/1.2.1-jadn5vr
py-pillow/8.0.0-kbhb6ix py-tqdm
py-pip py-tqdm/4.62.3-txurj4a
py-pip/21.1.2-vejdkam py-typing-extensions
py-ply py-typing-extensions/3.10.0.2-lrwnaxm
For example, if we need numpy, scipy, and matplotlib, we need to load the corresponding modules (py-numpy, py-scipy, and py-matplotlib in the list above). Then we have:
[<netid>@login01 ~]$ python
Python 3.8.12 (default, Mar 18 2022, 12:47:02)
[GCC 8.5.0 20210514 (Red Hat 8.5.0-4)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> import scipy
>>> import matplotlib
>>>
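As a quick check that the loaded modules are really the ones Python sees, the versions can be queried without importing the packages. This is a minimal sketch using the standard importlib.metadata module (available from Python 3.8); the helper name installed_version is made up for this illustration.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the three packages used in the example above.
for name in ("numpy", "scipy", "matplotlib"):
    print(name, installed_version(name) or "not found (is the module loaded?)")
```

A package that is reported as not found usually means the corresponding module has not been loaded in the current shell.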
Install your own packages locally
Warning
Please be aware: package managers such as pip or conda are known to install a lot of tiny files locally. This is important for several reasons:
- These local installations might occupy a lot of space and often use your /home directory as their default destination. You might want to redirect them from /home to /scratch (see below for more info).
- These local installations might rely on the /tmp folder as intermediate storage for unpacking/compiling. Please be aware that the collectively used /tmp might get overfilled!
- /home and /scratch rely on the parallel file system BeeGFS. While this file system provides high speed for truly parallel jobs (many processes reading/writing from/to one big file), it might struggle with processes generating a lot of tiny files. As such, installing packages via pip or conda might take noticeably longer than you would expect. This is normal, and only manifests itself once, during the installation. Once installed, accessing these packages should be very fast.
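To get a feeling for how many small files a local installation has produced, the directory tree can be summarized from Python. The sketch below is only an illustration: the helper summarize_tree is hypothetical, and ~/.local is assumed as the install location (the usual default for user-level pip installs on Linux), which may differ from your setup.

```python
import os


def summarize_tree(root):
    """Return (entry_count, total_bytes) for all non-directory entries below root."""
    count = 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                # lstat does not follow symlinks, so linked files are not counted twice.
                total += os.lstat(path).st_size
            except OSError:
                continue  # the entry vanished or is unreadable; skip it
            count += 1
    return count, total


# Assumed location of user-level installs; adjust to your own directory.
count, size = summarize_tree(os.path.expanduser("~/.local"))
print(f"{count} files, {size / 1e6:.1f} MB")
```

A very large file count combined with a modest total size is the pattern that the parallel file system handles poorly.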
Do you need to install a Python package that is not available as a module? You can, for example, use pip or conda to install it locally in your home directory.
Pip is the preferred method, as it typically installs fewer dependencies and is therefore more economical.
pip
First, load pip:
Then, use pip to install packages locally, e.g.:
Warning
Mixing module-loaded packages and your locally installed packages might lead to dependency conflicts!
Avoiding version clash
For example, the standard py-numpy package is at the moment version 1.19.5. You can update this locally with the following command:
And this will make a more recent (1.22.3) version available:
However, if you then load the standard py-scipy package (currently, version 1.5.2), it will re-enable the default numpy version:
[<netid>@login03 ~]$ python -m pip list
Package Version
------------- -------
...
numpy 1.19.5
scipy 1.5.2
...
In this case, you might want to install your own updated scipy version locally as well to avoid the version conflict:
Both more recent versions of numpy and scipy are then installed locally.
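When module-provided and locally installed copies coexist, it helps to check which copy Python would actually import. The following is a minimal sketch, assuming the local copies live in the per-user site-packages directory reported by site.getusersitepackages(); the helper package_origin is a made-up name for this illustration.

```python
import importlib.util
import os
import site


def package_origin(name):
    """Return (path, from_user_site) for an importable top-level package, else None."""
    spec = importlib.util.find_spec(name)  # locates the package without importing it
    if spec is None or not spec.has_location or spec.origin is None:
        return None
    path = os.path.abspath(spec.origin)
    user_site = os.path.abspath(site.getusersitepackages())
    return path, path.startswith(user_site + os.sep)


for name in ("numpy", "scipy"):
    print(name, package_origin(name))
```

If numpy is reported from the user site while scipy comes from a module directory (or the other way round), the two packages are being mixed, which is the situation the warning above describes.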
conda
A similar result can be achieved by loading the miniconda3 module:
Local conda environment on login nodes
First, load the miniconda3 module:
Then, create your own conda environment:
After this, you might need to log in again! Otherwise, you might encounter the error CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'.
After you log in again, you can activate your new environment:
You should see the environment activated, which is indicated by the prefix to the login prompt:
Now you can install your own conda packages:
To deactivate your environment, issue the following:
To remove your environment, issue the following:
To list all environments, issue the following:
Warning
Even though conda activate [my-conda-env] works on the login node prompt, it might fail on the worker nodes.
The problem is that conda init adds the path of the currently active conda installation to your .bashrc, which is probably not what you want, as the conda installation might change depending on whether you are in compute or gpu mode. And it might not actually work on worker nodes.
It may be best to avoid conda init altogether and directly call the conda.sh that comes with the installed version. That can be done with the following command, which calls conda.sh directly (its long path is extracted from conda info):
Why unset is necessary before conda init
If there are multiple versions/instances of conda on the system, the PATH may end up being resolved to the wrong python executable when running conda activate. To avoid this, conda must be unset before you activate your environment.
After running this command, conda activate works on all nodes.
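Since the wrong python executable can be picked up silently, a job script can verify at start-up that the running interpreter belongs to the activated environment. This is a sketch under the assumption that conda activate has set the CONDA_PREFIX and CONDA_DEFAULT_ENV environment variables, as it normally does; the helper describe_conda_env is hypothetical.

```python
import os
import sys


def describe_conda_env(environ=os.environ, prefix=sys.prefix):
    """Compare the active conda environment with the interpreter that is running."""
    conda_prefix = environ.get("CONDA_PREFIX")
    matches = (
        conda_prefix is not None
        and os.path.realpath(conda_prefix) == os.path.realpath(prefix)
    )
    return {
        "env_name": environ.get("CONDA_DEFAULT_ENV"),
        "conda_prefix": conda_prefix,
        "python_prefix": prefix,
        "python_matches_env": matches,
    }


info = describe_conda_env()
print(info)
if not info["python_matches_env"]:
    print("Warning: this interpreter does not belong to the active conda environment")
```

Running this at the top of a job on a worker node shows whether the activation took effect there as well as on the login node.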
Warning
miniconda3 might conflict with the vendor-preinstalled git! To avoid this conflict, load the new openssh and git modules from the DelftBlue software stack.
Python only prints STDOUT in a file after the job is finished
Example situation: I am running a Python program that contains print statements via Slurm. Normally, when I run the program directly via python program.py, the print statements appear in the terminal. When I run my program via Slurm, the output of the print statements is written either to the output file specified in the submission script or to slurm-XXX.out. However, sometimes the contents of slurm-XXX.out only appear after the job has actually finished, and not during the run, as I would expect.
This behaviour is caused by the buffering of Python's print function. You can either pass flush=True to print, which flushes the buffer and forces the output to be written immediately:
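A minimal sketch of this approach is shown here; the function run_steps and its messages are invented for the illustration, and only the flush=True argument matters.

```python
import time


def run_steps(n_steps, delay=0.0):
    """Pretend to do some work and report progress after every step."""
    for step in range(n_steps):
        time.sleep(delay)  # stand-in for the real computation
        # flush=True pushes the line to the output file immediately
        # instead of leaving it in Python's buffer until the job ends.
        print(f"step {step} done", flush=True)


run_steps(3, delay=0.1)
```

With this change, each progress line should show up in the Slurm output file while the job is still running.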
Or, if you know what you are doing, you can run Python unbuffered:
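The standard CPython mechanisms for this are the interpreter's -u option and the PYTHONUNBUFFERED environment variable; that these are what is meant here is an assumption. The sketch below starts a child interpreter with -u and asks it whether its standard output writes through immediately. The helper child_stdout_write_through is a hypothetical name used only for this demonstration.

```python
import subprocess
import sys

# Child program: report whether its text-layer stdout writes through immediately.
CHILD_CODE = "import sys; print(sys.stdout.write_through)"


def child_stdout_write_through(interpreter_args):
    """Run a child interpreter with the given options and return its report as a bool."""
    result = subprocess.run(
        [sys.executable, *interpreter_args, "-c", CHILD_CODE],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip() == "True"


print("with -u:", child_stdout_write_through(["-u"]))
```

In a submission script the same option would simply be added to the python call. Unbuffered output costs some performance for programs that print a lot, which is why it is worth knowing what you are doing before enabling it globally.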