# Ray cluster

## How to run a ray cluster on DelftBlue
1. Install `ray`: install `conda`, then install `ray` into the conda environment using `pip`.
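For example (a sketch, assuming Miniconda is provided through the DelftBlue module system; the environment name `raytest` is just an example, matching the submission script further below):

```shell
# Load the miniconda module (same module names as used in the submission script below)
module load 2022r2 miniconda3

# Create a conda environment and install ray into it with pip
conda create --name raytest python=3.9
conda activate raytest
pip install -U "ray[default]"
```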
2. Create an executable script `conda-run.sh`, to be used later to correctly load the conda environment:
```bash
#!/bin/bash
# Activate a conda environment on a node and run a command.
# This script takes two positional arguments:
#   1) conda environment name
#   2) command to execute
export LC_ALL=C.UTF-8
export LANG=C.UTF-8
# dirty hack: https://github.com/conda/conda/issues/9392#issuecomment-617345019
unset CONDA_SHLVL
source "$(conda info --base)/etc/profile.d/conda.sh"
conda activate "$1"
# Suppress error codes that may interrupt the job or interfere with reporting,
# but do show the user that an error code has been received and is being suppressed.
# See https://stackoverflow.com/questions/11231937/bash-ignoring-error-for-a-particular-command
eval "${@:2}" || echo "Exit with error code $? (suppressed)"
```
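Before use, the script must be made executable and be reachable on your `PATH` (or be referenced by full path). A quick local check, assuming a conda environment named `raytest` exists:

```shell
chmod +x conda-run.sh
# Run a single command inside the environment, e.g. print the python version
./conda-run.sh raytest "python --version"
```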
3. Create another executable script `ray-on-conda-on-slurm.sh`, to be called later from the SLURM script to launch a ray server:
```bash
#!/bin/bash
# This script is meant to be called from a SLURM job file.
# It launches a ray server on each compute node and starts a python script on the head node.
# It is assumed that a conda environment is required to activate ray.
# The script has five command line options:
# --python_runfile="..."   name of the python file to run
# --python_arguments="..." optional arguments to the python script
# --conda_env="..."        name of the conda environment (default: base)
# --rundir="..."           name of the working directory (default: current)
# --temp_dir="..."         location to store ray temporary files (default: /tmp/ray)
# Author: Simon Tindemans, s.h.tindemans@tudelft.nl
# Version: 20 April 2022
#
# Ray-on-SLURM instructions and scripts used for inspiration:
# https://docs.ray.io/en/latest/cluster/slurm.html
# https://github.com/NERSC/slurm-ray-cluster
# https://github.com/pengzhenghao/use-ray-with-slurm

# set defaults
python_runfile="MISSING"
python_arguments=""
conda_env="base"
rundir="."
temp_dir="/tmp/ray"

# parse command line parameters
# solution adapted from https://unix.stackexchange.com/questions/129391/passing-named-arguments-to-shell-scripts
for argument in "$@"
do
  key=$(echo "${argument}" | cut -f1 -d=)
  key_length=${#key}
  value="${argument:${key_length}+1}"
  declare "${key#--}"="${value}"
done

# Abort if no main python file name is given
if [[ ${python_runfile} == "MISSING" ]]
then
  echo "Missing python_runfile option. Aborting."
  exit 1
fi

if [[ $RAY_TIMELINE == "1" ]]
then
  echo "RAY PROFILING MODE ENABLED"
fi

# generate password for node-to-node communication
redis_password=$(uuidgen)
export redis_password

# get node names and identify IP addresses
nodes=$(scontrol show hostnames "$SLURM_JOB_NODELIST")
nodes_array=($nodes)
node_1=${nodes_array[0]}
ip=$(srun --nodes=1 --ntasks=1 -w "$node_1" hostname --ip-address) # making redis-address

# if we detect a space character in the head node IP, we'll
# convert it to an ipv4 address. This step is optional.
if [[ "$ip" == *" "* ]]; then
  IFS=' ' read -ra ADDR <<< "$ip"
  if [[ ${#ADDR[0]} -gt 16 ]]; then
    ip=${ADDR[1]}
  else
    ip=${ADDR[0]}
  fi
  echo "IPV6 address detected. We split the IPV4 address as $ip"
fi

# set up head node
port=6379
ip_head=$ip:$port
export ip_head
echo "IP Head: $ip_head"

# launch head node, leaving one core unused for the main python script
echo "STARTING HEAD at $node_1"
srun --job-name="ray-head" --nodes=1 --ntasks=1 --cpus-per-task=$((SLURM_CPUS_PER_TASK-1)) -w "$node_1" \
  conda-run.sh "$conda_env" \
  "ray start --head --temp-dir=${temp_dir} --include-dashboard=false --num-cpus=$((SLURM_CPUS_PER_TASK-1)) --node-ip-address=$ip --port=$port --redis-password=$redis_password --block" &
sleep 10

# launch worker nodes
worker_num=$((SLURM_JOB_NUM_NODES - 1)) # number of nodes other than the head node
for ((i = 1; i <= worker_num; i++)); do
  node_i=${nodes_array[$i]}
  echo "STARTING WORKER $i at $node_i"
  srun --job-name="ray-worker" --nodes=1 --ntasks=1 -w "$node_i" \
    conda-run.sh "$conda_env" \
    "ray start --temp-dir=${temp_dir} --num-cpus=$SLURM_CPUS_PER_TASK --address=$ip_head --redis-password=$redis_password --block" &
  sleep 5
done

# export RAY_ADDRESS, so that ray can be initialised using ray.init(), without address
RAY_ADDRESS=$ip_head
export RAY_ADDRESS

# launch main program file on a single core; wait for it to exit
srun --job-name="main" --unbuffered --nodes=1 --ntasks=1 --cpus-per-task=1 -w "$node_1" \
  conda-run.sh "$conda_env" \
  "ray status; cd ${rundir} ; python -u ${python_runfile} ${python_arguments}"

# if RAY_TIMELINE == 1, save the timeline
if [[ $RAY_TIMELINE == "1" ]]
then
  srun --job-name="timeline" --nodes=1 --ntasks=1 --cpus-per-task=1 -w "$node_1" \
    conda-run.sh "$conda_env" \
    "ray timeline"
  cp -n /tmp/ray-timeline-* ${temp_dir}
fi

# stop the ray cluster
srun --job-name="ray-stop" --nodes=1 --ntasks=1 --cpus-per-task=1 -w "$node_1" \
  conda-run.sh "$conda_env" \
  "ray stop"

# wait for everything, so we don't cancel head/worker jobs that have not had time to clean up
wait
```
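As an aside, the `--key=value` parsing loop used at the top of the script can be tried in isolation. This standalone sketch wraps it in a function (the function name `parse_named_args` is hypothetical; note that inside a function, `declare -g` is needed to set global variables):

```shell
#!/bin/bash
# Standalone demo of the --key=value parsing loop used in ray-on-conda-on-slurm.sh
parse_named_args() {
  for argument in "$@"; do
    key=$(echo "${argument}" | cut -f1 -d=)   # part before the first '='
    key_length=${#key}
    value="${argument:${key_length}+1}"       # everything after the first '='
    declare -g "${key#--}"="${value}"         # strip the '--' prefix, set a global variable
  done
}

parse_named_args --conda_env="raytest" --rundir="../code"
echo "${conda_env} ${rundir}"   # prints: raytest ../code
```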
4. Finally, prepare a SLURM submission script `slurm-submit.sbatch`:
```bash
#!/bin/bash
# shellcheck disable=SC2206
#SBATCH --job-name=ray-job
#SBATCH --output=ray-job-%j.log
#SBATCH --partition=compute
### Select number of nodes
#SBATCH --nodes=1
### Always run exclusive to avoid issues with conflicting ray servers (needs fixing in the future)
#SBATCH --exclusive
#SBATCH --cpus-per-task=48
#SBATCH --mem-per-cpu=2GB
### Give all resources to a single Ray task; ray can manage the resources internally
#SBATCH --ntasks-per-node=1
#SBATCH --gpus-per-task=0
#SBATCH --time=01:00:00

# Load DelftBlue modules here
# NOTE: git and openssh are included so as not to break dependencies.
# NOTE: python is not required if it is included in the conda environment.
# If python is needed, it must be loaded before miniconda to avoid issues with conda activation
# (module conda not found within internal python calls).
module load 2022r2 miniconda3 git openssh

# Option: save timeline info for debugging (yes="1", no="0"); see https://docs.ray.io/en/latest/ray-core/troubleshooting.html
declare -x RAY_TIMELINE="0"

# Adapt this to your job requirements
. ray-on-conda-on-slurm.sh \
  --python_runfile="runstorage_ray.py" \
  --python_arguments="" \
  --conda_env="raytest" \
  --rundir="../code" \
  --temp_dir="/scratch/${USER}/ray"
```
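The job can then be submitted and monitored with the standard SLURM commands (the log filename follows from the `--output` option above; `<jobid>` is the job number reported by `sbatch`):

```shell
sbatch slurm-submit.sbatch   # submit the job
squeue -u "$USER"            # check its status
cat ray-job-<jobid>.log      # inspect the output
```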
5. In your main python script, invoked by the SLURM script above, initialise ray with `ray.init()`. Because the SLURM script exports the `RAY_ADDRESS` environment variable, there is no need to point `ray.init()` to the head node's address explicitly.
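A minimal sketch of such a main script (here with a toy remote task; the function `square` is purely illustrative, and ray picks up the cluster address from `RAY_ADDRESS`):

```python
import ray

# RAY_ADDRESS was exported by ray-on-conda-on-slurm.sh,
# so ray.init() connects to the running cluster without an explicit address
ray.init()

@ray.remote
def square(x):
    return x * x

# Tasks are dispatched from the head node to workers across the cluster
print(ray.get([square.remote(i) for i in range(4)]))
```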
Some notes on what these scripts do:
- Launch a ray server on each node as a single task, with the assigned number of cores.
- On the head node, reserve one core for the main script.
- Launch the python script on the head node (only!). This script dispatches tasks to all worker processes.
- Clean up when done. This requires some hacks to suppress error codes, because `ray stop` does not exit cleanly (error codes 15 or -15).
Warning: the ray server needs additional configuration before multiple ray servers can run on a single node. With the current script, running two 12-core jobs concurrently is likely to cause issues, as they may be placed on the same node and their port assignments may clash.