
Ray cluster

How to run a Ray cluster on DelftBlue

1. Install Ray:

Install conda (on DelftBlue, the miniconda3 module can be loaded instead of installing your own), create a conda environment, and install Ray into it with pip.
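A minimal sketch, assuming the 2022r2 and miniconda3 modules that the job script below also loads; the environment name raytest and the Python version are illustrative:

module load 2022r2 miniconda3             # same modules as in the job script below
conda create --name raytest python=3.10   # name must match --conda_env later
conda activate raytest
pip install -U ray                        # install Ray into the environment with pip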

2. Create an executable script conda-run.sh, which will be used later to load the conda environment correctly:

#!/bin/bash
# Activate a conda environment on a node and run a command
# This script takes two positional arguments:
# 1) conda environment name
# 2) command to execute

export LC_ALL=C.UTF-8
export LANG=C.UTF-8

# dirty hack: https://github.com/conda/conda/issues/9392#issuecomment-617345019 
unset CONDA_SHLVL

source "$(conda info --base)""/etc/profile.d/conda.sh"
conda activate "$1"

# Use this to suppress error codes that may interrupt the job or interfere with reporting.
# Do show the user that an error code has been received and is being suppressed.
# see https://stackoverflow.com/questions/11231937/bash-ignoring-error-for-a-particular-command 
eval "${@:2}" || echo "Exit with error code $? (suppressed)"

3. Create another executable script ray-on-conda-on-slurm.sh, to be called later from the SLURM script to launch the Ray cluster:

#!/bin/bash
# This script is meant to be called from a SLURM job file. 
# It launches a ray server on each compute node and starts a python script on the head node.
# It is assumed that a conda environment is required to activate ray.

# The script has five command line options:
# --python_runfile="..." name of the python file to run
# --python_arguments="..." optional arguments to the python script
# --conda_env="..." name of the conda environment (default: base)
# --rundir="..." name of the working directory (default: current)
# --temp_dir="..." location to store ray temporary files (default: /tmp/ray)
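#
# Example invocation (values are illustrative):
# . ray-on-conda-on-slurm.sh --python_runfile="main.py" --conda_env="raytest"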

# Author: Simon Tindemans, s.h.tindemans@tudelft.nl
# Version: 20 April 2022
#
# Ray-on-SLURM instruction and scripts used for inspiration:
# https://docs.ray.io/en/latest/cluster/slurm.html
# https://github.com/NERSC/slurm-ray-cluster
# https://github.com/pengzhenghao/use-ray-with-slurm

# set defaults
python_runfile="MISSING"
python_arguments=""
conda_env="base"
rundir="."
temp_dir="/tmp/ray"

# parse command line parameters
# use solution adapted from https://unix.stackexchange.com/questions/129391/passing-named-arguments-to-shell-scripts 
for argument in "$@"
do
  key=$(echo "${argument}" | cut -f1 -d=)

  key_length=${#key}
  value="${argument:${key_length}+1}"

  declare "${key#--}=${value}"
done
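
# Example: "--conda_env=raytest" declares the shell variable conda_env="raytest";
# each "--key=value" option becomes a shell variable named key.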

# Abort if no main python file name is given
if [[ ${python_runfile} == "MISSING" ]]
then
  echo "Missing python_runfile option. Aborting."
  exit 1
fi

if [[ $RAY_TIMELINE == "1" ]]
then
  echo "RAY PROFILING MODE ENABLED"
fi

# generate password for node-to-node communication
redis_password=$(uuidgen)
export redis_password

# get node names and identify IP addresses
nodes=$(scontrol show hostnames "$SLURM_JOB_NODELIST")
nodes_array=($nodes)

node_1=${nodes_array[0]}
ip=$(srun --nodes=1 --ntasks=1 -w "$node_1" hostname --ip-address) # making redis-address

# if we detect a space character in the head node IP, the node has reported
# both an IPv6 and an IPv4 address; select the IPv4 one. This step is optional.
if [[ "$ip" == *" "* ]]; then
  IFS=' ' read -ra ADDR <<< "$ip"
  if [[ ${#ADDR[0]} -gt 16 ]]; then
    ip=${ADDR[1]}
  else
    ip=${ADDR[0]}
  fi
  echo "IPV6 address detected. We split the IPV4 address as $ip"
fi

# set up head node
port=6379
ip_head=$ip:$port
export ip_head
echo "IP Head: $ip_head"

# launch head node, leaving one core unused for the main python script
echo "STARTING HEAD at $node_1"
srun --job-name="ray-head" --nodes=1 --ntasks=1 --cpus-per-task=$((SLURM_CPUS_PER_TASK-1)) -w "$node_1" \
    conda-run.sh "$conda_env" \
    "ray start --head --temp-dir="${temp_dir}" --include-dashboard=false --num-cpus=$((SLURM_CPUS_PER_TASK-1)) --node-ip-address=$ip --port=$port --redis-password=$redis_password --block"  &
sleep 10  # give the head node time to start before adding workers

# launch worker nodes
worker_num=$((SLURM_JOB_NUM_NODES - 1)) #number of nodes other than the head node
for ((i = 1; i <= worker_num; i++)); do
  node_i=${nodes_array[$i]}
  echo "STARTING WORKER $i at $node_i"
  srun --job-name="ray-worker" --nodes=1 --ntasks=1 -w "$node_i" \
    conda-run.sh "$conda_env" \
    "ray start --temp-dir="${temp_dir}" --num-cpus=$SLURM_CPUS_PER_TASK --address=$ip_head --redis-password=$redis_password --block" &
  sleep 5
done

# export RAY_ADDRESS, so that ray.init() can be called without an explicit address
RAY_ADDRESS=$ip_head
export RAY_ADDRESS

# launch main program file on a single core. Wait for it to exit
srun --job-name="main" --unbuffered --nodes=1 --ntasks=1 --cpus-per-task=1 -w "$node_1" \
    conda-run.sh "$conda_env" \
    "ray status; cd ${rundir} ; python -u  ${python_runfile}" 

# if RAY_TIMELINE == 1, save the timeline
if [[ $RAY_TIMELINE == "1" ]]
then
  srun --job-name="timeline" --nodes=1 --ntasks=1 --cpus-per-task=1 -w "$node_1" \
      conda-run.sh "$conda_env" \
      "ray timeline"
  cp -n /tmp/ray-timeline-* ${temp_dir}
fi

# stop the ray cluster
srun --job-name="ray-stop" --nodes=1 --ntasks=1 --cpus-per-task=1 -w "$node_1" \
    conda-run.sh "$conda_env" \
    "ray stop"

# wait for everything so we don't cancel head/worker jobs that have not had time to clean up
wait
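
As in step 2, make the script executable:

chmod +x ray-on-conda-on-slurm.sh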

4. Finally, prepare a slurm submission script slurm-submit.sbatch:

#!/bin/bash
# shellcheck disable=SC2206
#SBATCH --job-name=ray-job
#SBATCH --output=ray-job-%j.log
#SBATCH --partition=compute
### Select number of nodes
#SBATCH --nodes=1
### Always run exclusive to avoid issues with conflicting ray servers (needs fixing in the future)
#SBATCH --exclusive
#SBATCH --cpus-per-task=48
#SBATCH --mem-per-cpu=2GB
### Give all resources to a single Ray task, ray can manage the resources internally
#SBATCH --ntasks-per-node=1
#SBATCH --gpus-per-task=0
#SBATCH --time=01:00:00

# Load Delft Blue modules here
# NOTE: git and openssh are included to avoid breaking dependencies.
# NOTE: python is not required if it is included in the conda environment.
# If python is needed, load it before miniconda3 to avoid conda activation issues
# (the conda module is not found within internal python calls).
module load 2022r2 miniconda3 git openssh 

# Option: save timeline info for debugging (yes="1", no="0"); see https://docs.ray.io/en/latest/ray-core/troubleshooting.html
declare -x RAY_TIMELINE="0"

# Adapt this to your job requirements
. ray-on-conda-on-slurm.sh \
    --python_runfile="runstorage_ray.py" \
    --python_arguments="" \
    --conda_env="raytest" \
    --rundir="../code" \
    --temp_dir="/scratch/${USER}/ray"
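
Submit the job from the directory containing the scripts; the output file name follows the #SBATCH --output pattern above:

sbatch slurm-submit.sbatch
squeue -u $USER                 # check the job status
tail -f ray-job-<jobid>.log     # <jobid> is the SLURM job ID assigned at submission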

5. In your main python script invoked by the slurm script above, use the following lines:

import ray
ray.init()

Because the SLURM script exports the RAY_ADDRESS environment variable, there is no need to point ray.init() to the head node address explicitly.

Some notes on what this script does:

  • Launch a ray server on each node as a single task with the assigned number of cores.
  • On the head node, reserve 1 core for the main script.
  • Launch a python script on the head node (only!). This script dispatches tasks to all worker processes.
  • Clean up when done. This requires some hacks to suppress error codes, as ray stop does not exit cleanly (error codes 15 or -15).

Warning: additional configuration is needed to run multiple Ray servers on a single node. With the current script, running two jobs of 12 cores each is likely to cause issues, as both may be placed on the same node and their port assignments may clash.