Description
Hi,
I am trying to run the package on LUMI. It requires Python 3.12, so I am using a container image for that (/appl/local/containers/sif-images/lumi-pytorch-rocm-6.2.1-python-3.12-pytorch-20240918-vllm-4075b35.sif).
However, I run into issues submitting batch jobs from inside the container.
I try to run schedule-eval while binding the Slurm commands into the container:
singularity exec -B /usr/bin/ $SIF bash -c '$WITH_CONDA && source venv/bin/activate && oellm schedule-eval \
--models "microsoft/DialoGPT-medium,EleutherAI/pythia-160m" \
--tasks "hellaswag,mmlu" \
--n_shot 5'
But I get the following error:
ERROR Failed to submit job: Command '['sbatch']' returned non-zero exit status 127.
ERROR sbatch stderr: sbatch: error while loading shared libraries: libslurmfull.so: cannot open shared object file: No such file or directory
Without -B /usr/bin/, I instead get an squeue error: FileNotFoundError: [Errno 2] No such file or directory: 'squeue'
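The first error suggests that the sbatch binary becomes visible through the bind mount, but the shared library it links against does not. A minimal diagnostic sketch for listing such unresolved libraries is below. It assumes ldd is available inside the image; filter_missing is a hypothetical helper name, and the sample input only imitates ldd output for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read `ldd` output on stdin and print the names of
# shared libraries that the dynamic loader reports as "not found".
filter_missing() {
    awk '/not found/ {print $1}'
}

# Demo on sample text shaped like ldd output (not real output from LUMI).
printf '\tlibslurmfull.so => not found\n\tlibc.so.6 => /lib64/libc.so.6 (0x00007f0000000000)\n' \
    | filter_missing

# Inside the container, the real check could look like this (assumption:
# sbatch is at /usr/bin/sbatch once /usr/bin/ is bound):
#   singularity exec -B /usr/bin/ $SIF ldd /usr/bin/sbatch | filter_missing
```

Any library names printed this way would be candidates for an additional bind mount. Which host directory holds them on LUMI is not something this sketch assumes.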
What can be done here? As far as I know, there are no modules on LUMI that provide Python 3.12.