From 6672588d9368e73e22a957285cd689a62fbffd30 Mon Sep 17 00:00:00 2001
From: Francois Caud
Date: Tue, 25 Nov 2025 16:13:42 +0100
Subject: [PATCH] DOC add gpu compute worker setup guide

---
 codabench_gpu_worker.md | 215 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 215 insertions(+)
 create mode 100644 codabench_gpu_worker.md

diff --git a/codabench_gpu_worker.md b/codabench_gpu_worker.md
new file mode 100644
index 0000000..f5e1a19
--- /dev/null
+++ b/codabench_gpu_worker.md
@@ -0,0 +1,215 @@
# Attaching a GPU Compute Worker on AWS to a Codabench Competition

This guide explains how to set up and attach a **GPU-enabled Codabench compute worker** running on an **AWS EC2 instance**.

---

## 1. Overview

Codabench uses **compute workers** to execute competition submissions.
Each worker connects to a **broker URL** (defined in your Codabench admin panel) and runs submissions inside **competition Docker images**.

For GPU-based evaluations, workers must:
- Run on GPU-capable hardware (e.g. AWS `g4dn.xlarge`, `g5.xlarge`, or `p3.2xlarge`)
- Have NVIDIA drivers and CUDA-compatible Docker support
- Use the image `codalab/competitions-v2-compute-worker:gpu1.3` or a custom image based on it

---

## 2. Launch an AWS EC2 GPU Instance

1. Open the **AWS EC2 Console** → “Launch instance”
2. Choose an **Ubuntu 24.04 LTS** (64-bit x86) AMI
3. Select a **GPU instance type**, e.g.:
   - `g4dn.xlarge` (Tesla T4)
   - `g5.xlarge` (A10G)
4. Configure:
   - Storage: at least **50 GB**
   - Key pair: create or select one to connect securely to the instance
   - Security group: allow inbound **SSH (port 22)**
5. Launch the instance and connect via SSH (the public IP/DNS is shown in the instance summary under Public DNS):
   ```bash
   ssh -i your-key.pem ubuntu@<ec2-public-ip>
   ```


## 3. Install NVIDIA GPU Drivers

```bash
sudo apt update && sudo apt upgrade -y
sudo apt install -y nvidia-driver-580
sudo reboot
```
After the reboot, check that the driver is loaded:
```bash
nvidia-smi
```
You should see the GPU listed (e.g. Tesla T4).


## 4. Install Docker

Set up Docker's apt repository and install Docker Engine with the Compose plugin:
```bash
# Add Docker's official GPG key:
sudo apt update
sudo apt install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc

# Add the repository to Apt sources (deb822 format):
sudo tee /etc/apt/sources.list.d/docker.sources <<EOF
Types: deb
URIs: https://download.docker.com/linux/ubuntu
Suites: noble
Components: stable
Signed-By: /etc/apt/keyrings/docker.asc
EOF

# Install Docker Engine and the Compose plugin:
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
```
Optionally, add the `ubuntu` user to the `docker` group so Docker commands can be run without `sudo` (log out and back in for this to take effect): `sudo usermod -aG docker ubuntu`


## 5. Install the NVIDIA Container Toolkit

Docker containers need the NVIDIA Container Toolkit to access the GPU:
```bash
# Add the NVIDIA Container Toolkit repository and key:
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

# Install the toolkit and register the NVIDIA runtime with Docker:
sudo apt update
sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```


## 6. Create the `.env` Configuration File

Create the worker directory `/home/ubuntu/codabench` (e.g. `mkdir -p /home/ubuntu/codabench`) and, inside it, a file named `.env` with the following content:
```
BROKER_URL=<your-broker-url>
BROKER_USE_SSL=True
HOST_DIRECTORY=/home/ubuntu/codabench
```
The broker URL corresponding to the selected queue can be found on the Codabench website under Admin panel->My Resources->Queues, then Actions->Copy Broker URL. The queue itself is assigned to the competition in the Edit section of the competition on Codabench.


## 7. Create `docker-compose.yml` for the GPU Worker

Inside `/home/ubuntu/codabench`, create a file named `docker-compose.yml` with the following content:
```yaml
# Codabench GPU worker (NVIDIA)
services:
  worker:
    image: codalab/competitions-v2-compute-worker:gpu1.3
    container_name: compute_worker
    volumes:
      - /home/ubuntu/codabench:/codabench
      - /var/run/docker.sock:/var/run/docker.sock
    env_file:
      - .env
    restart: unless-stopped
    #hostname: ${HOSTNAME}
    logging:
      options:
        max-size: "50m"
        max-file: "3"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities:
                - gpu
```
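Before starting the worker, it can be worth checking that the compose file parses and that containers can actually reach the GPU. A minimal check from `/home/ubuntu/codabench`, assuming the `nvidia/cuda:12.4.1-base-ubuntu22.04` image tag (any recent CUDA base image that ships `nvidia-smi` works):
```bash
# Validate docker-compose.yml and the .env variable substitution
docker compose config

# Run nvidia-smi inside a throwaway CUDA container to confirm GPU access through Docker
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```
If both commands succeed, the worker container should also see the GPU.
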
## 8. Start the GPU Worker

From `/home/ubuntu/codabench`, start the worker:
```bash
docker compose up -d
```
Check the running containers:
```bash
docker ps
```
Check the logs of the `compute_worker` container:
```bash
docker logs -f compute_worker
```
The compute worker container should now be ready to receive submissions from Codabench.

## 9. Stop the Compute Worker and the EC2 Instance

First stop the container:
```bash
docker compose down
```
Then stop the EC2 instance from the AWS console.
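If you prefer the command line to the console, the instance can also be stopped with the AWS CLI. A minimal sketch, assuming the AWS CLI is installed and configured on your workstation and that `i-0123456789abcdef0` stands in for your instance ID:
```bash
# Stop the EC2 instance (compute billing stops; the EBS volume and its data are kept)
aws ec2 stop-instances --instance-ids i-0123456789abcdef0

# Check the instance state until it reports "stopped"
aws ec2 describe-instances --instance-ids i-0123456789abcdef0 \
  --query 'Reservations[].Instances[].State.Name' --output text
```
Restarting the instance later and running `docker compose up -d` again brings the worker back onto the same queue.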