
load-balancer

Table of Contents

  • Overview
  • Model
  • Data Preparation
  • Installation
  • Evaluation
  • Running with Docker

Overview

This project is inspired by a common real-world scenario in which a store employee is responsible for directing customers to checkout lines after shopping. Each customer carries a shopping cart with a varying number of items, and the employee must decide which checkout counter they should join to minimize overall waiting time and improve service efficiency.

In this analogy:

  • The store employee functions as the agent.
  • Each customer corresponds to a job.
  • The number of items in the cart represents the estimated workload of that job.
  • Each checkout counter is modeled as a server queue.
  • Importantly, the agent has access only to noisy workload estimates rather than the actual service time.

In practice, humans often rely on simple heuristics such as assigning customers to the queue with the lowest estimated workload. This project investigates how a Deep Reinforcement Learning (RL) agent can learn a more effective job dispatching policy, potentially outperforming such intuitive strategies in uncertain environments.

Model

  • Double DQN is applied to stabilize learning and reduce Q-value overestimation (a target-computation sketch follows this list).
  • The model is trained using Stable-Baselines3.
  • Action space: select a server by index, or defer the job.
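
As a quick illustration of the Double DQN idea (not the repo's training code, which runs through Stable-Baselines3): the target for a transition lets the online network choose the next action while the target network evaluates it. A minimal PyTorch sketch, with illustrative names:

import torch

def double_dqn_target(reward, next_obs, done, online_net, target_net, gamma=0.99):
    # Targets are computed without gradients.
    with torch.no_grad():
        # The online network selects the greedy next action...
        next_action = online_net(next_obs).argmax(dim=1, keepdim=True)
        # ...and the target network evaluates it, damping overestimation bias.
        next_q = target_net(next_obs).gather(1, next_action).squeeze(1)
    return reward + gamma * (1.0 - done) * next_q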

Data Preparation

  • The environment generates job data based on Gaussian (normal) distributions to simulate real-world uncertainty in workload estimation.

  • When creating a new job, the estimated workload is first sampled from a Gaussian distribution. This value serves as an approximation of the actual job size and is used to initialize the job object. (estimated-workload ~ N(mean=25, std=8.0))

  • The actual workload, which is unknown to the agent, is then sampled from another Gaussian distribution centered around the estimated workload. In this case, the estimated workload acts as the mean, and a predefined standard deviation controls the noise level. This models real-world scenarios where estimation errors occur due to incomplete or noisy information. (actual-workload ~ N(mean=estimated-workload, std=5.0))

  • Both estimated and actual workloads are clipped to remain within realistic bounds, ensuring job feasibility (a sampling sketch follows this list).
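
As a concrete sketch of this two-stage sampling (the clipping bounds and function names below are illustrative assumptions, not the repo's exact values):

import numpy as np

rng = np.random.default_rng()

def sample_job(low=1.0, high=60.0):                   # assumed bounds
    estimated = rng.normal(loc=25.0, scale=8.0)       # what the agent observes
    actual = rng.normal(loc=estimated, scale=5.0)     # hidden true workload
    estimated = float(np.clip(estimated, low, high))  # keep the job feasible
    actual = float(np.clip(actual, low, high))
    return estimated, actual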

Installation

To install the custom job-simulator environment, run the following commands:

cd job-simulator-env
pip install -e .

To install the dependencies:

pip install -r requirements.txt

Evaluation

  • To assess the performance of the trained RL agent, we compare its throughput, defined as the number of jobs completed per episode, against a strong heuristic baseline: the Estimated Earliest Finish Time (EEFT) policy.

  • Both the RL agent and the EEFT policy are evaluated on identical job arrival sequences. This is achieved by synchronizing the random seed during environment reset, ensuring a fair and consistent comparison.

  • For each test episode:

    • The EEFT policy computes the estimated finish time of each queue and dispatches the job to the earliest one. (This simulates the human heuristic.)
    • The RL agent predicts its actions using the trained best_model checkpoint.
    • Throughput (i.e., total completed jobs) is recorded for both algorithms.
  • The evaluation loop runs for a fixed number of test episodes (e.g., 10000). After each episode, the throughput of both methods is:

    • Printed to the terminal for immediate visibility.
    • Plotted in real time using matplotlib to visualize performance trends across episodes.
  • At the end of evaluation, the average throughput is computed and reported for both algorithms. The algorithm with higher average throughput is declared the winner.

This comparative setup enables a quantitative and visual analysis of how well the RL agent generalizes beyond its training episodes and whether it outperforms a hand-crafted heuristic.
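
A minimal sketch of this seeded comparison follows. The environment id, the observation layout (a vector of per-server estimated backlogs), and the completed_jobs info key are assumptions for illustration; the repo's evaluate.py is the authoritative version.

import gymnasium as gym
import numpy as np
from stable_baselines3 import DQN

model = DQN.load("best_model")
env = gym.make("JobSimulator-v0")                     # assumed env id

def eeft_action(obs):
    # EEFT heuristic: dispatch to the queue with the earliest
    # estimated finish time (here, the smallest estimated backlog).
    return int(np.argmin(obs))

def rl_action(obs):
    action, _ = model.predict(obs, deterministic=True)
    return int(action)

def run_episode(policy, seed):
    obs, _ = env.reset(seed=seed)                     # same seed => same jobs
    done, info = False, {}
    while not done:
        obs, _, terminated, truncated, info = env.step(policy(obs))
        done = terminated or truncated
    return info.get("completed_jobs", 0)              # episode throughput

for episode in range(10):                             # the repo uses far more
    print(run_episode(eeft_action, seed=episode),
          run_episode(rl_action, seed=episode))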

  • Result: throughput comparison plots (per-episode and average throughput for the RL agent and the EEFT baseline) are included in the repository.

Running with Docker

Note: The GHCR packages (cloud-scheduling) use the project’s old name; they remain functionally identical to this repo.

This image provides an environment for running the training and evaluation scripts. Pull the image:

docker pull ghcr.io/johnnyau19/cloud-scheduling:v1

Execute:

docker run --rm -it ghcr.io/johnnyau19/cloud-scheduling:v1

Inside the container:

  • To evaluate the model and compare it with the EEFT policy:
python3 ./evaluate.py
  • To run the latency comparison:
python3 ./benchmark_runtime.py
  • To train the model:
python3 ./train.py

This image containerizes the FastAPI server that exposes the scheduling service over HTTP. Pull the image:

docker pull ghcr.io/johnnyau19/cloud-scheduling:v2

Execute:

docker run --rm -p 8080:8000 -it ghcr.io/johnnyau19/cloud-scheduling:v2

Access the API: http://127.0.0.1:8080/docs
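
As a hypothetical usage example (the endpoint path and payload schema below are assumptions; the actual routes are listed at /docs):

import requests

resp = requests.post(
    "http://127.0.0.1:8080/schedule",    # assumed endpoint name
    json={"estimated_workload": 25.0},   # assumed request schema
)
print(resp.json())                       # e.g. the chosen server index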

About

Inspired by checkout line assignments, this project simulates advanced load balancing in cloud systems, using reinforcement learning to assign jobs with uncertain workloads across servers.
