A production-grade model serving platform focused on high throughput, low latency, and efficient resource utilization.
Client
  ↓
API Gateway / Router (Go) - Week 2
  ↓
Scheduler / Batcher (Go) - Week 3
  ↓
Model Worker Pool (Python) - Week 4
  ↓
GPU / CPU (mock or real)
- router/: Go-based HTTP/gRPC entry point. Handles admission control.
- scheduler/: Go-based core logic for dynamic batching and priority queuing.
- worker/: Python-based inference worker. Interfaces with ML models (PyTorch/mock).
- proto/: Protobuf definitions for internal service communication.
- deploy/: Docker and orchestration configurations.
- Go Setup:
  cd serving-platform
  go mod tidy
- Python Setup:
  cd serving-platform/worker
  pip install -r requirements.txt
- Generate Proto (Optional):
  protoc --go_out=. --go-grpc_out=. proto/serving.proto
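The contents of proto/serving.proto are not shown here; as a hypothetical sketch only, an inference service definition for this pipeline might look like the following (all message and field names are assumptions, and the actual file may differ):

```protobuf
// Hypothetical sketch -- the real proto/serving.proto may differ.
syntax = "proto3";

package serving;

// Inference carries a request from router through scheduler to worker.
service Inference {
  rpc Predict(PredictRequest) returns (PredictResponse);
}

message PredictRequest {
  string model = 1;    // model name to route to
  bytes input = 2;     // serialized input payload
  int32 priority = 3;  // consumed by the scheduler's priority queue
}

message PredictResponse {
  bytes output = 1;      // serialized model output
  int64 latency_ms = 2;  // server-side processing time
}
```

Note that the protoc command above also requires the protoc-gen-go and protoc-gen-go-grpc plugins to be installed and on PATH.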