Grid-X is a decentralized compute platform that distributes machine learning training across a network of worker nodes.
The system consists of three main components: Backend, Worker, and Frontend.
graph TD
User((User)) -->|Submits Job| Frontend
Frontend -->|API Request| Backend
Backend -->|Stores Metadata| DB[(SQLite Database)]
Backend -->|Uploads Files| Storage[Supabase Storage]
subgraph Compute Network
Worker1[Worker Node 1]
Worker2[Worker Node 2]
end
Worker1 -->|Polls for Task| Backend
Worker1 -->|Downloads Data| Storage
Worker1 -->|Uploads Result| Storage
Backend -->|Aggregates Results| Storage
The Backend is the central nervous system of Grid-X. It handles authentication, job management, and orchestration.
- Language: Python 3.10+
- Framework: FastAPI (High-performance Async Framework)
- Database: SQLite (`Grid-X.db`) via SQLAlchemy ORM
- Storage: Supabase (Object Storage for large files like datasets and models)
- `app/main.py`: Entry point, CORS config.
- `app/models.py`: Database schema (Users, Agents, Jobs, Subtasks).
- `app/routers/agent.py`: API for Workers (Heartbeat, Task Request, Result Upload).
- `app/routers/front_job.py`: API for Frontend (Job Submission, Status).
- `app/aggregation.py`: Federated Averaging logic (PyTorch-based).
- REST API: Exposes HTTP endpoints (`/agent/...`, `/jobs/...`).
- Polling: Does not push to workers; relies on workers polling for tasks.
- Sync/Async: Uses standard synchronous DB calls but runs on `uvicorn` (ASGI).
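The pull-based model above can be illustrated with a minimal, framework-free sketch of what the Backend might do when a Worker asks for work. The `Subtask` fields, the `ASSIGNED` status, and the `request_task` function name are assumptions made for illustration; the actual schema lives in `app/models.py` and may differ.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Subtask:
    # Hypothetical fields; the real schema is defined in app/models.py.
    id: int
    job_id: int
    status: str = "PENDING"
    agent_id: Optional[str] = None


def request_task(subtasks: List[Subtask], agent_id: str) -> Optional[Subtask]:
    """Hand the first PENDING subtask to the polling worker, if any.

    The Backend never contacts a worker on its own; it only answers
    when a worker calls in, which is why an empty answer is normal.
    """
    for subtask in subtasks:
        if subtask.status == "PENDING":
            subtask.status = "ASSIGNED"  # assumed status name
            subtask.agent_id = agent_id
            return subtask
    return None  # nothing to do; the worker simply polls again later
```

In a real deployment the lookup and the status update would need to happen in one database transaction, so that two workers polling at the same moment cannot receive the same subtask.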
The Worker isolates and executes code securely. It can run on any machine (laptop, server, Raspberry Pi).
- Language: Python 3.11
- Environment: Docker (for sandboxing)
- Dependencies: `requests`, `docker`, `torch`
- Registration: On startup, registers with Backend via `POST /agent/register`.
- Heartbeat: Sends `POST /agent/heartbeat` every 5 seconds to say "I'm alive".
- Polling: Asks `POST /agent/request_task` every 10 seconds.
- Execution:
  - Downloads Code (`train.py`) and Data (`data.csv`).
  - Builds/Runs a Docker Container (`secure-executor-base`).
  - Mounts a temporary volume.
  - Runs `python train.py` inside the container.
- Reporting: Uploads `model.pth` and calls `POST /agent/complete_task`.
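The lifecycle above could be sketched roughly as follows. This is not the actual worker code: `post` stands in for an HTTP call such as `requests.post` against the Backend, `execute` stands in for the download-and-run step, and the payload shapes, the `/workspace` mount point, and the one-second tick are all assumptions. Only the endpoint paths, the 5 and 10 second intervals, and the image name come from the description.

```python
import time
from typing import Callable, List, Optional

HEARTBEAT_INTERVAL = 5.0  # seconds, per the lifecycle description
POLL_INTERVAL = 10.0      # seconds, per the lifecycle description


def docker_command(workdir: str, image: str = "secure-executor-base") -> List[str]:
    """Build a `docker run` argv that mounts workdir and runs train.py.

    The /workspace mount point is an assumed convention.
    """
    return ["docker", "run", "--rm",
            "-v", f"{workdir}:/workspace", "-w", "/workspace",
            image, "python", "train.py"]


def run_worker(post: Callable[[str, dict], Optional[dict]],
               execute: Callable[[dict], None],
               now: Callable[[], float] = time.monotonic,
               sleep: Callable[[float], None] = time.sleep,
               max_ticks: Optional[int] = None) -> None:
    """Register once, then send heartbeats and poll for tasks on a timer."""
    post("/agent/register", {})
    next_heartbeat = next_poll = now()
    ticks = 0
    while max_ticks is None or ticks < max_ticks:
        current = now()
        if current >= next_heartbeat:
            post("/agent/heartbeat", {})
            next_heartbeat = current + HEARTBEAT_INTERVAL
        if current >= next_poll:
            task = post("/agent/request_task", {})
            next_poll = current + POLL_INTERVAL
            if task:
                execute(task)  # download files, run the container, upload model.pth
                post("/agent/complete_task", {"task_id": task["id"]})
        sleep(1.0)
        ticks += 1
```

Because this sketch is single-threaded, heartbeats pause while `execute` runs. A worker that must stay visible during long training runs would likely send heartbeats from a separate thread.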
- Outbound Only: Does not require open firewall ports. Connects OUT to Backend.
- Configuration: Controlled via `worker_config.env` (Backend URL, Email).
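As an illustration of the configuration step, a worker might read `worker_config.env` with a small parser like the one below. The key names `BACKEND_URL` and `EMAIL` in the usage example are hypothetical; the file is only described as holding a Backend URL and an email, so check the real file for the exact names.

```python
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    config: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        config[key.strip()] = value.strip().strip('"').strip("'")
    return config


# Usage with hypothetical key names:
sample = 'BACKEND_URL=http://localhost:8000\n# contact address\nEMAIL="user@example.com"\n'
settings = parse_env(sample)
```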
The Frontend is the user interface for submitting jobs and viewing progress.
- Framework: Next.js 16 (React Framework)
- Language: TypeScript / JavaScript
- UI Library: React 19
- Location: `grid-x/packages/dashboard` (Monorepo structure)
- Job Submission:
  - User uploads `train.py`, `requirements.txt`, `data.csv`.
  - Frontend sends to Backend.
  - Backend uploads files to Storage and creates `Job` record.
- Task Dispatch:
  - Worker asks for work.
  - Backend checks DB for `PENDING` subtasks.
  - Backend assigns task to Worker.
- Execution & Result:
  - Worker processes data.
  - Worker uploads `model.pth` to Storage.
  - Worker notifies Backend.
- Aggregation (FedAvg):
  - When all subtasks are complete, Backend triggers `aggregation.py`.
  - Backend downloads all `model.pth` files.
  - Backend averages weights (Federated Learning).
  - Backend saves `final_model.pth`.
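The averaging step in the last stage can be sketched in plain Python. This is a simplified stand-in, not the contents of `aggregation.py`: each model is represented as a dict mapping parameter names to flat lists of floats, where the real implementation would presumably operate on PyTorch state dicts loaded from the `model.pth` files. The optional `sample_counts` argument is an assumption that shows how workers with more data could be weighted more heavily; without it, every worker counts equally.

```python
from typing import Dict, List, Optional, Sequence

Params = Dict[str, List[float]]


def federated_average(models: Sequence[Params],
                      sample_counts: Optional[Sequence[float]] = None) -> Params:
    """Average each parameter element-wise across all worker models."""
    if not models:
        raise ValueError("need at least one model to aggregate")
    weights = list(sample_counts) if sample_counts is not None else [1.0] * len(models)
    if len(weights) != len(models):
        raise ValueError("one weight per model is required")
    total = sum(weights)
    averaged: Params = {}
    for name, reference in models[0].items():
        averaged[name] = [
            sum(w * model[name][i] for w, model in zip(weights, models)) / total
            for i in range(len(reference))
        ]
    return averaged


# Two workers, equal weighting.
worker_a = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
worker_b = {"layer.weight": [3.0, 4.0], "layer.bias": [2.0]}
final_model = federated_average([worker_a, worker_b])
# final_model == {"layer.weight": [2.0, 3.0], "layer.bias": [1.0]}
```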
cd backend
pip install -r requirements.txt
uvicorn app.main:app --reload

Refer to [WORKER_SETUP.md] for detailed instructions.
- Move `gridx-worker.tar.gz` to device.
- Extract and run `./setup_worker.sh`.
- Run `./start_worker.sh`.
cd grid-x/packages/dashboard
npm install
npm run dev