Rift through operational complexity
Autonomous infrastructure orchestration and remediation powered by DigitalOcean Gradient AI + Model Context Protocol.
This is a vibe-coded project I built for fun over a hackathon weekend because I liked the idea.
Opening rifts to create, closing rifts to fix - all at machine speed.
- What is Rift?
- Project Structure
- Frontend Setup
- Backend Setup
- Running the Complete System
- Design System
- Demo Guide
- Architecture
- Tech Stack
Rift is an AI-powered infrastructure orchestrator that autonomously detects, diagnoses, and fixes infrastructure issues using multi-agent systems and the Model Context Protocol (MCP).
- DevOps engineers spend 40% of their time on routine infrastructure incidents
- Average incident response time: 2-4 hours
- Manual fixes are error-prone and inconsistent
- On-call fatigue leads to burnout
Rift (the codebase and demo script still call it FixBot) uses three specialized AI agents:
- Monitor Agent - Detects issues in seconds via DigitalOcean MCP + Prometheus
- Diagnostic Agent - Uses RAG to analyze root causes from knowledge base
- Remediation Agent - Fixes problems automatically via Terraform + MCP
Result: in the demo scenario, incident response time drops from hours to about 90 seconds, with no human intervention.
fixbot/
│
├── frontend/                     # 🎨 Next.js Dashboard (Pre-built)
│   ├── app/
│   │   ├── page.tsx              # Main dashboard
│   │   ├── layout.tsx            # Root layout
│   │   ├── globals.css           # Dark punk theme styles
│   │   └── favicon.ico
│   │
│   ├── components/
│   │   ├── ui/                   # shadcn/ui components
│   │   │   ├── button.tsx
│   │   │   ├── card.tsx
│   │   │   └── badge.tsx
│   │   │
│   │   ├── StatusCard.tsx        # Agent status display
│   │   ├── IncidentFeed.tsx      # Real-time event stream
│   │   ├── AgentStatus.tsx       # Agent health monitor
│   │   ├── MetricsChart.tsx      # System metrics visualization
│   │   ├── TraceViewer.tsx       # AI decision traceability
│   │   └── Terminal.tsx          # Terminal-style output
│   │
│   ├── lib/
│   │   ├── api.ts                # Backend API client
│   │   ├── websocket.ts          # WebSocket connection
│   │   └── utils.ts              # Utility functions
│   │
│   ├── public/
│   │   └── fixbot-logo.svg
│   │
│   ├── package.json
│   ├── tsconfig.json
│   ├── tailwind.config.ts        # Dark punk theme
│   ├── next.config.js
│   ├── .env.local                # Configure this!
│   └── README.md
│
└── backend/                      # 🔧 Python Backend (Build this!)
    ├── agents/
    │   ├── base_agent.py         # Base agent class
    │   ├── monitor_agent.py      # Monitoring logic
    │   ├── diagnostic_agent.py   # Diagnosis with RAG
    │   └── remediation_agent.py  # Auto-remediation
    │
    ├── mcp_clients/
    │   ├── do_mcp.py             # DigitalOcean MCP
    │   ├── terraform_mcp.py      # Terraform MCP
    │   └── prometheus_mcp.py     # Custom Prometheus MCP
    │
    ├── orchestrator/
    │   └── coordinator.py        # Agent coordination
    │
    ├── models/
    │   └── incident.py           # Pydantic models
    │
    ├── terraform/
    │   ├── main.tf
    │   └── modules/
    │
    ├── demo/
    │   └── failure_injection.py  # Demo scenarios
    │
    ├── knowledge-base/
    │   ├── do-docs.md
    │   ├── runbooks.md
    │   └── past-incidents.json
    │
    ├── main.py                   # FastAPI backend
    ├── requirements.txt
    ├── .env                      # Configure this!
    └── README.md
Frontend (fixbot/frontend/):
- ✅ Pre-built and ready to use: just run npm install and configure .env.local
- Next.js 14+ with App Router
- Real-time dashboard with WebSocket updates
- Dark Punk Professional Theme - Cyberpunk aesthetics meets Bloomberg Terminal
- TypeScript + Tailwind CSS + shadcn/ui
- Minimal configuration needed
Backend (fixbot/backend/):
- ⚠️ You build this during the hackathon
- Python FastAPI application
- AI agents (Monitor, Diagnostic, Remediation)
- MCP server integrations
- WebSocket server for real-time updates
- Infrastructure as Code (Terraform)
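For the Terraform step, one conservative pattern is to keep the HCL fixed and have the Remediation Agent only write input values, then run a predictable command. A minimal sketch assuming a hypothetical `droplet_size` variable declared in terraform/main.tf (the file name `remediation.tfvars.json` is also illustrative); Terraform does accept JSON variable files and the `-chdir`, `-var-file` and `-auto-approve` options used here.

```python
import json
from pathlib import Path


def write_resize_vars(workdir: Path, droplet_size: str) -> Path:
    # Terraform accepts JSON variable files passed with -var-file.
    # "droplet_size" is a hypothetical variable assumed to be declared in main.tf.
    var_file = workdir / "remediation.tfvars.json"
    var_file.write_text(json.dumps({"droplet_size": droplet_size}, indent=2))
    return var_file


def terraform_apply_command(workdir: Path, var_file: Path) -> list[str]:
    # -chdir selects the module directory; -auto-approve skips the
    # interactive confirmation that an unattended agent cannot answer.
    return [
        "terraform",
        f"-chdir={workdir}",
        "apply",
        "-auto-approve",
        f"-var-file={var_file.resolve()}",
    ]


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        var_file = write_resize_vars(Path(tmp), "s-2vcpu-4gb")
        cmd = terraform_apply_command(Path(tmp), var_file)
        print(" ".join(cmd))
        # An agent would then run it, e.g. subprocess.run(cmd, check=True).
```

Varying only the inputs keeps every automated change small and easy to show in the decision trace.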
The frontend is pre-built with a professional dark punk theme. You just need to install and configure it.
- Node.js 18+ and npm 9+
- A running backend API (see Backend Setup)
# Navigate to frontend directory
cd fixbot/frontend
# Install dependencies
npm install
# Configure environment variables
cp .env.example .env.local
Edit .env.local:
# Backend API endpoint
NEXT_PUBLIC_API_URL=http://localhost:8000
# WebSocket endpoint
NEXT_PUBLIC_WS_URL=ws://localhost:8000/ws
# Run development server
npm run dev
# Open browser
open http://localhost:3000   # macOS (use xdg-open on Linux)
- Framework: Next.js 14 (App Router)
- Language: TypeScript
- Styling: Tailwind CSS
- Components: shadcn/ui
- State: React Hooks
- Real-time: WebSocket client
- API Client: Fetch API with error handling
The backend is what you'll build during the hackathon.
- Python 3.11+
- DigitalOcean account with API token
- Gradient AI Platform access
- Terraform installed
- Docker (for MCP servers)
# Navigate to backend directory
cd fixbot/backend
# Create virtual environment
python3 -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Configure environment variables
cp .env.example .env
Edit .env:
# DigitalOcean
DO_API_TOKEN=your_do_token_here
DO_SPACES_KEY=your_spaces_key
DO_SPACES_SECRET=your_spaces_secret
# Gradient AI
GRADIENT_AI_API_KEY=your_gradient_key
MONITOR_AGENT_ID=agent_xxx
DIAGNOSTIC_AGENT_ID=agent_yyy
REMEDIATION_AGENT_ID=agent_zzz
# MCP Servers
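# Note: the Next.js dev server also defaults to port 3000; if both run on one machine, move one of them to a free port
DO_MCP_URL=http://localhost:3000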
TERRAFORM_MCP_URL=http://localhost:3001
PROMETHEUS_URL=http://your-prometheus:9090
# Start MCP servers (in separate terminals)
# See MCP Integration section in full guide
# Run FastAPI backend
python main.py
# Backend should start on http://localhost:8000
- Framework: FastAPI
- AI Platform: DigitalOcean Gradient AI
- Protocol: Model Context Protocol (MCP)
- IaC: Terraform
- Monitoring: Prometheus
- Language: Python 3.11+
Terminal 1: Backend API
cd fixbot/backend
source venv/bin/activate
python main.py
# Runs on http://localhost:8000
Terminal 2: Frontend Dashboard
cd fixbot/frontend
npm run dev
# Runs on http://localhost:3000
Terminal 3: Monitor Logs (Optional)
cd fixbot/backend
tail -f logs/fixbot.log
# Check backend health
curl http://localhost:8000/agents/health
# Check frontend loads
curl http://localhost:3000
# Open dashboard in browser
open http://localhost:3000 # macOS
xdg-open http://localhost:3000 # Linux
You should see:
- ✓ All three agent status cards showing "Active" with green indicators
- ✓ System metrics displaying normal values
- ✓ Live connection indicator showing "Connected"
- ✓ Empty incident feed (no incidents yet)
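The manual curl check above can be scripted so a pre-demo run fails loudly if any agent is down. A sketch using only the standard library; the JSON shape of /agents/health is not documented here, so the `{"monitor": "active", ...}` form it expects, and the `inactive_agents` helper itself, are assumptions to adjust against the real endpoint.

```python
import json
from collections.abc import Callable
from urllib.request import urlopen

AGENTS = ("monitor", "diagnostic", "remediation")


def fetch_json(url: str) -> dict:
    # Plain GET; urlopen raises on connection errors and non-2xx responses.
    with urlopen(url, timeout=5) as response:
        return json.load(response)


def inactive_agents(base_url: str, fetch: Callable[[str], dict] = fetch_json) -> list[str]:
    # Assumed response shape: {"monitor": "active", "diagnostic": "active", ...}
    health = fetch(f"{base_url}/agents/health")
    return [name for name in AGENTS if str(health.get(name, "")).lower() != "active"]
```

With the backend running, `inactive_agents("http://localhost:8000")` returns an empty list when all three agents report active. The `fetch` parameter exists so the parsing can be tested without a server.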
"Professional Cyberpunk" - The aesthetic of a high-tech operations center. Think: Blade Runner meets Bloomberg Terminal. Dark, sleek, with neon accents that convey urgency and precision.
:root {
  /* Background & Surfaces */
  --background: #0a0e17; /* Deep space black */
  --surface: #111827; /* Card/panel background */
  --surface-elevated: #1f2937; /* Elevated elements */
  /* Brand Colors (Neon Accents) */
  --primary: #00ff9f; /* Neon green - success/active */
  --secondary: #00d4ff; /* Cyber blue - info */
  --accent: #ff00ff; /* Neon magenta - alerts */
  /* Status Colors */
  --success: #00ff9f; /* Neon green */
  --warning: #ffaa00; /* Electric amber */
  --danger: #ff3366; /* Hot pink red */
  /* Text */
  --text-primary: #e5e7eb; /* Almost white */
  --text-secondary: #9ca3af; /* Muted gray */
  --text-muted: #6b7280; /* Very muted */
  /* Borders */
  --border: #1f2937; /* Subtle borders */
  --border-bright: #374151; /* Highlighted borders */
}
Fonts:
- Headers: "JetBrains Mono" or "Space Mono" (monospace, technical feel)
- Body: "Inter" or "DM Sans" (clean, readable)
- Code/Terminal: "Fira Code" or "Cascadia Code" (with ligatures)
Guidelines:
- Use UPPERCASE for labels and status indicators
- Use monospace for anything technical (IDs, timestamps, metrics)
- Use medium-large sizes for important info (remember: projector demo!)
- Use color to convey meaning (green = good, red = critical, blue = info)
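The "color conveys meaning" rule is easiest to keep consistent if one function decides the level for every metric, so the backend can send the same label the dashboard colors. A small Python sketch; the hex values come from the palette above, while the 70% / 90% cut-offs and the `status_for` name are illustrative assumptions.

```python
# Hex values from the palette above; the 70% / 90% cut-offs are
# illustrative assumptions, not values defined by Rift.
SUCCESS = "#00ff9f"  # --success, neon green
WARNING = "#ffaa00"  # --warning, electric amber
DANGER = "#ff3366"   # --danger, hot pink red


def status_for(percent: float, warn_at: float = 70.0, critical_at: float = 90.0) -> tuple[str, str]:
    """Map a utilization percentage to an UPPERCASE label and its color."""
    if not 0 <= percent <= 100:
        raise ValueError(f"percent out of range: {percent}")
    if percent >= critical_at:
        return "CRITICAL", DANGER
    if percent >= warn_at:
        return "WARNING", WARNING
    return "OK", SUCCESS
```

With these defaults the demo's 95% CPU reading maps to ("CRITICAL", "#ff3366") and the post-fix 42% maps back to ("OK", "#00ff9f").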
<Card className="bg-[#111827] border border-[#1f2937] hover:border-[#00ff9f] transition-all">
<div className="flex items-center gap-3">
{/* Active indicator - pulsing green dot */}
<div className="h-2 w-2 rounded-full bg-[#00ff9f] animate-pulse" />
{/* Agent name - monospace, uppercase, neon green */}
<span className="text-[#00ff9f] font-mono uppercase tracking-wider">
Monitor Agent
</span>
</div>
{/* Status info - secondary text */}
<div className="mt-2 text-[#9ca3af] text-sm">
Status: Active • Last check: 2s ago
</div>
</Card>
<div className="bg-black border border-[#00ff9f] rounded p-4 font-mono">
<div className="flex gap-2 text-[#00ff9f]">
<span className="text-[#00ff9f]">→</span>
<span>14:32:15 | FixBot detected high CPU (95%)</span>
</div>
<div className="flex gap-2 text-[#00d4ff]">
<span className="text-[#00d4ff]">→</span>
<span>14:32:18 | Analyzing root cause...</span>
</div>
<div className="flex gap-2 text-[#00ff9f]">
<span className="text-[#00ff9f]">→</span>
<span>14:33:45 | ✅ RESOLVED - Droplet resized</span>
</div>
</div>
<div className="space-y-2">
<div className="flex justify-between text-sm">
<span className="text-[#9ca3af]">CPU Usage</span>
<span className="text-[#00ff9f] font-mono">42%</span>
</div>
{/* Progress bar with gradient */}
<div className="h-2 bg-[#1f2937] rounded-full overflow-hidden">
<div
className="h-full bg-gradient-to-r from-[#00ff9f] to-[#00d4ff]"
style={{ width: "42%" }}
/>
</div>
</div>
Use sparingly and professionally:
/* Pulse for active states */
@keyframes pulse {
0%,
100% {
opacity: 1;
}
50% {
opacity: 0.6;
}
}
/* Glow effect on hover */
.hover-glow:hover {
box-shadow: 0 0 20px rgba(0, 255, 159, 0.3);
}
/* Subtle scan line (optional) */
@keyframes scan {
0% {
transform: translateY(-100%);
}
100% {
transform: translateY(100%);
}
}
DO:
- ✓ Pulse indicators for active/live states
- ✓ Smooth transitions (0.2-0.3s)
- ✓ Hover effects (glow, border color change)
- ✓ Fade in/out for notifications
DON'T:
- ❌ Excessive animations
- ❌ Constant movement
- ❌ Distracting effects during demo
- ❌ Flashy transitions
Dashboard Grid (Desktop):
┌─────────────────────────────────────────────────┐
│ 🤖 FixBot                              [●] LIVE │
├─────────────────────────────────────────────────┤
│                                                 │
│   [Monitor]   [Diagnostic]   [Remediation]      │  ← Agent status cards
│                                                 │
├─────────────────────────────────────────────────┤
│                                                 │
│   📊 System Metrics                             │  ← Metrics display
│   CPU | Memory | Disk                           │
│                                                 │
├─────────────────────────────────────────────────┤
│                                                 │
│   🔴 Live: Incident Timeline                    │  ← Real-time feed
│   [Scrolling event stream...]                   │
│                                                 │
└─────────────────────────────────────────────────┘
Spacing:
- Use gap-4 (1rem) or gap-6 (1.5rem) between elements
- Generous padding inside cards: p-6 or p-8
- Consistent margins: my-4 or my-6
5 minutes before demo:
- Start Backend:
  cd fixbot/backend && source venv/bin/activate && python main.py
- Start Frontend:
  cd fixbot/frontend && npm run dev
- Open Dashboard:
  open http://localhost:3000
- Verify Status:
  - All agents show green "Active"
  - System metrics display normally
  - Live indicator shows "Connected"
- Prepare Failure Injection:
  cd fixbot/backend/demo   # Have a terminal ready with the injection command
[0:00-0:30] Hook + Dashboard Intro
YOU: "Infrastructure breaks. That's a fact of life.
But what if you had a bot that fixed things automatically -
before they wake up your on-call engineer at 3 AM?
That's FixBot."
[Show dashboard on screen - point to it]
"This is FixBot's operations center.
Three AI agents monitoring our infrastructure 24/7."
[0:30-1:00] Architecture Walkthrough
[Point to each agent card]
YOU: "Three specialized agents:
Monitor Agent - detects issues via DigitalOcean MCP and Prometheus
Diagnostic Agent - uses RAG to analyze root causes
Remediation Agent - fixes problems automatically via Terraform
All powered by DigitalOcean Gradient AI with Model Context Protocol."
[1:00-4:00] Live Demo: CPU Spike
# Run in terminal (don't show this to judges, just run it)
python failure_injection.py --inject cpu --target web-app
[FOCUS ON DASHBOARD - this is the star]
YOU: "Let me trigger a real incident. I'm overloading our web server..."
[Dashboard comes alive:]
- Monitor Agent: Status changes to "● DETECTING..."
- Incident feed starts scrolling:
"14:32:15 | 🔴 ALERT: High CPU detected (95%)"
[CPU metric bar turns red, shows 95%]
YOU: "Three seconds. FixBot detected it."
[Diagnostic Agent activates:]
"14:32:18 | 🔍 Analyzing root cause..."
"14:32:22 | 💡 Root cause: Undersized droplet"
"14:32:22 | 📋 Recommended: Resize to s-2vcpu-4gb"
YOU: "Now it's using RAG - querying our knowledge base of past incidents,
DigitalOcean documentation, best practices..."
[Remediation Agent executes:]
"14:32:25 | 🔧 Executing: Terraform resize"
"14:32:30 | ⚙️ Applying infrastructure changes..."
"14:33:45 | ✅ RESOLVED: Droplet resized"
[CPU drops to 42%, turns green]
[All agents return to "Active" status]
YOU: "90 seconds total. From detection to resolution.
Completely autonomous. No human intervention."
[Pause for impact]
[4:00-5:00] Show Traceability
[Click on resolved incident in feed]
[Opens trace viewer panel]
YOU: "Here's what makes this special: full traceability."
[Point to trace view showing:]
- Input metrics and system state
- RAG retrieval results from knowledge base
- Decision logic and confidence scores
- Terraform config generated
- Success validation
"Every decision the AI makes is auditable.
This isn't a black box. You can see exactly why FixBot chose this solution."
[5:00-6:00] Quick Second Demo (If Time)
python failure_injection.py --inject disk --target api-server
YOU: "One more. Disk full on API server..."
[Faster walkthrough on dashboard]
- Detect (5s)
- Diagnose (15s)
- Attach new volume (45s)
- Resolved
YOU: "Same pattern. Different problem. Fixed automatically."
[6:00-7:00] Closing
[Return to clean dashboard - all green]
YOU: "FixBot - the infrastructure fixer that never sleeps.
Key features:
• Detects issues in seconds using DigitalOcean MCP
• Diagnoses with AI-powered RAG
• Fixes automatically via Terraform
• Full traceability of every decision
• Built entirely on DigitalOcean Gradient AI
This is the future of infrastructure management.
No more 3 AM wake-up calls.
No more manual emergency fixes.
Just autonomous, intelligent infrastructure.
Questions?"
[Confident smile, pause]
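The "90 seconds total" line can be checked against the timeline the dashboard shows: from the 14:32:15 alert to the 14:33:45 resolution. A short worked check with `datetime`; the timestamps are the ones in the script, and the `TIMELINE`/`seconds_between` names are only for illustration.

```python
from datetime import datetime

# Timestamps copied from the CPU-spike timeline in the demo script.
TIMELINE = {
    "alert": "14:32:15",
    "diagnosis": "14:32:18",
    "root_cause": "14:32:22",
    "remediation": "14:32:25",
    "resolved": "14:33:45",
}


def seconds_between(start: str, end: str) -> int:
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


if __name__ == "__main__":
    print(seconds_between(TIMELINE["alert"], TIMELINE["resolved"]))  # 90
```

Splitting the same timeline shows where the time goes: 10 s from alert to the start of remediation, and the remaining 80 s in the Terraform resize and validation, which is the phase to look at first if the demo needs to be faster.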
DO:
- ✓ Keep dashboard fullscreen during demo
- ✓ Speak slowly and clearly
- ✓ Pause after key points for impact
- ✓ Point to screen elements as you explain
- ✓ Show enthusiasm - this is cool tech!
- ✓ Have backup video if live demo fails
DON'T:
- ❌ Switch between terminal and browser constantly
- ❌ Rush through the demo
- ❌ Get lost in technical details
- ❌ Apologize for delays (they're normal)
- ❌ Turn your back to audience
┌──────────────────────────────────────────────────────────────┐
│                        USER INTERFACE                        │
│                     (Next.js Dashboard)                      │
│                                                              │
│  • Dark Punk Professional Theme                              │
│  • Real-time WebSocket Updates                               │
│  • Agent Status Monitoring                                   │
│  • Incident Timeline                                         │
│  • Decision Traceability                                     │
└────────────────┬─────────────────────────────────────────────┘
                 │
                 │ WebSocket + REST API
                 │
┌────────────────▼─────────────────────────────────────────────┐
│                       FASTAPI BACKEND                        │
│                        (Orchestrator)                        │
│                                                              │
│  Endpoints:                                                  │
│  • POST /incidents/detect                                    │
│  • POST /incidents/diagnose                                  │
│  • POST /incidents/remediate                                 │
│  • GET  /status                                              │
│  • GET  /agents/health                                       │
│  • WS   /ws (WebSocket for real-time)                        │
└────────────────┬─────────────────────────────────────────────┘
                 │
                 │ Agent API Calls
                 │
┌────────────────▼─────────────────────────────────────────────┐
│              DIGITALOCEAN GRADIENT AI PLATFORM               │
│                     (Multi-Agent System)                     │
│                                                              │
│  ┌──────────┐   ┌──────────────┐   ┌────────────────┐        │
│  │ MONITOR  │──→│  DIAGNOSTIC  │──→│  REMEDIATION   │        │
│  │  AGENT   │   │    AGENT     │   │     AGENT      │        │
│  │          │   │              │   │                │        │
│  │ • Detect │   │ • RAG Query  │   │ • Terraform    │        │
│  │ • Alert  │   │ • Analyze    │   │ • DO API       │        │
│  │ • Triage │   │ • Recommend  │   │ • Validate     │        │
│  └────┬─────┘   └──────┬───────┘   └───────┬────────┘        │
│       │                │                   │                 │
│       └────────────────┼───────────────────┘                 │
│                        │                                     │
│  ┌─────────────────────▼──────────────────────────────────┐  │
│  │                  KNOWLEDGE BASE (RAG)                  │  │
│  │ • DO Documentation (auto-indexed)                      │  │
│  │ • Runbooks & Best Practices                            │  │
│  │ • Past Incident History                                │  │
│  └────────────────────────────────────────────────────────┘  │
└────────────────┬─────────────────────────────────────────────┘
                 │
                 │ MCP Protocol
                 │
┌────────────────▼─────────────────────────────────────────────┐
│                         MCP SERVERS                          │
│                                                              │
│  ┌─────────────────┐  ┌──────────────┐  ┌──────────────┐     │
│  │  DigitalOcean   │  │   Terraform  │  │  Prometheus  │     │
│  │   MCP Server    │  │  MCP Server  │  │ MCP (Custom) │     │
│  │                 │  │              │  │              │     │
│  │ • Droplets      │  │ • Validate   │  │ • Query      │     │
│  │ • Monitoring    │  │ • Plan       │  │ • Alerts     │     │
│  │ • Spaces        │  │ • Apply      │  │ • Metrics    │     │
│  │ • Kubernetes    │  │ • State      │  │              │     │
│  └─────────────────┘  └──────────────┘  └──────────────┘     │
└──────────────────────────────────────────────────────────────┘
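Every arrow labelled "MCP Protocol" above carries JSON-RPC 2.0 messages: a client discovers a server's tools with `tools/list` and invokes one with `tools/call`, passing the tool name and an `arguments` object. A minimal sketch of building that request; the transport (stdio or HTTP) is left out, and the `droplet_resize` tool name and its arguments are hypothetical, since real tool names depend on the MCP server you run.

```python
import itertools
import json

# JSON-RPC request ids only need to be unique per connection.
_ids = itertools.count(1)


def tool_call(name: str, arguments: dict) -> str:
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request)


if __name__ == "__main__":
    # Hypothetical tool and arguments; ask the server with tools/list first.
    print(tool_call("droplet_resize", {"droplet_id": 123, "size": "s-2vcpu-4gb"}))
```

The server's reply carries the same `id`, which is how the orchestrator matches a result to the agent step that requested it.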
- Framework: Next.js 14 (App Router)
- Language: TypeScript 5+
- Styling: Tailwind CSS 3.4
- Components: shadcn/ui
- State Management: React Hooks (useState, useEffect, useContext)
- Real-time: WebSocket API
- HTTP Client: Fetch API
- Build Tool: Next.js built-in (webpack for production builds; Turbopack opt-in for dev via next dev --turbo)
- Framework: FastAPI 0.109+
- Language: Python 3.11+
- AI Platform: DigitalOcean Gradient AI
  - Multi-agent system
  - RAG (Retrieval-Augmented Generation)
  - Function calling
  - Agent evaluations
  - Traceability
- Protocol: Model Context Protocol (MCP)
- Infrastructure: Terraform 1.6+
- Monitoring: Prometheus
- Terraform State: DO Spaces (S3-compatible backend)
- WebSocket: FastAPI WebSocket support
- Cloud: DigitalOcean
  - Droplets (compute)
  - Spaces (object storage)
  - Managed Kubernetes (optional)
  - Monitoring (built-in)
- IaC: Terraform with DO provider
- Orchestration: FastAPI + asyncio
- Monitoring: Prometheus + node_exporter
Frontend (pre-built):
- Real-time dashboard with WebSocket
- Dark punk professional theme
- Agent status monitoring
- Live incident feed
- System metrics visualization
- Decision traceability viewer
- Responsive layout (desktop-focused)
Backend (to build during the hackathon):
- Monitor Agent with DO MCP integration
- Diagnostic Agent with RAG
- Remediation Agent with Terraform
- FastAPI orchestrator
- WebSocket server for real-time updates
- MCP client implementations
- Knowledge base setup
- Demo failure injection scripts
- Agent evaluations
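In the architecture above, retrieval for the Diagnostic Agent comes from the Gradient AI knowledge base. When curating runbooks.md and past-incidents.json it can still help to check which entry a symptom ought to surface. A toy stand-in for that check: bag-of-words cosine similarity over three hypothetical snippets. It is not embedding search and not how Gradient AI ranks documents.

```python
import math
import re
from collections import Counter

# Hypothetical snippets standing in for runbooks.md and past-incidents.json.
DOCS = {
    "runbook-cpu": "High CPU on a droplet: check load, then resize the droplet to a larger size.",
    "runbook-disk": "Disk full: attach a new volume and move data, or clean up old logs.",
    "past-incident-example": "Web app latency caused by memory pressure; fixed by restarting the service.",
}


def tokens(text: str) -> Counter:
    # Lowercased word counts; punctuation is dropped.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the ids of the k snippets most similar to the query."""
    q = tokens(query)
    ranked = sorted(DOCS, key=lambda doc_id: cosine(q, tokens(DOCS[doc_id])), reverse=True)
    return ranked[:k]
```

If a symptom like "disk full" does not pull up the runbook you expect even with this crude scorer, the runbook's wording is probably too far from how alerts describe the problem.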
# Install
npm install
# Dev mode
npm run dev
# Build
npm run build
# Production
npm start
# Type check
npm run type-check
# Lint
npm run lint
# Install
pip install -r requirements.txt
# Run dev
python main.py
# Run with reload
uvicorn main:app --reload
# Run tests
pytest tests/
# Type check
mypy .
Dashboard can't reach the backend:
- Check that .env.local has the correct NEXT_PUBLIC_API_URL
- Verify the backend is running on the expected port
- Check CORS settings in the FastAPI backend
- Look for errors in the browser console (F12)
WebSocket not connecting:
- Check NEXT_PUBLIC_WS_URL in .env.local
- Verify the WebSocket endpoint exists in the backend
- Check firewall/proxy settings
- Test with: wscat -c ws://localhost:8000/ws
Theme or styles not loading:
- Clear the browser cache
- Check that globals.css is imported in layout.tsx
- Verify Tailwind is processing CSS correctly
- Run npm run dev with a clean cache (delete the .next directory first)
Agents not responding:
- Check the Gradient AI API keys in the backend .env
- Verify the agent IDs are correct
- Test agent endpoints individually
- Check the Gradient AI dashboard for errors
MIT License - see LICENSE file for details
MLH + DigitalOcean AI Hackathon NYC
December 12-13, 2025
Built with ❤️ and ☕ by [Your Name]
- DigitalOcean Gradient AI
- Model Context Protocol
- Next.js Documentation
- FastAPI Documentation
- Terraform DigitalOcean Provider
Questions? Found a bug? Want to contribute?
Open an issue or PR on GitHub!
🤖 FixBot - Breaking things? We fix them before you notice. 🤖