
Autonomous Infrastructure orchestration and remediation powered by DigitalOcean Gradient AI + MCP

itisaby/rift

🌌 Rift

Rift through operational complexity

Autonomous infrastructure orchestration and remediation powered by DigitalOcean Gradient AI + Model Context Protocol.

This is a vibe-coded project that I built for fun over a hackathon weekend because I liked the idea.

Opening rifts to create, closing rifts to fix - all at machine speed.


🎯 What is Rift?

Rift is an AI-powered infrastructure orchestrator that autonomously detects, diagnoses, and fixes infrastructure issues using multi-agent systems and the Model Context Protocol (MCP).

The Problem

  • DevOps engineers spend 40% of their time on routine infrastructure incidents
  • Average incident response time: 2-4 hours
  • Manual fixes are error-prone and inconsistent
  • On-call fatigue leads to burnout

The Solution

Rift (called FixBot in the rest of this README and in the directory names) uses three specialized AI agents:

  1. Monitor Agent - Detects issues in seconds via DigitalOcean MCP + Prometheus
  2. Diagnostic Agent - Uses RAG to analyze root causes from knowledge base
  3. Remediation Agent - Fixes problems automatically via Terraform + MCP

Result: in the demo scenario, incident response time drops from hours to ~90 seconds, with no human intervention.
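
The three-stage flow above can be sketched in plain Python. This is a minimal illustration, not the project's implementation: the `Incident` fields, the 90% threshold, and the rule tables are assumptions, and `diagnose` and `remediate` are stand-ins for the RAG lookup and the Terraform/MCP calls.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Incident:
    """Hypothetical incident record passed between the three agents."""
    resource: str
    metric: str
    value: float
    root_cause: Optional[str] = None
    action: Optional[str] = None
    status: str = "detected"
    history: list = field(default_factory=list)


def monitor(resource: str, metric: str, value: float, threshold: float = 90.0) -> Optional[Incident]:
    """Stage 1: raise an incident only when the metric reaches the (assumed) threshold."""
    if value < threshold:
        return None
    incident = Incident(resource=resource, metric=metric, value=value)
    incident.history.append("detected")
    return incident


def diagnose(incident: Incident) -> Incident:
    """Stage 2: a fixed rule table stands in for the RAG knowledge-base query."""
    rules = {"cpu": "undersized droplet", "disk": "volume full"}
    incident.root_cause = rules.get(incident.metric, "unknown")
    incident.status = "diagnosed"
    incident.history.append("diagnosed")
    return incident


def remediate(incident: Incident) -> Incident:
    """Stage 3: records the chosen action; a real agent would call Terraform via MCP here."""
    actions = {"undersized droplet": "resize droplet", "volume full": "attach new volume"}
    incident.action = actions.get(incident.root_cause, "escalate to human")
    incident.status = "resolved" if incident.root_cause in actions else "escalated"
    incident.history.append(incident.status)
    return incident


def run_pipeline(resource: str, metric: str, value: float) -> Optional[Incident]:
    """Chain the three stages; returns None when nothing was detected."""
    incident = monitor(resource, metric, value)
    if incident is None:
        return None
    return remediate(diagnose(incident))
```

Calling `run_pipeline("web-app", "cpu", 95.0)` walks through all three stages, while a healthy reading such as 42.0 returns `None` and never reaches the later agents. Unknown causes are escalated rather than "fixed", which is one possible way to keep automatic remediation bounded.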


📁 Project Structure

fixbot/
│
├── frontend/                    # 🎨 Next.js Dashboard (Pre-built)
│   ├── app/
│   │   ├── page.tsx            # Main dashboard
│   │   ├── layout.tsx          # Root layout
│   │   ├── globals.css         # Dark punk theme styles
│   │   └── favicon.ico
│   │
│   ├── components/
│   │   ├── ui/                 # shadcn/ui components
│   │   │   ├── button.tsx
│   │   │   ├── card.tsx
│   │   │   └── badge.tsx
│   │   │
│   │   ├── StatusCard.tsx      # Agent status display
│   │   ├── IncidentFeed.tsx    # Real-time event stream
│   │   ├── AgentStatus.tsx     # Agent health monitor
│   │   ├── MetricsChart.tsx    # System metrics visualization
│   │   ├── TraceViewer.tsx     # AI decision traceability
│   │   └── Terminal.tsx        # Terminal-style output
│   │
│   ├── lib/
│   │   ├── api.ts              # Backend API client
│   │   ├── websocket.ts        # WebSocket connection
│   │   └── utils.ts            # Utility functions
│   │
│   ├── public/
│   │   └── fixbot-logo.svg
│   │
│   ├── package.json
│   ├── tsconfig.json
│   ├── tailwind.config.ts      # Dark punk theme
│   ├── next.config.js
│   ├── .env.local              # Configure this!
│   └── README.md
│
└── backend/                     # 🔧 Python Backend (Build this!)
    ├── agents/
    │   ├── base_agent.py       # Base agent class
    │   ├── monitor_agent.py    # Monitoring logic
    │   ├── diagnostic_agent.py # Diagnosis with RAG
    │   └── remediation_agent.py # Auto-remediation
    │
    ├── mcp_clients/
    │   ├── do_mcp.py           # DigitalOcean MCP
    │   ├── terraform_mcp.py    # Terraform MCP
    │   └── prometheus_mcp.py   # Custom Prometheus MCP
    │
    ├── orchestrator/
    │   └── coordinator.py      # Agent coordination
    │
    ├── models/
    │   └── incident.py         # Pydantic models
    │
    ├── terraform/
    │   ├── main.tf
    │   └── modules/
    │
    ├── demo/
    │   └── failure_injection.py # Demo scenarios
    │
    ├── knowledge-base/
    │   ├── do-docs.md
    │   ├── runbooks.md
    │   └── past-incidents.json
    │
    ├── main.py                 # FastAPI backend
    ├── requirements.txt
    ├── .env                    # Configure this!
    └── README.md

Directory Responsibilities

Frontend (fixbot/frontend/):

  • ✅ Pre-built and ready to use - Just run npm install and configure .env.local
  • Next.js 14+ with App Router
  • Real-time dashboard with WebSocket updates
  • Dark Punk Professional Theme - Cyberpunk aesthetics meets Bloomberg Terminal
  • TypeScript + Tailwind CSS + shadcn/ui
  • Minimal configuration needed

Backend (fixbot/backend/):

  • ⚠️ You build this during the hackathon
  • Python FastAPI application
  • AI agents (Monitor, Diagnostic, Remediation)
  • MCP server integrations
  • WebSocket server for real-time updates
  • Infrastructure as Code (Terraform)

🎨 Frontend Setup

The frontend is pre-built with a professional dark punk theme. You just need to install and configure it.

Prerequisites

  • Node.js 18+ and npm 9+
  • A running backend API (see Backend Setup)

Quick Start

# Navigate to frontend directory
cd fixbot/frontend

# Install dependencies
npm install

# Configure environment variables
cp .env.example .env.local

Edit .env.local:

# Backend API endpoint
NEXT_PUBLIC_API_URL=http://localhost:8000

# WebSocket endpoint
NEXT_PUBLIC_WS_URL=ws://localhost:8000/ws

# Run development server
npm run dev

# Open browser
open http://localhost:3000

Frontend Tech Stack

  • Framework: Next.js 14 (App Router)
  • Language: TypeScript
  • Styling: Tailwind CSS
  • Components: shadcn/ui
  • State: React Hooks
  • Real-time: WebSocket client
  • API Client: Fetch API with error handling

🔧 Backend Setup

The backend is what you'll build during the hackathon.

Prerequisites

  • Python 3.11+
  • DigitalOcean account with API token
  • Gradient AI Platform access
  • Terraform installed
  • Docker (for MCP servers)

Quick Start

# Navigate to backend directory
cd fixbot/backend

# Create virtual environment
python3 -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Configure environment variables
cp .env.example .env

Edit .env:

# DigitalOcean
DO_API_TOKEN=your_do_token_here
DO_SPACES_KEY=your_spaces_key
DO_SPACES_SECRET=your_spaces_secret

# Gradient AI
GRADIENT_AI_API_KEY=your_gradient_key
MONITOR_AGENT_ID=agent_xxx
DIAGNOSTIC_AGENT_ID=agent_yyy
REMEDIATION_AGENT_ID=agent_zzz

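# MCP Servers
# Note: port 3000 is also the default for the Next.js dev server (npm run dev);
# if both run on the same machine, move one of them to a different port.
DO_MCP_URL=http://localhost:3000
TERRAFORM_MCP_URL=http://localhost:3001
PROMETHEUS_URL=http://your-prometheus:9090

# Start MCP servers (in separate terminals)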
# See MCP Integration section in full guide

# Run FastAPI backend
python main.py

# Backend should start on http://localhost:8000
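
Before starting the agents it can help to fail fast on missing configuration. The sketch below reads the variable names listed in the .env example from the process environment. Which variables count as required, the `Settings` class, and the fallback Prometheus URL are assumptions made for illustration. It also does not parse the .env file itself, so a loader such as python-dotenv (or exporting the variables in the shell) would still be needed.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Assumption: these five values have no sensible default, so they are treated as required.
REQUIRED_VARS = (
    "DO_API_TOKEN",
    "GRADIENT_AI_API_KEY",
    "MONITOR_AGENT_ID",
    "DIAGNOSTIC_AGENT_ID",
    "REMEDIATION_AGENT_ID",
)


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings container for the backend."""
    do_api_token: str
    gradient_ai_api_key: str
    monitor_agent_id: str
    diagnostic_agent_id: str
    remediation_agent_id: str
    do_mcp_url: str
    terraform_mcp_url: str
    prometheus_url: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from an environment mapping, naming every missing variable at once."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return Settings(
        do_api_token=env["DO_API_TOKEN"],
        gradient_ai_api_key=env["GRADIENT_AI_API_KEY"],
        monitor_agent_id=env["MONITOR_AGENT_ID"],
        diagnostic_agent_id=env["DIAGNOSTIC_AGENT_ID"],
        remediation_agent_id=env["REMEDIATION_AGENT_ID"],
        # URL defaults mirror the .env example; the Prometheus fallback is an assumption.
        do_mcp_url=env.get("DO_MCP_URL", "http://localhost:3000"),
        terraform_mcp_url=env.get("TERRAFORM_MCP_URL", "http://localhost:3001"),
        prometheus_url=env.get("PROMETHEUS_URL", "http://localhost:9090"),
    )
```

Passing the mapping as a parameter keeps the function easy to test: a plain dict can stand in for `os.environ`.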

Backend Tech Stack

  • Framework: FastAPI
  • AI Platform: DigitalOcean Gradient AI
  • Protocol: Model Context Protocol (MCP)
  • IaC: Terraform
  • Monitoring: Prometheus
  • Language: Python 3.11+

🚀 Running the Complete System

Development Mode

Terminal 1: Backend API

cd fixbot/backend
source venv/bin/activate
python main.py
# Runs on http://localhost:8000

Terminal 2: Frontend Dashboard

cd fixbot/frontend
npm run dev
# Runs on http://localhost:3000

Terminal 3: Monitor Logs (Optional)

cd fixbot/backend
tail -f logs/fixbot.log

Verify Everything Works

# Check backend health
curl http://localhost:8000/agents/health

# Check frontend loads
curl http://localhost:3000

# Open dashboard in browser
open http://localhost:3000  # macOS
xdg-open http://localhost:3000  # Linux

You should see:

  • ✅ All three agent status cards showing "Active" with green indicators
  • ✅ System metrics displaying normal values
  • ✅ Live connection indicator showing "Connected"
  • ✅ Empty incident feed (no incidents yet)

🎨 Design System (Dark Punk Professional Theme)

Theme Philosophy

"Professional Cyberpunk" - The aesthetic of a high-tech operations center. Think: Blade Runner meets Bloomberg Terminal. Dark, sleek, with neon accents that convey urgency and precision.

Color Palette

/* Background & Surfaces */
--background: #0a0e17; /* Deep space black */
--surface: #111827; /* Card/panel background */
--surface-elevated: #1f2937; /* Elevated elements */

/* Brand Colors (Neon Accents) */
--primary: #00ff9f; /* Neon green - success/active */
--secondary: #00d4ff; /* Cyber blue - info */
--accent: #ff00ff; /* Neon magenta - alerts */

/* Status Colors */
--success: #00ff9f; /* Neon green */
--warning: #ffaa00; /* Electric amber */
--danger: #ff3366; /* Hot pink red */

/* Text */
--text-primary: #e5e7eb; /* Almost white */
--text-secondary: #9ca3af; /* Muted gray */
--text-muted: #6b7280; /* Very muted */

/* Borders */
--border: #1f2937; /* Subtle borders */
--border-bright: #374151; /* Highlighted borders */

Typography

Fonts:

  • Headers: "JetBrains Mono" or "Space Mono" (monospace, technical feel)
  • Body: "Inter" or "DM Sans" (clean, readable)
  • Code/Terminal: "Fira Code" or "Cascadia Code" (with ligatures)

Guidelines:

  • Use UPPERCASE for labels and status indicators
  • Use monospace for anything technical (IDs, timestamps, metrics)
  • Use medium-large sizes for important info (remember: projector demo!)
  • Use color to convey meaning (green = good, red = critical, blue = info)

Component Examples

Status Card (Agent Display)

<Card className="bg-[#111827] border border-[#1f2937] hover:border-[#00ff9f] transition-all">
  <div className="flex items-center gap-3">
    {/* Active indicator - pulsing green dot */}
    <div className="h-2 w-2 rounded-full bg-[#00ff9f] animate-pulse" />

    {/* Agent name - monospace, uppercase, neon green */}
    <span className="text-[#00ff9f] font-mono uppercase tracking-wider">
      Monitor Agent
    </span>
  </div>

  {/* Status info - secondary text */}
  <div className="mt-2 text-[#9ca3af] text-sm">
    Status: Active • Last check: 2s ago
  </div>
</Card>

Terminal/Console Output

<div className="bg-black border border-[#00ff9f] rounded p-4 font-mono">
  <div className="flex gap-2 text-[#00ff9f]">
    <span className="text-[#00ff9f]">●</span>
    <span>14:32:15 | FixBot detected high CPU (95%)</span>
  </div>
  <div className="flex gap-2 text-[#00d4ff]">
    <span className="text-[#00d4ff]">●</span>
    <span>14:32:18 | Analyzing root cause...</span>
  </div>
  <div className="flex gap-2 text-[#00ff9f]">
    <span className="text-[#00ff9f]">●</span>
    <span>14:33:45 | ✅ RESOLVED - Droplet resized</span>
  </div>
</div>

Metrics Display

<div className="space-y-2">
  <div className="flex justify-between text-sm">
    <span className="text-[#9ca3af]">CPU Usage</span>
    <span className="text-[#00ff9f] font-mono">42%</span>
  </div>

  {/* Progress bar with gradient */}
  <div className="h-2 bg-[#1f2937] rounded-full overflow-hidden">
    <div
      className="h-full bg-gradient-to-r from-[#00ff9f] to-[#00d4ff]"
      style={{ width: "42%" }}
    />
  </div>
</div>

Animation Guidelines

Use sparingly and professionally:

/* Pulse for active states */
@keyframes pulse {
  0%,
  100% {
    opacity: 1;
  }
  50% {
    opacity: 0.6;
  }
}

/* Glow effect on hover */
.hover-glow:hover {
  box-shadow: 0 0 20px rgba(0, 255, 159, 0.3);
}

/* Subtle scan line (optional) */
@keyframes scan {
  0% {
    transform: translateY(-100%);
  }
  100% {
    transform: translateY(100%);
  }
}

DO:

  • ✅ Pulse indicators for active/live states
  • ✅ Smooth transitions (0.2-0.3s)
  • ✅ Hover effects (glow, border color change)
  • ✅ Fade in/out for notifications

DON'T:

  • ❌ Excessive animations
  • ❌ Constant movement
  • ❌ Distracting effects during demo
  • ❌ Flashy transitions

Layout Principles

Dashboard Grid (Desktop):

┌─────────────────────────────────────────────────┐
│  🤖 FixBot                        [●] LIVE     │
├─────────────────────────────────────────────────┤
│                                                 │
│  [Monitor]  [Diagnostic]  [Remediation]         │  ← Agent status cards
│                                                 │
├─────────────────────────────────────────────────┤
│                                                 │
│  📊 System Metrics                              │  ← Metrics display
│  CPU | Memory | Disk                            │
│                                                 │
├─────────────────────────────────────────────────┤
│                                                 │
│  🔴 Live: Incident Timeline                     │  ← Real-time feed
│  [Scrolling event stream...]                    │
│                                                 │
└─────────────────────────────────────────────────┘

Spacing:

  • Use gap-4 (1rem) or gap-6 (1.5rem) between elements
  • Generous padding inside cards: p-6 or p-8
  • Consistent margins: my-4 or my-6

🎬 Demo Guide

Pre-Demo Checklist

5 minutes before demo:

  1. Start Backend:

    cd fixbot/backend && python main.py
  2. Start Frontend:

    cd fixbot/frontend && npm run dev
  3. Open Dashboard:

    open http://localhost:3000
  4. Verify Status:

    • All agents show green "Active"
    • System metrics display normally
    • Live indicator shows "Connected"
  5. Prepare Failure Injection:

    cd fixbot/backend/demo
    # Have terminal ready with injection command

Demo Script (7 Minutes)

[0:00-0:30] Hook + Dashboard Intro

YOU: "Infrastructure breaks. That's a fact of life.

But what if you had a bot that fixed things automatically -
before they wake up your on-call engineer at 3 AM?

That's FixBot."

[Show dashboard on screen - point to it]

"This is FixBot's operations center.
Three AI agents monitoring our infrastructure 24/7."

[0:30-1:00] Architecture Walkthrough

[Point to each agent card]

YOU: "Three specialized agents:

Monitor Agent - detects issues via DigitalOcean MCP and Prometheus
Diagnostic Agent - uses RAG to analyze root causes
Remediation Agent - fixes problems automatically via Terraform

All powered by DigitalOcean Gradient AI with Model Context Protocol."

[1:00-4:00] Live Demo: CPU Spike

# Run in terminal (don't show this to judges, just run it)
python failure_injection.py --inject cpu --target web-app

[FOCUS ON DASHBOARD - this is the star]

YOU: "Let me trigger a real incident. I'm overloading our web server..."

[Dashboard comes alive:]
- Monitor Agent: Status changes to "⚠ DETECTING..."
- Incident feed starts scrolling:
  "14:32:15 | 🔴 ALERT: High CPU detected (95%)"

[CPU metric bar turns red, shows 95%]

YOU: "Three seconds. FixBot detected it."

[Diagnostic Agent activates:]
  "14:32:18 | 🔍 Analyzing root cause..."
  "14:32:22 | 💡 Root cause: Undersized droplet"
  "14:32:22 | 📋 Recommended: Resize to s-2vcpu-4gb"

YOU: "Now it's using RAG - querying our knowledge base of past incidents,
DigitalOcean documentation, best practices..."

[Remediation Agent executes:]
  "14:32:25 | 🔧 Executing: Terraform resize"
  "14:32:30 | ⚙️  Applying infrastructure changes..."
  "14:33:45 | ✅ RESOLVED: Droplet resized"

[CPU drops to 42%, turns green]
[All agents return to "Active" status]

YOU: "90 seconds total. From detection to resolution.
Completely autonomous. No human intervention."

[Pause for impact]
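
The "90 seconds" figure follows directly from the feed timestamps in this segment (14:32:15 to 14:33:45). A small sketch of that arithmetic, assuming feed lines of the form `HH:MM:SS | message` as shown above; the function names are illustrative only:

```python
from datetime import datetime, timedelta


def parse_feed_time(line: str) -> datetime:
    """Extract the HH:MM:SS prefix from a feed line such as '14:32:15 | message'."""
    stamp = line.split("|", 1)[0].strip()
    return datetime.strptime(stamp, "%H:%M:%S")


def time_to_resolution(first_line: str, last_line: str) -> timedelta:
    """Elapsed time between two feed lines, allowing for an incident that crosses midnight."""
    delta = parse_feed_time(last_line) - parse_feed_time(first_line)
    if delta < timedelta(0):
        delta += timedelta(days=1)
    return delta


feed = [
    "14:32:15 | ALERT: High CPU detected (95%)",
    "14:32:18 | Analyzing root cause...",
    "14:32:25 | Executing: Terraform resize",
    "14:33:45 | RESOLVED: Droplet resized",
]
print(time_to_resolution(feed[0], feed[-1]).total_seconds())  # 90.0
```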

[4:00-5:00] Show Traceability

[Click on resolved incident in feed]
[Opens trace viewer panel]

YOU: "Here's what makes this special - full traceability."

[Point to trace view showing:]
- Input metrics and system state
- RAG retrieval results from knowledge base
- Decision logic and confidence scores
- Terraform config generated
- Success validation

"Every decision the AI makes is auditable.
This isn't a black box. You can see exactly why FixBot chose this solution."
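
One way to back the trace view is a single record per decision that carries the five items listed above. The sketch below is an assumption about shape only: the `DecisionTrace` class, its field names, and the 0 to 1 confidence range are hypothetical, and the real backend could just as well use Pydantic models.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class DecisionTrace:
    """Hypothetical audit record for one remediation decision."""
    incident_id: str
    input_metrics: dict          # metrics and system state at detection time
    retrieved_documents: list    # knowledge-base entries returned by the RAG step
    decision: str                # the action the agent chose
    confidence: float            # assumed to be a score between 0 and 1
    terraform_config: str        # configuration generated for the fix
    validated: bool = False      # whether the post-fix check passed

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")

    def to_json(self) -> str:
        """Serialize the trace so the dashboard can fetch and display it."""
        return json.dumps(asdict(self), sort_keys=True)


trace = DecisionTrace(
    incident_id="demo-cpu-1",
    input_metrics={"cpu_percent": 95},
    retrieved_documents=["runbooks.md", "past-incidents.json"],
    decision="resize droplet",
    confidence=0.9,
    terraform_config='size = "s-2vcpu-4gb"',
    validated=True,
)
```

Keeping the retrieved documents and the generated configuration in the same record is what makes each decision reviewable after the fact.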

[5:00-6:00] Quick Second Demo (If Time)

python failure_injection.py --inject disk --target api-server

YOU: "One more. Disk full on API server..."

[Faster walkthrough on dashboard]
- Detect (5s)
- Diagnose (15s)
- Attach new volume (45s)
- Resolved

YOU: "Same pattern. Different problem. Fixed automatically."
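
Both demo commands use the same two flags, `--inject` and `--target`. A minimal sketch of how `failure_injection.py` might parse them with the standard library follows. Only the flag names and the `cpu` and `disk` scenarios come from this README; the help text and the `describe` helper are assumptions, and the actual failure injection is left out.

```python
import argparse

# Scenarios shown in the demo script; anything else is rejected by argparse.
SCENARIOS = ("cpu", "disk")


def build_parser() -> argparse.ArgumentParser:
    """Create the command-line parser for the demo failure injector."""
    parser = argparse.ArgumentParser(
        prog="failure_injection.py",
        description="Inject a demo failure into a target resource.",
    )
    parser.add_argument("--inject", required=True, choices=SCENARIOS,
                        help="failure scenario to trigger")
    parser.add_argument("--target", required=True,
                        help="name of the resource to affect, e.g. web-app")
    return parser


def describe(args: argparse.Namespace) -> str:
    """Human-readable summary of what would be injected (no side effects)."""
    return f"inject {args.inject} failure into {args.target}"


args = build_parser().parse_args(["--inject", "cpu", "--target", "web-app"])
print(describe(args))  # inject cpu failure into web-app
```

Restricting `--inject` to known choices means a typo on stage fails immediately with a usage message instead of silently doing nothing.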

[6:00-7:00] Closing

[Return to clean dashboard - all green]

YOU: "FixBot - the infrastructure fixer that never sleeps.

Key features:
• Detects issues in seconds using DigitalOcean MCP
• Diagnoses with AI-powered RAG
• Fixes automatically via Terraform
• Full traceability of every decision
• Built entirely on DigitalOcean Gradient AI

This is the future of infrastructure management.
No more 3 AM wake-up calls.
No more manual emergency fixes.
Just autonomous, intelligent infrastructure.

Questions?"

[Confident smile, pause]

Demo Tips

DO:

  • βœ… Keep dashboard fullscreen during demo
  • βœ… Speak slowly and clearly
  • βœ… Pause after key points for impact
  • βœ… Point to screen elements as you explain
  • βœ… Show enthusiasm - this is cool tech!
  • βœ… Have backup video if live demo fails

DON'T:

  • ❌ Switch between terminal and browser constantly
  • ❌ Rush through the demo
  • ❌ Get lost in technical details
  • ❌ Apologize for delays (they're normal)
  • ❌ Turn your back to audience

🏗️ Architecture

┌──────────────────────────────────────────────────────────────┐
│                     USER INTERFACE                           │
│                  (Next.js Dashboard)                         │
│                                                              │
│  • Dark Punk Professional Theme                              │
│  • Real-time WebSocket Updates                               │
│  • Agent Status Monitoring                                   │
│  • Incident Timeline                                         │
│  • Decision Traceability                                     │
└────────────────┬─────────────────────────────────────────────┘
                 │
                 │ WebSocket + REST API
                 │
┌────────────────▼─────────────────────────────────────────────┐
│                  FASTAPI BACKEND                             │
│                  (Orchestrator)                              │
│                                                              │
│  Endpoints:                                                  │
│  • POST /incidents/detect                                    │
│  • POST /incidents/diagnose                                  │
│  • POST /incidents/remediate                                 │
│  • GET /status                                               │
│  • GET /agents/health                                        │
│  • WS /ws (WebSocket for real-time)                          │
└────────────────┬─────────────────────────────────────────────┘
                 │
                 │ Agent API Calls
                 │
┌────────────────▼─────────────────────────────────────────────┐
│         DIGITALOCEAN GRADIENT AI PLATFORM                    │
│              (Multi-Agent System)                            │
│                                                              │
│  ┌──────────┐  ┌──────────────┐  ┌────────────────┐          │
│  │ MONITOR  │──│ DIAGNOSTIC   │──│ REMEDIATION    │          │
│  │ AGENT    │  │ AGENT        │  │ AGENT          │          │
│  │          │  │              │  │                │          │
│  │ • Detect │  │ • RAG Query  │  │ • Terraform    │          │
│  │ • Alert  │  │ • Analyze    │  │ • DO API       │          │
│  │ • Triage │  │ • Recommend  │  │ • Validate     │          │
│  └────┬─────┘  └──────┬───────┘  └────────┬───────┘          │
│       │               │                   │                  │
│       └───────────────┼───────────────────┘                  │
│                       │                                      │
│  ┌────────────────────▼───────────────────────────────────┐  │
│  │         KNOWLEDGE BASE (RAG)                           │  │
│  │  • DO Documentation (auto-indexed)                     │  │
│  │  • Runbooks & Best Practices                           │  │
│  │  • Past Incident History                               │  │
│  └────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────┘
                 │
                 │ MCP Protocol
                 │
┌────────────────▼─────────────────────────────────────────────┐
│              MCP SERVERS                                     │
│                                                              │
│  ┌─────────────────┐  ┌──────────────┐  ┌──────────────┐     │
│  │ DigitalOcean    │  │ Terraform    │  │ Prometheus   │     │
│  │ MCP Server      │  │ MCP Server   │  │ MCP (Custom) │     │
│  │                 │  │              │  │              │     │
│  │ • Droplets      │  │ • Validate   │  │ • Query      │     │
│  │ • Monitoring    │  │ • Plan       │  │ • Alerts     │     │
│  │ • Spaces        │  │ • Apply      │  │ • Metrics    │     │
│  │ • Kubernetes    │  │ • State      │  │              │     │
│  └─────────────────┘  └──────────────┘  └──────────────┘     │
└──────────────────────────────────────────────────────────────┘
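
The "WebSocket + REST API" link in the diagram implies that every agent event is fanned out to all connected dashboards. The sketch below shows that fan-out with only the standard library. It is not the FastAPI implementation: `EventHub`, the event type strings, and the message shape are assumptions, and in the real backend each subscriber queue would be drained by a handler for the `/ws` endpoint.

```python
import asyncio
import json


class EventHub:
    """In-memory fan-out: every subscriber gets its own queue of JSON messages."""

    def __init__(self) -> None:
        self._subscribers = set()

    def subscribe(self) -> asyncio.Queue:
        """Register a new listener (one per connected dashboard)."""
        queue = asyncio.Queue()
        self._subscribers.add(queue)
        return queue

    def unsubscribe(self, queue: asyncio.Queue) -> None:
        """Drop a listener, for example when its connection closes."""
        self._subscribers.discard(queue)

    async def publish(self, event_type: str, payload: dict) -> int:
        """Send one event to all current subscribers; returns how many received it."""
        message = json.dumps({"type": event_type, "payload": payload})
        for queue in self._subscribers:
            await queue.put(message)
        return len(self._subscribers)


async def demo() -> list:
    """Two dashboards subscribe; one disconnects before the second event."""
    hub = EventHub()
    dashboard_a = hub.subscribe()
    dashboard_b = hub.subscribe()
    await hub.publish("incident.detected", {"metric": "cpu", "value": 95})
    hub.unsubscribe(dashboard_b)
    await hub.publish("incident.resolved", {"metric": "cpu", "value": 42})
    return [dashboard_a.qsize(), dashboard_b.qsize()]
```

Running `asyncio.run(demo())` shows the first dashboard holding both events and the second holding only the one published before it disconnected. Giving each client its own queue keeps a slow dashboard from blocking the others.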

🛠️ Tech Stack

Frontend

  • Framework: Next.js 14 (App Router)
  • Language: TypeScript 5+
  • Styling: Tailwind CSS 3.4
  • Components: shadcn/ui
  • State Management: React Hooks (useState, useEffect, useContext)
  • Real-time: WebSocket API
  • HTTP Client: Fetch API
  • Build Tool: Next.js built-in (webpack by default; Turbopack is opt-in for development via next dev --turbo in Next.js 14)

Backend

  • Framework: FastAPI 0.109+
  • Language: Python 3.11+
  • AI Platform: DigitalOcean Gradient AI
    • Multi-agent system
    • RAG (Retrieval-Augmented Generation)
    • Function calling
    • Agent evaluations
    • Traceability
  • Protocol: Model Context Protocol (MCP)
  • Infrastructure: Terraform 1.6+
  • Monitoring: Prometheus
  • State Management: DO Spaces (S3-compatible)
  • WebSocket: FastAPI WebSocket support

Infrastructure

  • Cloud: DigitalOcean
    • Droplets (compute)
    • Spaces (object storage)
    • Managed Kubernetes (optional)
    • Monitoring (built-in)
  • IaC: Terraform with DO provider
  • Orchestration: FastAPI + asyncio
  • Monitoring: Prometheus + node_exporter
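
For the Prometheus + node_exporter item, detection can start as a threshold check on an instant query. The sketch below parses the JSON returned by Prometheus's `/api/v1/query` endpoint and lists instances above a limit. The PromQL expression is a commonly used CPU-utilisation query for node_exporter; the 90% threshold, the function name, and the sample instance names are assumptions, and the HTTP call itself is omitted.

```python
import json

# Common node_exporter CPU query: 100 minus the idle percentage per instance.
CPU_QUERY = '100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)'


def instances_over_threshold(response_body: str, threshold: float = 90.0) -> dict:
    """Return {instance: value} for every series in an instant-query result above threshold."""
    body = json.loads(response_body)
    if body.get("status") != "success":
        raise ValueError("Prometheus query failed: " + str(body.get("error", "unknown error")))
    hot = {}
    for series in body["data"]["result"]:
        instance = series["metric"].get("instance", "unknown")
        # Instant vectors carry [timestamp, "value-as-string"] pairs.
        _timestamp, raw_value = series["value"]
        value = float(raw_value)
        if value > threshold:
            hot[instance] = value
    return hot


# Hand-written sample in the instant-query response format (not real data).
sample = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"instance": "web-app:9100"}, "value": [1700000000.0, "95.2"]},
            {"metric": {"instance": "api-server:9100"}, "value": [1700000000.0, "41.7"]},
        ],
    },
})
```

With the sample above, only `web-app:9100` is reported. In the real Monitor Agent the response body would come from the Prometheus MCP client rather than a literal string.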

🎯 Key Features

✅ Already Implemented (Frontend)

  • Real-time dashboard with WebSocket
  • Dark punk professional theme
  • Agent status monitoring
  • Live incident feed
  • System metrics visualization
  • Decision traceability viewer
  • Responsive layout (desktop-focused)

🔨 To Implement (Backend - Your Job!)

  • Monitor Agent with DO MCP integration
  • Diagnostic Agent with RAG
  • Remediation Agent with Terraform
  • FastAPI orchestrator
  • WebSocket server for real-time updates
  • MCP client implementations
  • Knowledge base setup
  • Demo failure injection scripts
  • Agent evaluations
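
For the Diagnostic Agent and knowledge base items, the retrieval step can be prototyped before any managed RAG setup exists. The sketch below ranks documents by simple word overlap with the query. This is a deliberately naive stand-in, not how Gradient AI knowledge bases work, and the document keys and texts are made-up examples.

```python
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercase bag of words; punctuation is discarded."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, documents: dict, top_k: int = 2) -> list:
    """Rank documents by the number of query tokens they share (bag-of-words overlap)."""
    query_tokens = tokenize(query)
    scored = []
    for name, text in documents.items():
        overlap = sum((query_tokens & tokenize(text)).values())
        if overlap:
            scored.append((overlap, name))
    # Highest overlap first; ties broken alphabetically for stable output.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _score, name in scored[:top_k]]


# Hypothetical snippets standing in for runbooks.md and past-incidents.json entries.
knowledge_base = {
    "runbooks.md#high-cpu": "High CPU on a droplet: check load, then resize the droplet to a larger size.",
    "runbooks.md#disk-full": "Disk full: attach a new volume or clean up logs.",
    "past-incidents.json#12": "Web server CPU spike resolved by droplet resize.",
}
print(retrieve("high cpu on web droplet", knowledge_base))
# ['runbooks.md#high-cpu', 'past-incidents.json#12']
```

A prototype like this is enough to exercise the agent hand-off and the trace viewer; the scoring function can later be replaced by the platform's retrieval without changing the callers.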

📦 Quick Commands Reference

Frontend

# Install
npm install

# Dev mode
npm run dev

# Build
npm run build

# Production
npm start

# Type check
npm run type-check

# Lint
npm run lint

Backend

# Install
pip install -r requirements.txt

# Run dev
python main.py

# Run with reload
uvicorn main:app --reload

# Run tests
pytest tests/

# Type check
mypy .

🐛 Troubleshooting

Frontend won't connect to backend

  • Check .env.local has correct NEXT_PUBLIC_API_URL
  • Verify backend is running on expected port
  • Check CORS settings in FastAPI backend
  • Look for errors in browser console (F12)

WebSocket connection fails

  • Check NEXT_PUBLIC_WS_URL in .env.local
  • Verify WebSocket endpoint exists in backend
  • Check firewall/proxy settings
  • Test with: wscat -c ws://localhost:8000/ws

Dark theme not applying

  • Clear browser cache
  • Check globals.css is imported in layout.tsx
  • Verify Tailwind is processing CSS correctly
  • Delete the .next build cache directory, then run npm run dev again

Agents not responding

  • Check Gradient AI API keys in backend .env
  • Verify agent IDs are correct
  • Test agent endpoints individually
  • Check Gradient AI dashboard for errors

📝 License

MIT License - see LICENSE file for details


🏆 Built For

MLH + DigitalOcean AI Hackathon NYC
December 12-13, 2025


👥 Team

Built with ❤️ and ☕ by [Your Name]


Questions? Found a bug? Want to contribute?
Open an issue or PR on GitHub!

🤖 FixBot - Breaking things? We fix them before you notice. 🤖
