A professional-grade backend for evaluating, comparing, and analyzing Large Language Models with real-time tracing and analytics.
- Dual Response Generation: Compare two model responses side-by-side
- Preference Learning: Collect and analyze user preferences
- Real-time Tracing: Track every API call, token usage, and latency
- Performance Analytics: Dashboard-ready metrics and insights
- Multi-Model Support: Works with Gemini, GPT-4, Claude, and custom models
- MongoDB Integration: Scalable data storage with optimized indexes
- Production Ready: Proper error handling, logging, and monitoring
- Python 3.9+
- MongoDB (local or Atlas)
- At least one LLM API key (Gemini, OpenAI, or Anthropic)
- Clone the repository
  cd citrus_backend
- Create virtual environment
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install dependencies
  pip install -r requirements.txt
- Configure environment
  cp .env.example .env
  # Edit .env with your configuration
- Run the server
  uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
- Access the API
  - API: http://localhost:8000
  - Swagger Docs: http://localhost:8000/docs
  - ReDoc: http://localhost:8000/redoc
citrus_backend/
├── app/
│   ├── __init__.py
│   ├── config.py              # Configuration and settings
│   ├── main.py                # FastAPI application
│   ├── core/
│   │   ├── __init__.py
│   │   ├── database.py        # MongoDB connection
│   │   ├── tracing.py         # Tracing system
│   │   └── trace_storage.py   # Trace persistence
│   ├── models/
│   │   ├── __init__.py
│   │   ├── schemas.py         # Pydantic models
│   │   └── state.py           # LangGraph state
│   ├── routers/
│   │   ├── __init__.py
│   │   ├── evaluations.py     # Chat and evaluation endpoints
│   │   └── traces.py          # Analytics endpoints
│   └── services/
│       ├── __init__.py
│       └── graph.py           # LangGraph workflow
├── .env.example               # Environment template
├── requirements.txt           # Python dependencies
└── README.md                  # This file
Key configuration options in .env:
# Required
MONGODB_URL=mongodb://localhost:27017
GEMINI_API_KEY=your_key_here
# Optional
DEFAULT_MODEL=gemini-1.5-pro
DEFAULT_TEMPERATURE=0.7
CORS_ORIGINS=http://localhost:3000,http://localhost:5173
See .env.example for all available options.
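To illustrate how these options could be consumed, here is a minimal standard-library sketch. It is not the actual contents of app/config.py: the `Settings` class and `load_settings` function are hypothetical names, and the defaults simply mirror the values shown above.

```python
import os
from dataclasses import dataclass, field
from typing import List, Mapping


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings container; field names mirror the .env keys."""
    mongodb_url: str
    gemini_api_key: str
    default_model: str = "gemini-1.5-pro"
    default_temperature: float = 0.7
    cors_origins: List[str] = field(default_factory=list)


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from an environment-like mapping (assumed behavior)."""
    # Fail early if a key listed under "# Required" is absent or empty.
    missing = [k for k in ("MONGODB_URL", "GEMINI_API_KEY") if not env.get(k)]
    if missing:
        raise RuntimeError(f"Missing required settings: {', '.join(missing)}")

    # CORS_ORIGINS is a comma-separated list; drop blanks and stray spaces.
    origins = [o.strip() for o in env.get("CORS_ORIGINS", "").split(",") if o.strip()]

    return Settings(
        mongodb_url=env["MONGODB_URL"],
        gemini_api_key=env["GEMINI_API_KEY"],
        default_model=env.get("DEFAULT_MODEL", "gemini-1.5-pro"),
        default_temperature=float(env.get("DEFAULT_TEMPERATURE", "0.7")),
        cors_origins=origins,
    )
```

Passing a plain dict instead of `os.environ` makes the loader easy to exercise in tests without touching the real environment.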
- POST /api/dual-responses - Generate two responses for comparison
- POST /api/store-preference - Store user preference
- POST /api/chat/send - Send a single chat message
- GET /api/stats - Get platform statistics

- GET /api/v1/traces - List traces with filtering
- GET /api/v1/traces/{trace_id} - Get specific trace
- GET /api/v1/traces/statistics - Aggregated statistics
- GET /api/v1/models/performance - Model performance metrics
- GET /api/v1/analytics/realtime - Real-time dashboard metrics

- GET /health - Health check
- GET / - API information
- GET /api/info - Detailed platform info
import requests

# Generate two responses for comparison
response = requests.post("http://localhost:8000/api/dual-responses", json={
    "user_message": "Explain quantum computing",
    "chat_history": [],
    "session_id": "test-session-1",
    "temperature": 0.7
})
data = response.json()
print("Response 1:", data["response_1"])
print("Response 2:", data["response_2"])

# Store the user's preference between the two responses
requests.post("http://localhost:8000/api/store-preference", json={
    "session_id": "test-session-1",
    "user_message": "Explain quantum computing",
    "response_1": "...",
    "response_2": "...",
    "choice": "response_1",
    "reasoning": "More clear and concise"
})

# Get real-time metrics
stats = requests.get("http://localhost:8000/api/v1/analytics/realtime?minutes=60")
print(stats.json())

# Get model performance
perf = requests.get("http://localhost:8000/api/v1/models/performance?days=7")
print(perf.json())

Run tests with pytest:
pytest tests/ -v
The application logs to stdout with structured formatting:
2024-02-01 10:42:00 - app.main - INFO - 🚀 Starting Citrus Platform...
2024-02-01 10:42:01 - app.core.database - INFO - ✓ Database connected
Every request is automatically traced with:
- Latency measurements
- Token usage
- Error tracking
- Model metadata
Access traces via /api/v1/traces endpoints.
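The idea behind per-request tracing can be sketched with a small context manager. This is an illustration under assumptions, not the code in app/core/tracing.py: `TraceRecord`, `trace_call`, and the in-memory `sink` list are hypothetical stand-ins for the platform's tracing and persistence layers.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass
class TraceRecord:
    """Hypothetical trace shape covering the four items listed above."""
    trace_id: str
    model: str                                             # model metadata
    latency_ms: float = 0.0                                # latency measurement
    tokens: Dict[str, int] = field(default_factory=dict)   # token usage
    error: Optional[str] = None                            # error tracking


@contextmanager
def trace_call(model: str, sink: List[TraceRecord]) -> Iterator[TraceRecord]:
    """Time the wrapped block and append one record to `sink`."""
    record = TraceRecord(trace_id=uuid.uuid4().hex, model=model)
    start = time.perf_counter()
    try:
        yield record
    except Exception as exc:
        # Record the failure, then re-raise so callers still see it.
        record.error = f"{type(exc).__name__}: {exc}"
        raise
    finally:
        record.latency_ms = (time.perf_counter() - start) * 1000
        sink.append(record)


# Usage: the token counts are placeholders, not real model output.
traces: List[TraceRecord] = []
with trace_call("gemini-1.5-pro", traces) as rec:
    rec.tokens = {"prompt": 12, "completion": 48}
```

Writing the record in `finally` means failed calls are traced as well as successful ones, which is what makes error tracking possible.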
Monitor system health:
curl http://localhost:8000/health
Returns:
{
  "status": "healthy",
  "database": "connected",
  "version": "2.4.0",
  "uptime_seconds": 3600.5,
  "timestamp": "2024-02-01T10:42:00Z"
}
Create a Dockerfile:
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY app/ ./app/
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
Build and run:
docker build -t citrus-backend .
docker run -p 8000:8000 --env-file .env citrus-backend
In production, update these settings:
# Use production MongoDB
MONGODB_URL=mongodb+srv://user:pass@cluster.mongodb.net/
# Restrict CORS
CORS_ORIGINS=https://yourdomain.com
# Enable API keys
API_KEY_REQUIRED=true
API_KEYS=prod_key_1,prod_key_2
# Reduce logging
DEBUG=false
# Format code
black app/
# Lint
flake8 app/
# Type check
mypy app/
- Update app/config.py with model configuration
- Add model wrapper if needed in app/core/model_wrappers.py
- Update app/services/graph.py to use the new model
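As a rough picture of what a model wrapper might look like, the sketch below uses a small registry keyed by model name. Everything here is hypothetical: `ModelWrapper`, `MODEL_REGISTRY`, `register_model`, `get_model`, and the `EchoModel` stub are illustrative names, and the real app/core/model_wrappers.py may be organized quite differently.

```python
from typing import Callable, Dict, Protocol


class ModelWrapper(Protocol):
    """Assumed minimal interface a wrapper exposes to the workflow."""

    def generate(self, prompt: str, temperature: float = 0.7) -> str: ...


# Maps a model name (as it might appear in config) to a wrapper factory.
MODEL_REGISTRY: Dict[str, Callable[[], ModelWrapper]] = {}


def register_model(name: str):
    """Class decorator that records a wrapper under `name`."""
    def decorator(factory):
        MODEL_REGISTRY[name] = factory
        return factory
    return decorator


@register_model("echo-test")
class EchoModel:
    """Stub wrapper for local testing; it calls no real LLM API."""

    def generate(self, prompt: str, temperature: float = 0.7) -> str:
        return f"[echo] {prompt}"


def get_model(name: str) -> ModelWrapper:
    """Instantiate the wrapper registered under `name`."""
    try:
        return MODEL_REGISTRY[name]()
    except KeyError:
        raise ValueError(f"Unknown model: {name}") from None
```

With a layout like this, the workflow would only need to call `get_model(name).generate(...)`, so adding a provider would not require changes to its call sites.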
- Create router in app/routers/
- Define Pydantic schemas in app/models/schemas.py
- Include router in app/main.py
- evaluations: Evaluation results and metrics
- preferences: User preference submissions
- traces: Detailed execution traces
- analytics: Aggregated analytics data
- models: Model configurations
Indexes are created automatically on startup for optimal query performance.
- Fork the repository
- Create a feature branch
- Make your changes
- Run tests
- Submit a pull request
MIT License - see LICENSE file for details
- Documentation: http://localhost:8000/docs
- Issues: GitHub Issues
- Email: support@citrus.ai
- Support for more LLM providers
- Advanced analytics visualizations
- A/B testing framework
- Custom evaluation metrics
- Real-time collaboration
- Export to popular formats
Built with ❤️ using FastAPI, LangGraph, and MongoDB