RAG (Retrieval-Augmented Generation) Evaluation Service for automatic quality assessment and analysis.
- Data Synchronization: Automatically fetch historical conversation data from external APIs
- RAGAS Evaluation: Evaluate RAG responses using Faithfulness, Answer Relevancy, and Context Precision metrics
- LLM Analysis: Deep analysis and improvement suggestions using large language models
- Analytics Dashboard: Visualize trends, compare retrievers/embeddings, and analyze issues
- Scheduled Tasks: Automated daily sync and evaluation tasks
- FastAPI + SQLAlchemy + MySQL
- APScheduler for scheduled tasks
- RAGAS >= 0.2.0 for evaluation
- LangChain for LLM integration
- Next.js 15 + React 19 + TypeScript
- shadcn/ui + Tailwind CSS
- Recharts for data visualization
- Copy environment file:
cp .env.example .env
# Edit .env with your configuration
- Start all services:
docker-compose up -d
- Access the application:
- Frontend: http://localhost:3000
- Backend API: http://localhost:8000
cd backend
# Install dependencies
pip install uv
uv pip install -e .
# Run database migrations
uv run alembic upgrade head
# Start the server
uv run uvicorn main:app --reload

# Frontend setup
cd frontend
npm install
npm run dev

See .env.example for all available configuration options:
- Database: MySQL connection settings
- External API: OAuth 2.0 credentials for data sync
- RAGAS: LLM and embedding model configuration
- Scheduled Tasks: Cron expressions for automation
- `POST /api/sync/trigger` - Trigger data synchronization
- `GET /api/sync/status/{sync_id}` - Get sync job status
- `GET /api/sync/history` - Get sync history
- `POST /api/evaluation/trigger` - Trigger evaluation job
- `GET /api/evaluation/status/{job_id}` - Get evaluation status
- `GET /api/evaluation/results` - List evaluation results
- `GET /api/evaluation/results/{id}` - Get evaluation detail
- `GET /api/evaluation/summary` - Get evaluation summary
- `GET /api/analytics/trends` - Get score trends
- `GET /api/analytics/comparison/retriever` - Compare retrievers
- `GET /api/analytics/comparison/embedding` - Compare embeddings
- `GET /api/analytics/comparison/context/{id}` - Compare by context
- `GET /api/analytics/issues` - Get issue analytics
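The sync and evaluation endpoints above follow a trigger-then-poll pattern. The sketch below shows one way a client might drive it; the response field names (`sync_id`, `job_id`, `status`) are assumptions about the payload schema, not documented guarantees.

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:8000"  # backend address from the Quick Start section

def status_url(job_type: str, job_id: str, base: str = BASE_URL) -> str:
    """Build the polling URL for a sync or evaluation job."""
    return f"{base}/api/{job_type}/status/{job_id}"

def trigger_and_wait(job_type: str, timeout: float = 300.0, interval: float = 5.0) -> dict:
    """Trigger a job, then poll its status endpoint until it finishes.

    Assumes the trigger response carries the job id and the status payload
    has a "status" field -- check the actual response schema.
    """
    req = urllib.request.Request(f"{BASE_URL}/api/{job_type}/trigger", method="POST")
    with urllib.request.urlopen(req) as resp:
        payload = json.load(resp)
    job_id = payload.get("sync_id") or payload.get("job_id")
    deadline = time.time() + timeout
    while time.time() < deadline:
        with urllib.request.urlopen(status_url(job_type, job_id)) as resp:
            status = json.load(resp)
        if status.get("status") in ("completed", "failed"):
            return status
        time.sleep(interval)
    raise TimeoutError(f"{job_type} job {job_id} did not finish within {timeout}s")
```

For example, `trigger_and_wait("evaluation")` would kick off an evaluation job and block until it completes or fails.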
- Faithfulness (0-1): Measures how faithful the answer is to the retrieved context
- Answer Relevancy (0-1): Measures how relevant the answer is to the question
- Context Precision (0-1): Measures whether the relevant chunks in the retrieved context are ranked near the top
- `retrieval_miss`: Retrieved content doesn't match the query
- `retrieval_irrelevant`: Retrieved content is irrelevant to the question
- `answer_hallucination`: Answer contains information not present in the context
- `answer_incomplete`: Answer doesn't fully utilize the context
- `answer_irrelevant`: Answer doesn't address the question
- `knowledge_gap`: Knowledge base lacks relevant content
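To illustrate how metric scores can map onto these issue types, here is a minimal sketch: each low score flags one plausible issue. The 0.5 threshold and the metric-to-issue mapping are illustrative only, not the service's actual classification rules.

```python
def classify_issues(faithfulness: float, answer_relevancy: float,
                    context_precision: float, threshold: float = 0.5) -> list[str]:
    """Flag issue types when a metric falls below the threshold.

    Hypothetical mapping: low context precision suggests a retrieval miss,
    low faithfulness suggests hallucination, low answer relevancy suggests
    the answer does not address the question.
    """
    issues = []
    if context_precision < threshold:
        issues.append("retrieval_miss")
    if faithfulness < threshold:
        issues.append("answer_hallucination")
    if answer_relevancy < threshold:
        issues.append("answer_irrelevant")
    return issues
```

A real classifier would likely combine scores with LLM analysis rather than fixed thresholds, since a single low score can have several root causes.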
Apache-2.0