A retrieval-augmented generation (RAG) system that lets you upload documents (PDF, Markdown, plain text) and ask questions about them, with LLM-generated answers grounded in your actual data.
Most chatbot demos just wrap an LLM API with no retrieval layer. This project implements the full RAG pipeline that enterprises actually use: document ingestion, intelligent chunking, vector embeddings, semantic search, and grounded answer generation with source citations.
```
┌─────────────┐      ┌──────────────────────────────────────────┐
│  Next.js    │      │              FastAPI Backend             │
│  Frontend   │─────▶│                                          │
│             │      │  Upload ──▶ Extract ──▶ Chunk ──▶ Embed  │
│ • Chat UI   │      │                                  │       │
│ • Streaming │      │                                  ▼       │
│ • Upload    │      │  Query ──▶ Embed ──▶ Search ──▶ Generate │
│ • Sources   │      │            ChromaDB              OpenAI  │
└─────────────┘      └──────────────────────────────────────────┘
```
- Document ingestion — Upload PDF, Markdown, or TXT files. Text is extracted, split into overlapping chunks, and embedded into a vector store.
- Semantic search — Queries are embedded and matched against document chunks using cosine similarity via ChromaDB.
- Streaming responses — Answers stream token-by-token via Server-Sent Events for real-time UI updates.
- Source citations — Every answer includes the source documents and relevance scores so you can verify claims.
- Conversation history — The LLM receives recent chat context for follow-up questions.
- Document management — View all ingested documents with chunk counts, delete individual sources.
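The semantic-search feature boils down to ranking stored chunks by cosine similarity between the query embedding and each chunk embedding, which ChromaDB computes internally. A toy sketch with hand-made two-dimensional vectors (real embeddings come from `text-embedding-3-small` and have 1536 dimensions; the chunk texts here are illustrative):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product over the product of vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Toy chunk store; in the real pipeline ChromaDB holds these vectors.
chunks = {
    "FastAPI serves the API": [1.0, 0.0],
    "ChromaDB stores vectors": [0.0, 1.0],
}

query_vec = [0.9, 0.1]  # stand-in for the embedded user question
best = max(chunks, key=lambda text: cosine(chunks[text], query_vec))
```

The highest-scoring chunks become the context passed to the LLM in the generation step.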
| Component | Technology | Why |
|---|---|---|
| Backend API | FastAPI | Async, fast, automatic OpenAPI docs |
| Vector store | ChromaDB | Local, no external service needed, persistent |
| Embeddings | OpenAI text-embedding-3-small | High quality, low cost ($0.02/1M tokens) |
| LLM | GPT-4o-mini | Fast, cheap, good at following RAG instructions |
| Text splitting | LangChain RecursiveCharacterTextSplitter | Handles code, prose, and mixed content |
| PDF parsing | PyPDF | Lightweight, no system dependencies |
| Frontend | Next.js 15 + Tailwind CSS | Modern React with great DX |
| Containerization | Docker Compose | One command to run everything |
- Python 3.11+
- Node.js 18+
- An OpenAI API key
```bash
git clone https://github.com/ctonneslan/rag-knowledge-base.git
cd rag-knowledge-base

# Set up environment
cp backend/.env.example backend/.env
# Edit backend/.env and add your OPENAI_API_KEY
```

Run the backend:

```bash
cd backend
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
uvicorn main:app --reload
```

The backend runs at http://localhost:8000. API docs at http://localhost:8000/docs.
Run the frontend:

```bash
cd frontend
npm install
npm run dev
```

The frontend runs at http://localhost:3000.
```bash
# Make sure backend/.env exists with your API key
docker compose up --build
```

| Method | Endpoint | Description |
|---|---|---|
| GET | `/health` | Health check |
| POST | `/documents/upload` | Upload a document (multipart form) |
| GET | `/documents` | List all ingested documents |
| DELETE | `/documents/{source}` | Delete a document by source name |
| POST | `/chat` | Ask a question (streaming SSE response) |
1. Ingestion: Documents are parsed, then split into ~1000-character chunks with 200-character overlap using recursive character splitting. This preserves context across chunk boundaries.
2. Embedding: Each chunk is embedded using OpenAI's `text-embedding-3-small` model (1536 dimensions). Embeddings are stored in ChromaDB with source metadata.
3. Retrieval: When a user asks a question, the query is embedded and the top 5 most similar chunks are retrieved using cosine similarity.
4. Generation: Retrieved chunks are injected into the LLM prompt as context. The model is instructed to use only the provided context and to cite sources. Responses stream back via SSE.
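The overlapping-chunk idea in step 1 can be approximated in a few lines. The real pipeline uses LangChain's `RecursiveCharacterTextSplitter`, which additionally prefers to break at paragraph, sentence, and word boundaries rather than at a fixed offset; this sketch shows only the fixed-size-with-overlap behavior:

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Fixed-size chunking where each chunk repeats the tail of the
    previous one, so context survives chunk boundaries."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

doc = " ".join(f"sentence {n}." for n in range(400))
chunks = chunk_text(doc)
# Each chunk begins with the last `overlap` characters of the previous one.
```

With the defaults, a 2,500-character document yields four chunks, each sharing 200 characters with its neighbor.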
rag-knowledge-base/
├── backend/
│ ├── main.py # FastAPI app, routes, CORS
│ ├── config.py # Settings from environment
│ ├── ingestion.py # Text extraction + chunking
│ ├── vectorstore.py # ChromaDB operations + embeddings
│ ├── rag.py # RAG pipeline + streaming generation
│ ├── requirements.txt
│ └── Dockerfile
├── frontend/
│ ├── src/
│ │ ├── app/ # Next.js app router pages
│ │ ├── components/ # Chat UI + document sidebar
│ │ └── lib/api.ts # API client with streaming support
│ └── Dockerfile
├── docker-compose.yml
└── README.md
All settings are in `backend/.env`:

| Variable | Default | Description |
|---|---|---|
| `OPENAI_API_KEY` | (required) | Your OpenAI API key |
| `CHROMA_PERSIST_DIR` | `./chroma_data` | Where ChromaDB stores data |
| `CHUNK_SIZE` | `1000` | Characters per chunk |
| `CHUNK_OVERLAP` | `200` | Overlap between chunks |
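A sketch of how `config.py` might map these variables onto typed settings. The field names mirror the table above; the dataclass shape and helper function are assumptions (the actual project may use pydantic settings instead):

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    openai_api_key: str          # required, no default
    chroma_persist_dir: str = "./chroma_data"
    chunk_size: int = 1000
    chunk_overlap: int = 200

def load_settings() -> Settings:
    """Read settings from the environment, applying the table's defaults."""
    return Settings(
        openai_api_key=os.environ["OPENAI_API_KEY"],  # KeyError if missing
        chroma_persist_dir=os.environ.get("CHROMA_PERSIST_DIR", "./chroma_data"),
        chunk_size=int(os.environ.get("CHUNK_SIZE", "1000")),
        chunk_overlap=int(os.environ.get("CHUNK_OVERLAP", "200")),
    )
```

Failing fast on a missing `OPENAI_API_KEY` at startup beats a confusing 401 on the first upload.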
MIT