A retrieval-augmented generation (RAG) system that lets you upload documents (PDF, Markdown, plain text) and ask questions about them, with LLM-generated answers grounded in your actual data.
Most chatbot demos just wrap an LLM API with no retrieval layer. This project implements the full RAG pipeline that enterprises actually use: document ingestion, intelligent chunking, vector embeddings, semantic search, and grounded answer generation with source citations.
```
┌─────────────┐      ┌──────────────────────────────────────────┐
│  Next.js    │      │              FastAPI Backend             │
│  Frontend   │─────▶│                                          │
│             │      │  Upload ──▶ Extract ──▶ Chunk ──▶ Embed  │
│ • Chat UI   │      │                                  │       │
│ • Streaming │      │                                  ▼       │
│ • Upload    │      │  Query ──▶ Embed ──▶ Search ──▶ Generate │
│ • Sources   │      │            ChromaDB              OpenAI  │
└─────────────┘      └──────────────────────────────────────────┘
```
- Document ingestion — Upload PDF, Markdown, or TXT files. Text is extracted, split into overlapping chunks, and embedded into a vector store.
- Semantic search — Queries are embedded and matched against document chunks using cosine similarity via ChromaDB.
- Streaming responses — Answers stream token-by-token via Server-Sent Events for real-time UI updates.
- Source citations — Every answer includes the source documents and relevance scores so you can verify claims.
- Conversation history — The LLM receives recent chat context for follow-up questions.
- Document management — View all ingested documents with chunk counts, delete individual sources.
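The semantic-search feature boils down to ranking stored chunks by cosine similarity between the query embedding and each chunk embedding, which ChromaDB computes internally. A toy sketch with hand-made two-dimensional vectors (real embeddings come from `text-embedding-3-small` and have 1536 dimensions; the chunk texts here are illustrative):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product over the product of vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Toy chunk store; in the real pipeline ChromaDB holds these vectors.
chunks = {
    "FastAPI serves the API": [1.0, 0.0],
    "ChromaDB stores vectors": [0.0, 1.0],
}

query_vec = [0.9, 0.1]  # stand-in for the embedded user question
best = max(chunks, key=lambda text: cosine(chunks[text], query_vec))
```

The highest-scoring chunks become the context passed to the LLM in the generation step.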
| Component | Technology | Why |
|---|---|---|
| Backend API | FastAPI | Async, fast, automatic OpenAPI docs |
| Vector store | ChromaDB | Local, no external service needed, persistent |
| Embeddings | OpenAI text-embedding-3-small | High quality, low cost ($0.02/1M tokens) |
| LLM | GPT-4o-mini | Fast, cheap, good at following RAG instructions |
| Text splitting | LangChain RecursiveCharacterTextSplitter | Handles code, prose, and mixed content |
| PDF parsing | PyPDF | Lightweight, no system dependencies |
| Frontend | Next.js 15 + Tailwind CSS | Modern React with great DX |
| Containerization | Docker Compose | One command to run everything |
- Python 3.11+
- Node.js 18+
- An OpenAI API key
```bash
git clone https://github.com/ctonneslan/rag-knowledge-base.git
cd rag-knowledge-base

# Set up environment
cp backend/.env.example backend/.env
# Edit backend/.env and add your OPENAI_API_KEY
```

Run the backend:

```bash
cd backend
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
uvicorn main:app --reload
```

The backend runs at http://localhost:8000. API docs at http://localhost:8000/docs.
Run the frontend:

```bash
cd frontend
npm install
npm run dev
```

The frontend runs at http://localhost:3000.
```bash
# Make sure backend/.env exists with your API key
docker compose up --build
```

| Method | Endpoint | Description |
|---|---|---|
| GET | `/health` | Health check |
| POST | `/documents/upload` | Upload a document (multipart form) |
| GET | `/documents` | List all ingested documents |
| DELETE | `/documents/{source}` | Delete a document by source name |
| POST | `/chat` | Ask a question (streaming SSE response) |
1. Ingestion: Documents are parsed, then split into ~1000-character chunks with 200-character overlap using recursive character splitting. This preserves context across chunk boundaries.
2. Embedding: Each chunk is embedded using OpenAI's `text-embedding-3-small` model (1536 dimensions). Embeddings are stored in ChromaDB with source metadata.
3. Retrieval: When a user asks a question, the query is embedded and the top 5 most similar chunks are retrieved using cosine similarity.
4. Generation: Retrieved chunks are injected into the LLM prompt as context. The model is instructed to use only the provided context and to cite sources. Responses stream back via SSE.
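The overlapping-chunk idea in step 1 can be approximated in a few lines. The real pipeline uses LangChain's `RecursiveCharacterTextSplitter`, which additionally prefers to break at paragraph, sentence, and word boundaries rather than at a fixed offset; this sketch shows only the fixed-size-with-overlap behavior:

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Fixed-size chunking where each chunk repeats the tail of the
    previous one, so context survives chunk boundaries."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

doc = " ".join(f"sentence {n}." for n in range(400))
chunks = chunk_text(doc)
# Each chunk begins with the last `overlap` characters of the previous one.
```

With the defaults, a 2,500-character document yields four chunks, each sharing 200 characters with its neighbor.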
rag-knowledge-base/
├── backend/
│ ├── main.py # FastAPI app, routes, CORS
│ ├── config.py # Settings from environment
│ ├── ingestion.py # Text extraction + chunking
│ ├── vectorstore.py # ChromaDB operations + embeddings
│ ├── rag.py # RAG pipeline + streaming generation
│ ├── requirements.txt
│ └── Dockerfile
├── frontend/
│ ├── src/
│ │ ├── app/ # Next.js app router pages
│ │ ├── components/ # Chat UI + document sidebar
│ │ └── lib/api.ts # API client with streaming support
│ └── Dockerfile
├── docker-compose.yml
└── README.md
All settings are in `backend/.env`:

| Variable | Default | Description |
|---|---|---|
| `OPENAI_API_KEY` | (required) | Your OpenAI API key |
| `CHROMA_PERSIST_DIR` | `./chroma_data` | Where ChromaDB stores data |
| `CHUNK_SIZE` | `1000` | Characters per chunk |
| `CHUNK_OVERLAP` | `200` | Overlap between chunks |
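A sketch of how `config.py` might map these variables onto typed settings. The field names mirror the table above; the dataclass shape and helper function are assumptions (the actual project may use pydantic settings instead):

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    openai_api_key: str          # required, no default
    chroma_persist_dir: str = "./chroma_data"
    chunk_size: int = 1000
    chunk_overlap: int = 200

def load_settings() -> Settings:
    """Read settings from the environment, applying the table's defaults."""
    return Settings(
        openai_api_key=os.environ["OPENAI_API_KEY"],  # KeyError if missing
        chroma_persist_dir=os.environ.get("CHROMA_PERSIST_DIR", "./chroma_data"),
        chunk_size=int(os.environ.get("CHUNK_SIZE", "1000")),
        chunk_overlap=int(os.environ.get("CHUNK_OVERLAP", "200")),
    )
```

Failing fast on a missing `OPENAI_API_KEY` at startup beats a confusing 401 on the first upload.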
MIT