A document question-answering system that extracts text from PDFs using OCR and enables semantic search through a REST API.
- FastAPI backend with async processing via Celery
- Redis for task queue management
- doctr for OCR text extraction
- Qdrant vector database for semantic search
- Sentence Transformers for generating embeddings (local)
- Ollama (local) or OpenRouter (cloud) for LLM inference
- Angular 17 frontend with standalone components
The system uses two different models for different purposes:
```
PDF → OCR → Text chunks → Embedding Model → Vectors stored in Qdrant
                                                        ↓
User Query → Embedding Model → Vector search → Top K chunks retrieved
                                                        ↓
Retrieved chunks + Query → LLM → Answer
```
- Embedding Model (local, free) - Converts text into vectors for semantic search
- LLM Model (local or cloud) - Generates natural language answers from retrieved context
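The following is a minimal, illustrative sketch of that flow in Python. It is not the project's actual service code: it assumes the `sentence-transformers` package and uses plain cosine similarity in place of Qdrant, just to show how the embedding model and the LLM play different roles.

```python
# Illustrative sketch of the chunk → embed → retrieve half of the pipeline.
# Not the project's code; assumes `pip install sentence-transformers numpy`.
import numpy as np
from sentence_transformers import SentenceTransformer

def chunk_text(words, chunk_size=500, overlap=50):
    """Split a word list into overlapping chunks.
    Simplified word-based stand-in for CHUNK_TOKENS / CHUNK_OVERLAP_TOKENS."""
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), step)]

# 1. Chunk the OCR'd text and embed each chunk (done once, at upload time).
embedder = SentenceTransformer("all-MiniLM-L6-v2")   # the default EMBEDDING_MODEL
chunks = chunk_text("full text extracted from the PDF ...".split())
chunk_vectors = embedder.encode(chunks, normalize_embeddings=True)

# 2. Embed the user query with the SAME model and take the top-K most similar chunks.
query = "What is this document about?"
query_vector = embedder.encode([query], normalize_embeddings=True)[0]
scores = chunk_vectors @ query_vector                        # cosine similarity
top_k = [chunks[i] for i in np.argsort(scores)[::-1][:5]]    # RAG_TOP_K=5

# 3. The retrieved chunks plus the query are sent to the LLM to generate the answer.
prompt = "Context:\n" + "\n".join(top_k) + f"\n\nQuestion: {query}"
```

In docRAG itself the chunk vectors live in Qdrant and the final prompt goes to Ollama or OpenRouter, but the shape of the flow is the same.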
- Docker & Docker Compose (for backend)
- Node.js 18+ (for frontend development)
- (Optional) Make for build shortcuts
```
docRAG/
├── app/                    # Backend Python application
│   ├── api/                # FastAPI routes and main app
│   ├── core/               # Configuration
│   ├── services/           # OCR, embeddings, LLM, vector store
│   └── worker/             # Celery tasks
├── frontend-angular/       # Angular 17 frontend (recommended)
│   ├── src/
│   │   ├── app/
│   │   │   ├── components/ # UI components
│   │   │   ├── services/   # API service
│   │   │   └── models/     # TypeScript interfaces
│   │   └── styles.scss     # Global styles
│   ├── Dockerfile
│   └── nginx.conf
├── frontend/               # Legacy vanilla JS frontend
├── scripts/                # Helper scripts
└── docker-compose.yml
```
Create a .env file to configure your models.
Choose one of the following LLM providers:
Option 1: Ollama (Local - Free)
```
LLM_PROVIDER=ollama
LLM_MODEL=llama3
```
After starting Docker, pull the model:
```
docker-compose exec ollama ollama pull llama3
```
Available Ollama models: llama3, llama3.2, mistral, codellama, phi3, etc.
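Once the model is pulled, you can smoke-test it directly against Ollama's HTTP API (default port 11434) before going through docRAG. A small sketch, assuming Python's `requests` package:

```python
# Quick smoke test against the local Ollama container (default port 11434).
# Not part of the project code; assumes `pip install requests`.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "prompt": "Say hello in one sentence.", "stream": False},
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
```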
Option 2: OpenRouter (Cloud - Paid)
```
LLM_PROVIDER=openrouter
LLM_MODEL=google/gemini-2.0-flash-001
OPENROUTER_API_KEY=sk-or-v1-your-key-here
```
Get your API key at https://openrouter.ai/keys
Popular OpenRouter models:
- `google/gemini-2.0-flash-001` - Fast and cheap
- `anthropic/claude-3.5-sonnet` - High quality
- `meta-llama/llama-3-70b-instruct` - Open source
- `openai/gpt-4o-mini` - Good balance
See all models at https://openrouter.ai/models
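OpenRouter's endpoint is OpenAI-compatible, so you can sanity-check your key and model choice independently of docRAG. A minimal sketch, assuming Python's `requests` package and that `OPENROUTER_API_KEY` is exported in your shell:

```python
# Sanity-check an OpenRouter key/model outside of docRAG.
# Assumes `pip install requests` and OPENROUTER_API_KEY set in the environment.
import os
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "google/gemini-2.0-flash-001",
        "messages": [{"role": "user", "content": "Reply with the single word: ok"}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```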
The embedding model runs locally and is configured via the EMBEDDING_MODEL environment variable. Default is all-MiniLM-L6-v2.
```
EMBEDDING_MODEL=all-MiniLM-L6-v2
```
To change the embedding model:

- Update your `.env` file:

  ```
  EMBEDDING_MODEL=all-mpnet-base-v2
  ```

- Restart the services:

  ```
  docker-compose restart api worker
  ```

- Important: If you change the embedding model after uploading documents, you must re-upload them. Different models produce incompatible vectors.
Available embedding models (from Sentence Transformers):
| Model | Dimensions | Speed | Quality |
|---|---|---|---|
| `all-MiniLM-L6-v2` | 384 | Fast | Good (default) |
| `all-MiniLM-L12-v2` | 384 | Medium | Better |
| `all-mpnet-base-v2` | 768 | Slow | Best |
| `paraphrase-MiniLM-L6-v2` | 384 | Fast | Good for paraphrasing |
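The dimension differences matter because vectors produced by different models cannot be compared with each other. If you want to check what a model produces before switching, a quick sketch assuming the `sentence-transformers` package:

```python
# Print the vector size each Sentence Transformers model produces.
# Illustrative only; assumes `pip install sentence-transformers`.
from sentence_transformers import SentenceTransformer

for name in ["all-MiniLM-L6-v2", "all-mpnet-base-v2"]:
    model = SentenceTransformer(name)
    print(name, model.get_sentence_embedding_dimension())   # 384 vs. 768
```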
```
# LLM Configuration
LLM_PROVIDER=openrouter                # or "ollama"
LLM_MODEL=google/gemini-2.0-flash-001
OPENROUTER_API_KEY=sk-or-v1-your-key-here

# Embedding Configuration
EMBEDDING_MODEL=all-MiniLM-L6-v2

# Optional: RAG tuning
RAG_TOP_K=5                            # Number of chunks to retrieve
CHUNK_TOKENS=500                       # Size of text chunks
CHUNK_OVERLAP_TOKENS=50                # Overlap between chunks
```

Start all backend services:

```
docker-compose up -d --build
```

This launches:
- Redis - task queue (port 6379)
- Qdrant - vector database (port 6333)
- Ollama - local LLM inference (port 11434) - only used if `LLM_PROVIDER=ollama`
- API - FastAPI backend (port 8000)
- Worker - Celery background processing
```
cd frontend-angular
npm install
npm start
```
The frontend runs at http://localhost:4200 and connects to the backend at http://localhost:8000.
```
cd frontend-angular
npm run build
```
Output is in `frontend-angular/dist/docrag-frontend/browser/`.
Build and run the frontend container:
```
cd frontend-angular
docker build -t docrag-frontend .
docker run -p 8080:80 docrag-frontend
```

- Create a new Web Service on Render
- Connect your repository
- Set build command: `docker build -t app .`
- Set start command based on your Dockerfile
- Add environment variables from your `.env` file
The backend is configured to accept CORS requests from any origin (allow_origins=["*"]).
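An open policy like that is typically wired up with FastAPI's `CORSMiddleware`. The sketch below is illustrative (the actual wiring in `app/api/` may differ) and shows where you would restrict origins for production:

```python
# Illustrative FastAPI CORS setup; the project's actual code may differ.
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],   # current setting: any origin may call the API
    # allow_origins=["https://your-frontend.vercel.app"],  # tighter alternative
    allow_methods=["*"],
    allow_headers=["*"],
)
```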
- Import your repository on Vercel
- Set the root directory to `frontend-angular`
- Framework preset: Angular
- Build command: `npm run build`
- Output directory: `dist/docrag-frontend/browser`
The frontend is pre-configured to connect to https://docrag-2gvg.onrender.com. To change this, update:
- `frontend-angular/src/app/services/api.service.ts` - `DEFAULT_API_URL` constant
- `frontend-angular/src/index.html` - default input value
The Angular frontend includes automatic keep-alive functionality that pings the backend every 14 minutes to prevent Render's free tier from sleeping (which happens after 15 minutes of inactivity). This only works while the frontend is open in a browser.
For reliable uptime without the frontend open, use an external monitoring service like UptimeRobot to ping your backend's /api/v1/health endpoint.
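If you would rather run your own pinger than use a service like UptimeRobot, a hypothetical stand-alone script (not included in this repo) could look like the following, assuming Python's `requests` package:

```python
# Hypothetical keep-alive script; not part of the repository.
# Assumes `pip install requests` and your own deployed backend URL.
import time
import requests

BACKEND_URL = "https://docrag-2gvg.onrender.com"   # replace with your own deployment

while True:
    try:
        r = requests.get(f"{BACKEND_URL}/api/v1/health", timeout=30)
        print("ping", r.status_code)
    except requests.RequestException as exc:
        print("ping failed:", exc)
    time.sleep(14 * 60)   # stay under Render's 15-minute idle timeout
```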
Open the frontend URL in your browser:
- Local: http://localhost:4200
- Deployed: Your Vercel URL
Features:
- API URL configuration with online/offline status indicator
- Drag & drop PDF upload with optional force re-processing
- Task status monitoring with auto-refresh
- Chat/query interface with citation display
- Recent tasks tracking with quick actions
- Upload a PDF
  - Drag & drop or click to select a PDF file
  - Click "Upload PDF"
  - Note the `task_id` and `doc_id` returned
- Monitor Processing
  - The task ID auto-fills for status checking
  - Status progresses: PENDING → STARTED → SUCCESS
- Query Documents
  - Enter your question in the chat section
  - Optionally specify a `doc_id` to search only that document
  - View the answer and citations with source references
Upload a PDF:
```
./scripts/load_small_pdf.sh "/path/to/document.pdf"
```
Query your documents:
```
./scripts/query_rag.sh "What is the main topic of the document?"
```
All endpoints are prefixed with `/api/v1`:
| Endpoint | Method | Description |
|---|---|---|
| `/api/v1/health` | GET | Health check |
| `/api/v1/upload` | POST | Upload a PDF for processing |
| `/api/v1/status/{task_id}` | GET | Check processing status |
| `/api/v1/chat` | POST | Query documents with natural language |
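The same endpoints can also be driven end to end from a script. Below is an illustrative Python `requests` sketch of the upload → status → chat flow; the response field names (`task_id`, `doc_id`, `status`, `answer`) are assumptions inferred from the UI description, so adjust them to match your actual responses:

```python
# Scripted upload → status → chat flow against a local backend.
# Response field names are assumptions; adjust to your actual API responses.
import time
import requests

API = "http://localhost:8000/api/v1"

# 1. Upload a PDF for background processing.
with open("document.pdf", "rb") as f:
    up = requests.post(f"{API}/upload", files={"file": f}).json()
task_id, doc_id = up["task_id"], up["doc_id"]

# 2. Poll until the Celery task finishes (PENDING → STARTED → SUCCESS).
while True:
    status = requests.get(f"{API}/status/{task_id}").json()
    if status.get("status") in ("SUCCESS", "FAILURE"):
        break
    time.sleep(5)

# 3. Ask a question scoped to the uploaded document.
chat = requests.post(
    f"{API}/chat",
    json={"query": "What is this document about?", "doc_id": doc_id},
).json()
print(chat.get("answer", chat))
```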
Upload PDF:
```
curl -X POST "http://localhost:8000/api/v1/upload" \
  -F "file=@document.pdf"
```
Check Status:
```
curl "http://localhost:8000/api/v1/status/{task_id}"
```
Chat Query:
```
curl -X POST "http://localhost:8000/api/v1/chat" \
  -H "Content-Type: application/json" \
  -d '{"query": "What is this document about?", "doc_id": "optional-doc-id"}'
```
Run tests:
```
docker-compose run api pytest
```
View logs:
```
docker-compose logs -f api     # API logs
docker-compose logs -f worker  # Worker logs
docker-compose logs -f ollama  # LLM logs
```
The Angular frontend uses:
- Angular 17 with standalone components
- Signals for reactive state management
- SCSS for styling
- Inter font for typography
Key files:
- `src/app/services/api.service.ts` - API communication and state
- `src/app/components/` - UI components (upload, status, chat, recent-tasks, header)
- `src/styles.scss` - Global styles and CSS variables
`model 'llama3' not found`

Pull the model: `docker-compose exec ollama ollama pull llama3`
- Check containers are running: `docker-compose ps`
- Check API logs: `docker-compose logs api`
- Verify the API URL is correct in the frontend header
- If deployed, ensure CORS is enabled and the backend is awake
The backend has CORS enabled for all origins. If you still see errors:
- Ensure the backend is running and accessible
- Check that the URL doesn't have a trailing slash
- Verify the backend responded (might be sleeping on Render free tier)
Check worker logs: `docker-compose logs -f worker`
- Check worker is running: `docker-compose ps`
- Check worker logs: `docker-compose logs worker`
- Verify Redis and Qdrant are healthy
Verify Qdrant is running: `curl http://localhost:6333/collections`
- Ensure document processing completed (status: SUCCESS)
- Check if embeddings were generated in the worker logs
- Try a more specific question
If you change EMBEDDING_MODEL after uploading documents, the old vectors are incompatible. Re-upload your documents to regenerate embeddings with the new model.
```
cd frontend-angular
rm -rf node_modules package-lock.json
npm install
npm run build
```
Render's free tier sleeps after 15 minutes of inactivity. The first request after sleeping may take 30-60 seconds. Keep the frontend open to maintain keep-alive pings, or use an external monitoring service.