A document question-answering system that extracts text from PDFs using OCR and enables semantic search through a REST API.
- FastAPI backend with async processing via Celery
- Redis for task queue management
- doctr for OCR text extraction
- Qdrant vector database for semantic search
- Sentence Transformers for generating embeddings (local)
- Ollama (local) or OpenRouter (cloud) for LLM inference
- Angular 17 frontend with standalone components
The system uses two different models for different purposes:
```
PDF → OCR → Text chunks → Embedding Model → Vectors stored in Qdrant
                                                        ↓
User Query → Embedding Model → Vector search → Top K chunks retrieved
                                                        ↓
Retrieved chunks + Query → LLM → Answer
```
- Embedding Model (local, free) - Converts text into vectors for semantic search
- LLM Model (local or cloud) - Generates natural language answers from retrieved context
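The following is a minimal, illustrative sketch of that flow in Python. It is not the project's actual service code: it assumes the `sentence-transformers` package and uses plain cosine similarity in place of Qdrant, just to show how the embedding model and the LLM play different roles.

```python
# Illustrative sketch of the chunk → embed → retrieve half of the pipeline.
# Not the project's code; assumes `pip install sentence-transformers numpy`.
import numpy as np
from sentence_transformers import SentenceTransformer

def chunk_text(words, chunk_size=500, overlap=50):
    """Split a word list into overlapping chunks.
    Simplified word-based stand-in for CHUNK_TOKENS / CHUNK_OVERLAP_TOKENS."""
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), step)]

# 1. Chunk the OCR'd text and embed each chunk (done once, at upload time).
embedder = SentenceTransformer("all-MiniLM-L6-v2")   # the default EMBEDDING_MODEL
chunks = chunk_text("full text extracted from the PDF ...".split())
chunk_vectors = embedder.encode(chunks, normalize_embeddings=True)

# 2. Embed the user query with the SAME model and take the top-K most similar chunks.
query = "What is this document about?"
query_vector = embedder.encode([query], normalize_embeddings=True)[0]
scores = chunk_vectors @ query_vector                        # cosine similarity
top_k = [chunks[i] for i in np.argsort(scores)[::-1][:5]]    # RAG_TOP_K=5

# 3. The retrieved chunks plus the query are sent to the LLM to generate the answer.
prompt = "Context:\n" + "\n".join(top_k) + f"\n\nQuestion: {query}"
```

In docRAG itself the chunk vectors live in Qdrant and the final prompt goes to Ollama or OpenRouter, but the shape of the flow is the same.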
- Docker & Docker Compose (for backend)
- Node.js 18+ (for frontend development)
- (Optional) Make for build shortcuts
```
docRAG/
├── app/                    # Backend Python application
│   ├── api/                # FastAPI routes and main app
│   ├── core/               # Configuration
│   ├── services/           # OCR, embeddings, LLM, vector store
│   └── worker/             # Celery tasks
├── frontend-angular/       # Angular 17 frontend (recommended)
│   ├── src/
│   │   ├── app/
│   │   │   ├── components/ # UI components
│   │   │   ├── services/   # API service
│   │   │   └── models/     # TypeScript interfaces
│   │   └── styles.scss     # Global styles
│   ├── Dockerfile
│   └── nginx.conf
├── frontend/               # Legacy vanilla JS frontend
├── scripts/                # Helper scripts
└── docker-compose.yml
```
Create a .env file to configure your models.
Choose one of the following LLM providers:
Option 1: Ollama (Local - Free)
```
LLM_PROVIDER=ollama
LLM_MODEL=llama3
```
After starting Docker, pull the model:
```
docker-compose exec ollama ollama pull llama3
```
Available Ollama models: llama3, llama3.2, mistral, codellama, phi3, etc.
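Once the model is pulled, you can smoke-test it directly against Ollama's HTTP API (default port 11434) before going through docRAG. A small sketch, assuming Python's `requests` package:

```python
# Quick smoke test against the local Ollama container (default port 11434).
# Not part of the project code; assumes `pip install requests`.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "prompt": "Say hello in one sentence.", "stream": False},
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
```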
Option 2: OpenRouter (Cloud - Paid)
```
LLM_PROVIDER=openrouter
LLM_MODEL=google/gemini-2.0-flash-001
OPENROUTER_API_KEY=sk-or-v1-your-key-here
```
Get your API key at https://openrouter.ai/keys
Popular OpenRouter models:
- `google/gemini-2.0-flash-001` - Fast and cheap
- `anthropic/claude-3.5-sonnet` - High quality
- `meta-llama/llama-3-70b-instruct` - Open source
- `openai/gpt-4o-mini` - Good balance
See all models at https://openrouter.ai/models
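OpenRouter's endpoint is OpenAI-compatible, so you can sanity-check your key and model choice independently of docRAG. A minimal sketch, assuming Python's `requests` package and that `OPENROUTER_API_KEY` is exported in your shell:

```python
# Sanity-check an OpenRouter key/model outside of docRAG.
# Assumes `pip install requests` and OPENROUTER_API_KEY set in the environment.
import os
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "google/gemini-2.0-flash-001",
        "messages": [{"role": "user", "content": "Reply with the single word: ok"}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```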
The embedding model runs locally and is configured via the EMBEDDING_MODEL environment variable. Default is all-MiniLM-L6-v2.
```
EMBEDDING_MODEL=all-MiniLM-L6-v2
```
To change the embedding model:

- Update your `.env` file:

  ```
  EMBEDDING_MODEL=all-mpnet-base-v2
  ```

- Restart the services:

  ```
  docker-compose restart api worker
  ```

- Important: If you change the embedding model after uploading documents, you must re-upload them. Different models produce incompatible vectors.
Available embedding models (from Sentence Transformers):
| Model | Dimensions | Speed | Quality |
|---|---|---|---|
| `all-MiniLM-L6-v2` | 384 | Fast | Good (default) |
| `all-MiniLM-L12-v2` | 384 | Medium | Better |
| `all-mpnet-base-v2` | 768 | Slow | Best |
| `paraphrase-MiniLM-L6-v2` | 384 | Fast | Good for paraphrasing |
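The dimension differences matter because vectors produced by different models cannot be compared with each other. If you want to check what a model produces before switching, a quick sketch assuming the `sentence-transformers` package:

```python
# Print the vector size each Sentence Transformers model produces.
# Illustrative only; assumes `pip install sentence-transformers`.
from sentence_transformers import SentenceTransformer

for name in ["all-MiniLM-L6-v2", "all-mpnet-base-v2"]:
    model = SentenceTransformer(name)
    print(name, model.get_sentence_embedding_dimension())   # 384 vs. 768
```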
```
# LLM Configuration
LLM_PROVIDER=openrouter                # or "ollama"
LLM_MODEL=google/gemini-2.0-flash-001
OPENROUTER_API_KEY=sk-or-v1-your-key-here

# Embedding Configuration
EMBEDDING_MODEL=all-MiniLM-L6-v2

# Optional: RAG tuning
RAG_TOP_K=5                            # Number of chunks to retrieve
CHUNK_TOKENS=500                       # Size of text chunks
CHUNK_OVERLAP_TOKENS=50                # Overlap between chunks
```

Start all backend services:

```
docker-compose up -d --build
```

This launches:
- Redis - task queue (port 6379)
- Qdrant - vector database (port 6333)
- Ollama - local LLM inference (port 11434) - only used if `LLM_PROVIDER=ollama`
- API - FastAPI backend (port 8000)
- Worker - Celery background processing
```
cd frontend-angular
npm install
npm start
```
The frontend runs at http://localhost:4200 and connects to the backend at http://localhost:8000.
```
cd frontend-angular
npm run build
```
Output is in `frontend-angular/dist/docrag-frontend/browser/`.
Build and run the frontend container:
```
cd frontend-angular
docker build -t docrag-frontend .
docker run -p 8080:80 docrag-frontend
```

- Create a new Web Service on Render
- Connect your repository
- Set build command: `docker build -t app .`
- Set start command based on your Dockerfile
- Add environment variables from your `.env` file
The backend is configured to accept CORS requests from any origin (allow_origins=["*"]).
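An open policy like that is typically wired up with FastAPI's `CORSMiddleware`. The sketch below is illustrative (the actual wiring in `app/api/` may differ) and shows where you would restrict origins for production:

```python
# Illustrative FastAPI CORS setup; the project's actual code may differ.
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],   # current setting: any origin may call the API
    # allow_origins=["https://your-frontend.vercel.app"],  # tighter alternative
    allow_methods=["*"],
    allow_headers=["*"],
)
```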
- Import your repository on Vercel
- Set the root directory to `frontend-angular`
- Framework preset: Angular
- Build command: `npm run build`
- Output directory: `dist/docrag-frontend/browser`
The frontend is pre-configured to connect to https://docrag-2gvg.onrender.com. To change this, update:
- `frontend-angular/src/app/services/api.service.ts` - `DEFAULT_API_URL` constant
- `frontend-angular/src/index.html` - default input value
The Angular frontend includes automatic keep-alive functionality that pings the backend every 14 minutes to prevent Render's free tier from sleeping (which happens after 15 minutes of inactivity). This only works while the frontend is open in a browser.
For reliable uptime without the frontend open, use an external monitoring service like UptimeRobot to ping your backend's /api/v1/health endpoint.
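If you would rather run your own pinger than use a service like UptimeRobot, a hypothetical stand-alone script (not included in this repo) could look like the following, assuming Python's `requests` package:

```python
# Hypothetical keep-alive script; not part of the repository.
# Assumes `pip install requests` and your own deployed backend URL.
import time
import requests

BACKEND_URL = "https://docrag-2gvg.onrender.com"   # replace with your own deployment

while True:
    try:
        r = requests.get(f"{BACKEND_URL}/api/v1/health", timeout=30)
        print("ping", r.status_code)
    except requests.RequestException as exc:
        print("ping failed:", exc)
    time.sleep(14 * 60)   # stay under Render's 15-minute idle timeout
```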
Open the frontend URL in your browser:
- Local: http://localhost:4200
- Deployed: Your Vercel URL
Features:
- API URL configuration with online/offline status indicator
- Drag & drop PDF upload with optional force re-processing
- Task status monitoring with auto-refresh
- Chat/query interface with citation display
- Recent tasks tracking with quick actions
- Upload a PDF
  - Drag & drop or click to select a PDF file
  - Click "Upload PDF"
  - Note the `task_id` and `doc_id` returned
- Monitor Processing
  - The task ID auto-fills for status checking
  - Status progresses: PENDING → STARTED → SUCCESS
- Query Documents
  - Enter your question in the chat section
  - Optionally specify a `doc_id` to search only that document
  - View the answer and citations with source references
Upload a PDF:
```
./scripts/load_small_pdf.sh "/path/to/document.pdf"
```
Query your documents:
```
./scripts/query_rag.sh "What is the main topic of the document?"
```
All endpoints are prefixed with `/api/v1`:
| Endpoint | Method | Description |
|---|---|---|
| `/api/v1/health` | GET | Health check |
| `/api/v1/upload` | POST | Upload a PDF for processing |
| `/api/v1/status/{task_id}` | GET | Check processing status |
| `/api/v1/chat` | POST | Query documents with natural language |
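The same endpoints can also be driven end to end from a script. Below is an illustrative Python `requests` sketch of the upload → status → chat flow; the response field names (`task_id`, `doc_id`, `status`, `answer`) are assumptions inferred from the UI description, so adjust them to match your actual responses:

```python
# Scripted upload → status → chat flow against a local backend.
# Response field names are assumptions; adjust to your actual API responses.
import time
import requests

API = "http://localhost:8000/api/v1"

# 1. Upload a PDF for background processing.
with open("document.pdf", "rb") as f:
    up = requests.post(f"{API}/upload", files={"file": f}).json()
task_id, doc_id = up["task_id"], up["doc_id"]

# 2. Poll until the Celery task finishes (PENDING → STARTED → SUCCESS).
while True:
    status = requests.get(f"{API}/status/{task_id}").json()
    if status.get("status") in ("SUCCESS", "FAILURE"):
        break
    time.sleep(5)

# 3. Ask a question scoped to the uploaded document.
chat = requests.post(
    f"{API}/chat",
    json={"query": "What is this document about?", "doc_id": doc_id},
).json()
print(chat.get("answer", chat))
```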
Upload PDF:
```
curl -X POST "http://localhost:8000/api/v1/upload" \
  -F "file=@document.pdf"
```
Check Status:
```
curl "http://localhost:8000/api/v1/status/{task_id}"
```
Chat Query:
```
curl -X POST "http://localhost:8000/api/v1/chat" \
  -H "Content-Type: application/json" \
  -d '{"query": "What is this document about?", "doc_id": "optional-doc-id"}'
```
Run tests:
```
docker-compose run api pytest
```
View logs:
```
docker-compose logs -f api     # API logs
docker-compose logs -f worker  # Worker logs
docker-compose logs -f ollama  # LLM logs
```
The Angular frontend uses:
- Angular 17 with standalone components
- Signals for reactive state management
- SCSS for styling
- Inter font for typography
Key files:
- `src/app/services/api.service.ts` - API communication and state
- `src/app/components/` - UI components (upload, status, chat, recent-tasks, header)
- `src/styles.scss` - Global styles and CSS variables
`model 'llama3' not found`

Pull the model: `docker-compose exec ollama ollama pull llama3`
- Check containers are running: `docker-compose ps`
- Check API logs: `docker-compose logs api`
- Verify the API URL is correct in the frontend header
- If deployed, ensure CORS is enabled and the backend is awake
The backend has CORS enabled for all origins. If you still see errors:
- Ensure the backend is running and accessible
- Check that the URL doesn't have a trailing slash
- Verify the backend responded (might be sleeping on Render free tier)
Check worker logs: `docker-compose logs -f worker`
- Check worker is running: `docker-compose ps`
- Check worker logs: `docker-compose logs worker`
- Verify Redis and Qdrant are healthy
Verify Qdrant is running: `curl http://localhost:6333/collections`
- Ensure document processing completed (status: SUCCESS)
- Check if embeddings were generated in the worker logs
- Try a more specific question
If you change EMBEDDING_MODEL after uploading documents, the old vectors are incompatible. Re-upload your documents to regenerate embeddings with the new model.
```
cd frontend-angular
rm -rf node_modules package-lock.json
npm install
npm run build
```
Render's free tier sleeps after 15 minutes of inactivity. The first request after sleeping may take 30-60 seconds. Keep the frontend open to maintain keep-alive pings, or use an external monitoring service.