.
├── docker-compose.yml
├── Dockerfile
├── .env
├── pyproject.toml
├── uv.lock
├── .pre-commit-config.yaml
├── README.md
├── data/
│ └── qdrant/
├── src/
│ ├── fdds/
│ │ ├── __init__.py
│ │ ├── config.py
│ │ ├── handlers.py
│ │ ├── inference.py
│ │ ├── evaluation.py
│ │ ├── evaluation_pipeline.py
│ │ ├── manage_pdfs.py
│ │ └── reranker.py
│ ├── ui-build/
│ └── chat.py
- docker-compose.yml: Defines and manages services like Qdrant (vector DB), the API backend, and Jaeger (for monitoring and tracing).
- Dockerfile: Builds the backend API service that serves the RAG-based chatbot.
- pyproject.toml and uv.lock: Configuration files for uv and project dependencies.
- .pre-commit-config.yaml: Configuration for pre-commit hooks to ensure code quality.
- data/qdrant: Persistent volume for Qdrant's vector data storage.
- src/fdds/inference.py: Contains methods to process a query and generate responses based on contextual data using RAG.
- src/fdds/manage_pdfs.py: Script that ingests PDF files from a list of URLs into Qdrant and deletes them from it.
- src/fdds/evaluation.py: Evaluates RAG using the pipeline defined in src/fdds/evaluation_pipeline.py (requires NEPTUNE_API_KEY).
- src/fdds/config.py: Holds configuration settings for the project.
- src/ui-build: Precompiled frontend UI assets for the chatbot interface.
- src/chat.py: Contains the core MyChat class responsible for managing conversation flow.
- Python 3.11+
- Docker and Docker Compose
- Pre-commit
- uv (https://docs.astral.sh/uv/)
git clone git@github.com:deepsense-ai/fdds-rag.git
cd fdds-rag
uv python install 3.11
uv sync
pre-commit install
Use the automated startup script to launch the system. The script will handle environment configuration and service startup:
./start.sh
- --help: Show all available options
- --with-ingest: Include ingestion service to process PDFs
- --with-ingest-file FILE: Use custom PDF file list for ingestion
- --detached: Run services in background mode
- --jaeger: Enable Jaeger tracing
- --port PORT: Set API port (default: 8000)
- --host HOST: Set API host (default: 0.0.0.0)
- --data-path PATH: Set data mount path (default: ./app-data)
- --env-file FILE: Load environment variables from file
- --env KEY=VALUE: Set individual environment variables
# Basic startup with ingestion
./start.sh --with-ingest
# Custom port and detached mode
./start.sh --with-ingest --port 9000 --detached
# Use custom PDF list
./start.sh --with-ingest-file my-pdfs.txt
# Load environment from file
./start.sh --env-file .env.prod --with-ingest
The script will:
- Prompt for OpenAI API key if not found in environment
- Generate secure API keys for internal services
- Create the data directory and environment configuration
- Start Docker services (Qdrant, API, and optionally Jaeger/Ingestion)
Note: Ensure Docker is running before executing the script.
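Two of the steps listed above, generating secure keys and writing the environment configuration, can be pictured with the following Python sketch. It is not taken from start.sh, whose implementation is not shown here; the function names and the INTERNAL_API_KEY variable in the usage comment are hypothetical and serve only to illustrate the idea.

```python
import secrets
from pathlib import Path


def generate_api_key(num_bytes: int = 32) -> str:
    """Return a URL-safe random token built from `num_bytes` random bytes."""
    return secrets.token_urlsafe(num_bytes)


def write_env_file(path: Path, values: dict[str, str]) -> None:
    """Write KEY=VALUE lines to `path`, creating parent directories as needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    lines = [f"{key}={value}" for key, value in values.items()]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")


# Hypothetical usage (variable name and file location are assumptions):
# write_env_file(Path("./app-data/.env"), {"INTERNAL_API_KEY": generate_api_key()})
```

The secrets module is used rather than random because it draws from the operating system's cryptographically secure source, which is what a key meant to protect internal services needs.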
If you don't already have a list of PDF URLs to ingest, you can generate one by running the web scraping script:
cd scripts/fdds_scrapper
uv run scrapy crawl get_pdf_links
This will crawl and extract all PDF file links from the sections specified in the start_urls list, which is defined in scripts/fdds_scrapper/fdds_scrapper/spiders/pdf_spider.py.
The collected links will be saved as: scripts/fdds_scrapper/pdfs.txt.
Note: To customize which sections are scraped, modify the start_urls list in pdf_spider.py.
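To make the link-extraction idea concrete, here is a minimal standard-library sketch that collects PDF links from a single HTML page. It is not the project's Scrapy spider, whose code is not shown here; PdfLinkParser and extract_pdf_links are hypothetical names, and the sketch assumes that a PDF link is any anchor whose URL path ends in .pdf.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PdfLinkParser(HTMLParser):
    """Collect absolute URLs of <a href=...> links whose path ends in .pdf."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page URL.
        absolute = urljoin(self.base_url, href)
        is_pdf = urlparse(absolute).path.lower().endswith(".pdf")
        if is_pdf and absolute not in self.links:
            self.links.append(absolute)


def extract_pdf_links(html: str, base_url: str) -> list[str]:
    """Return the de-duplicated PDF links found in `html`, in page order."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return parser.links
```

Writing the returned URLs one per line produces a file in the same shape as pdfs.txt.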
To load PDF documents into the Qdrant database, prepare a .txt file containing one PDF URL per line (no delimiters or special characters). If you followed the previous step, this file is already generated. To ingest the documents, run:
uv run src/fdds/manage_pdfs.py --ingest <path_to_txt_file>
To delete the corresponding documents from Qdrant, use:
uv run src/fdds/manage_pdfs.py --delete <path_to_txt_file>
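Before running either command, it can help to check that the .txt file really holds one URL per line. The sketch below is a hypothetical helper, not part of manage_pdfs.py. It assumes that blank lines can be skipped and that whitespace, commas, or quotes inside a line indicate a stray delimiter; the actual script may be stricter or more lenient.

```python
from urllib.parse import urlparse


def read_pdf_urls(text: str) -> list[str]:
    """Return the URLs in `text` (one per line), raising ValueError on bad lines."""
    urls: list[str] = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are harmless and skipped
        # Delimiters or quotes suggest several URLs were packed into one line.
        if any(ch.isspace() or ch in ",\"'" for ch in line):
            raise ValueError(f"line {number}: unexpected delimiter in {line!r}")
        parsed = urlparse(line)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"line {number}: not an http(s) URL: {line!r}")
        urls.append(line)
    return urls
```

A caller could pass it the file contents, for example read_pdf_urls(Path("pdfs.txt").read_text()), and only proceed with ingestion if no ValueError is raised.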