
Project Documentation

This documentation provides a comprehensive overview and deployment guide for your FastAPI-based Human Rights Question Answering system. The application leverages modern AI, document embeddings, cloud storage (e.g., AWS S3 or Google Cloud Storage), and advanced orchestration to deliver intelligent responses to human rights-related queries.


🎯 Overview

This project is a RESTful web service powered by FastAPI. It provides a /synthesize/ endpoint that receives a user question, retrieves relevant documents from a vector database, and synthesizes a detailed answer using a large language model via the Together API. It features:

  • Semantic search over a document database (Chroma DB + HuggingFace Embeddings)
  • Streaming responses from a powerful LLM backend (Qwen/Qwen2.5-7B-Instruct-Turbo)
  • Cloud storage integration for document persistence
  • Secure, configurable deployment via Docker

🗂️ File Structure Overview

  • main.py: Main FastAPI app; API endpoint logic, embeddings, retrieval, LLM invocation, cloud storage integration.
  • test_request.py: Simple script to test the /synthesize/ API endpoint.
  • Dockerfile: Docker setup for containerized deployment.
  • requirements.txt: List of Python dependencies needed for the app.
  • __init__.py: Marks the directory as a package (empty in this case).

🏗️ main.py

This is the core of your application. It orchestrates document download, semantic search, LLM-driven answer synthesis, and exposes the REST API endpoint.

Key Features

  • FastAPI REST API: One main endpoint at /synthesize/.
  • Semantic Document Search: Uses HuggingFace embeddings and Chroma to find documents relevant to the query.
  • LLM Integration: Calls the Together API to generate context-rich answers.
  • Cloud Storage Support: Downloads Chroma DB files from the configured cloud storage backend.
  • CORS setup: Allows safe cross-origin requests for front-end clients.
  • Streaming: Streams model tokens during generation for responsiveness.

High-Level Data Flow

flowchart TD
    User[User sends POST /synthesize/] -->|question| API[FastAPI Endpoint]
    API -->|embeddings| Retriever[Chroma Retriever]
    Retriever -->|docs| DocList[Relevant Documents]
    DocList -->|context| LLM[Together LLM API]
    LLM -->|response| API
    API -->|JSON answer| User

API Endpoint

1. /synthesize/ (POST)

Processes a user's question, retrieves relevant documents, and synthesizes a detailed answer.

{
    "title": "Synthesize Human Rights Answer",
    "description": "Given a human rights question, retrieves relevant documents and generates a comprehensive answer using LLM.",
    "method": "POST",
    "baseUrl": "http://127.0.0.1:8000",
    "endpoint": "/synthesize/",
    "headers": [
        {
            "key": "Content-Type",
            "value": "application/json",
            "required": true
        }
    ],
    "queryParams": [],
    "pathParams": [],
    "bodyType": "json",
    "requestBody": "{\n  \"question\": \"What color is the sky at night if the moon were made of cheese?\"\n}",
    "formData": [],
    "responses": {
        "200": {
            "description": "Success - Returns generated human rights answer.",
            "body": "{\n  \"response\": \"<Synthesized answer with references to relevant documents>\"\n}"
        },
        "500": {
            "description": "Server Error",
            "body": "{\n  \"detail\": \"Error processing request: <error details>\"\n}"
        }
    }
}

Endpoint Logic

  • If the question is a standard greeting or FAQ: Returns a canned, witty, or informative response.
  • Else:
    1. Retrieves documents semantically similar to the user's question.
    2. Cleans and formats those documents.
    3. Calls the Together LLM API with a system prompt and user question plus documents as context.
    4. Streams and collates the generated answer.
    5. Returns the synthesized response, or a relevant error.
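The branch logic above can be sketched with two small helpers. This is an illustrative sketch only; the helper names, the greeting list, and the canned reply are assumptions, not taken from main.py:

```python
from typing import List, Optional

# Illustrative greeting set; the real FAQ/greeting check in main.py may differ.
GREETINGS = {"hi", "hello", "hey", "thanks", "thank you"}

def canned_reply(question: str) -> Optional[str]:
    """Return a canned response for greetings, or None for real questions."""
    if question.strip().lower().rstrip("!?.") in GREETINGS:
        return "Hello! Ask me any human rights question."
    return None

def build_context(docs: List[str], max_chars: int = 4000) -> str:
    """Clean retrieved snippets and join them into one LLM context block."""
    cleaned = [" ".join(d.split()) for d in docs if d.strip()]
    return "\n\n---\n\n".join(cleaned)[:max_chars]
```

When canned_reply returns None, the handler falls through to retrieval and LLM synthesis; otherwise it short-circuits with the canned answer.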

Environment Variables & Configuration

The application relies on several environment variables:

  • TOGETHER_API_KEY: API key for the Together API (example: your-together-key).
  • CLOUD_STORAGE_API_KEY: Key for cloud storage (vector DB) access (example: AKIA...).

You can set these in a .env file for development, or export them in your shell environment.
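A simple way to fail fast on missing configuration at startup is a small validation helper. This is a sketch; require_env is a hypothetical helper, not a function from main.py:

```python
import os

# For development, python-dotenv can populate os.environ from .env first:
# from dotenv import load_dotenv; load_dotenv()

def require_env(name: str) -> str:
    """Read a required environment variable, failing fast when it is missing."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

# Example startup check:
# together_api_key = require_env("TOGETHER_API_KEY")
```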


Main Components

1. Chroma DB Initialization

  • Downloads the Chroma database from cloud storage if not present.
  • Uses HuggingFace sentence-transformer embeddings for semantic search.

2. Query Handling

  • Defines a Pydantic model for input validation.
  • Automatically handles CORS for listed origins.
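The input model is a one-field Pydantic schema along these lines (a sketch; the field name matches the request body shown earlier, but the exact class in main.py may differ):

```python
from pydantic import BaseModel

class Query(BaseModel):
    """Input schema for POST /synthesize/."""
    question: str

# CORS is handled in main.py by registering FastAPI's CORSMiddleware with an
# `origins` whitelist, roughly:
# app.add_middleware(CORSMiddleware, allow_origins=origins,
#                    allow_methods=["*"], allow_headers=["*"])
```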

3. Retriever

  • Retrieves top-K relevant documents using semantic similarity.
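The app delegates retrieval to Chroma, but the underlying idea of top-K selection by cosine similarity can be illustrated in plain Python (purely illustrative; not the code used in main.py):

```python
import math
from typing import List, Sequence

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec: Sequence[float],
          doc_vecs: List[Sequence[float]], k: int = 3) -> List[int]:
    """Return indices of the k document vectors most similar to the query."""
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine(query_vec, doc_vecs[i]),
                    reverse=True)
    return ranked[:k]
```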

4. LLM Synthesis

  • Sends both the user question and relevant document snippets to the Together API.
  • Uses a detailed system prompt to instruct the LLM to generate high-quality, sourced answers.
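Prompt assembly can be sketched as below. The message-building helper is illustrative, SYSTEM_PROMPT is abbreviated, and the commented Together call is an assumption about the client API that should be verified against the together library docs:

```python
from typing import Dict, List

SYSTEM_PROMPT = "You are an AI assistant specializing in human rights. ..."

def build_messages(question: str, context: str) -> List[Dict[str, str]]:
    """Assemble the chat messages sent to the LLM."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user",
         "content": f"Context documents:\n{context}\n\nQuestion: {question}"},
    ]

# Assumed Together usage (streaming):
# from together import Together
# client = Together(api_key=os.environ["TOGETHER_API_KEY"])
# stream = client.chat.completions.create(
#     model="Qwen/Qwen2.5-7B-Instruct-Turbo",
#     messages=build_messages(question, context),
#     stream=True,
# )
```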

5. Error Handling

  • Handles missing files, S3 connectivity, and runtime exceptions gracefully.

Example System Prompt (LLM)

"You are an AI assistant specializing in human rights. ... Generate a comprehensive answer that addresses the question clearly and thoroughly. Include: - A concise summary of the relevant documents. - A clear and structured explanation, highlighting main points and connections. - References or citations from the provided documents to back up your answer..."


Key Functions

  • download_chroma_from_cloud(): Handles downloading all files from storage.
  • ensure_chroma_exists(): Ensures the Chroma DB is available before starting.
  • create_retriever(): Configures document retriever with specified relevance thresholds.
  • synthesize_response(): Main FastAPI endpoint handler.

Class Diagram

classDiagram
    class Query {
        str question
    }
    class Chroma
    class HuggingFaceEmbeddings
    class Together
    FastAPI "1" -- "1" Chroma : uses
    FastAPI "1" -- "1" HuggingFaceEmbeddings : uses
    FastAPI "1" -- "1" Together : uses
    FastAPI "1" -- "*" Query : accepts

🧪 test_request.py

A utility script for quickly testing your /synthesize/ API.

What It Does

  • Sends a POST request to the local FastAPI server.
  • Prints the JSON response or error details.

Example Usage

import requests

url = "http://127.0.0.1:8000/synthesize/"
payload = {
    "question": "What color is the sky at night if the moon were made of cheese?"
}

try:
    response = requests.post(url, json=payload)
    if response.status_code == 200:
        print("Response:", response.json())
    else:
        print(f"Error: {response.status_code}, Details: {response.text}")
except Exception as e:
    print(f"An error occurred: {e}")

How to Use:

  • Make sure the FastAPI server is running.
  • Run the script: python test_request.py

🐳 Dockerfile

Describes how to build and run your application in a containerized environment.

Key Steps

  • FROM python:3.9-slim: Uses a lightweight Python image.
  • WORKDIR /app: Sets the work directory.
  • COPY requirements.txt .: Copies requirements.
  • RUN pip install ...: Installs all dependencies.
  • COPY . .: Copies all app source code.
  • EXPOSE 8000: Exposes port for FastAPI.
  • CMD: Starts the FastAPI server with Uvicorn.
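Put together, those steps correspond to a Dockerfile along these lines (a sketch reconstructed from the steps above, not the repository file verbatim):

```dockerfile
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["python", "main.py"]
```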

Example Docker Build & Run

docker build -t human-rights-qa .
docker run --env-file .env -p 8000:8000 human-rights-qa

📦 requirements.txt

Lists all libraries needed for this project:

  • fastapi: Web framework.
  • uvicorn: ASGI server for FastAPI.
  • langchain, langchain-community, langchain-chroma, langchain-huggingface: Document search and vector DB.
  • sentence-transformers, huggingface-hub: Embedding models.
  • requests: HTTP client for testing.
  • together: LLM API client.
  • pyngrok, nest-asyncio: Deployment helpers.
  • python-dotenv: Environment variable management.
  • boto3: AWS S3 access.
  • tf-keras: Neural network support, needed for embedding models.

Install with:

pip install -r requirements.txt

📁 __init__.py

Marks the directory as a package. The file is empty but necessary for Python package recognition.


🚀 Deployment Instructions

Follow these steps to deploy the Human Rights QA API:

1. Prerequisites

  • Docker installed OR Python 3.9+ and pip.
  • AWS S3 credentials with access to your Chroma DB bucket.
  • Together API key.

2. Clone the Repository

git clone <your-repo-url>
cd <project-folder>

3. Configure Environment

Create a .env file in the project root:

TOGETHER_API_KEY=your-together-api-key
AWS_ACCESS_KEY_ID=your-aws-access-key
AWS_SECRET_ACCESS_KEY=your-aws-secret-key
AWS_REGION=eu-north-1
S3_BUCKET_NAME=alaasbucket

Or export variables in your shell.

4. Build and Run (Docker)

docker build -t human-rights-qa .
docker run --env-file .env -p 8000:8000 human-rights-qa

5. Build and Run (Local Python)

pip install -r requirements.txt
python main.py

6. Test API

  • Use test_request.py or a tool like Postman to send POST requests to http://127.0.0.1:8000/synthesize/.

☁️ Google Cloud Storage Alternative

This section shows how to use Google Cloud Storage instead of AWS S3, with the required setup steps and example code.

Prerequisites for GCS

  • A GCP project with billing enabled.
  • A service account with Storage Object Viewer role.
  • A JSON key for that service account.

Install GCS Dependency

Install the Google Cloud Storage client library in your environment:

pip install google-cloud-storage

Environment Variables for GCS

Define the credentials and bucket configuration through these variables:

GOOGLE_APPLICATION_CREDENTIALS=/secrets/gcp-key.json
GCS_BUCKET_NAME=your-gcs-bucket
GCS_FOLDER_PATH=chroma/
DOWNLOAD_DIRECTORY=chroma_local/

Example: Download Chroma DB from GCS

This snippet mirrors the S3 logic. It downloads only missing or incomplete files.

import os
from google.cloud import storage

gcs_bucket_name = os.getenv("GCS_BUCKET_NAME")
gcs_folder_path = os.getenv("GCS_FOLDER_PATH", "chroma/")
download_directory = os.getenv("DOWNLOAD_DIRECTORY", "chroma_local/")

os.makedirs(download_directory, exist_ok=True)

def download_chroma_from_gcs():
    """Downloads files from GCS prefix to a local directory."""
    try:
        client = storage.Client()
        bucket = client.bucket(gcs_bucket_name)
        blobs = client.list_blobs(gcs_bucket_name, prefix=gcs_folder_path)

        found = False
        for blob in blobs:
            found = True
            if blob.name.endswith("/"):
                continue

            local_path = os.path.join(
                download_directory,
                os.path.relpath(blob.name, gcs_folder_path)
            )
            os.makedirs(os.path.dirname(local_path), exist_ok=True)

            if not os.path.exists(local_path) or os.path.getsize(local_path) != blob.size:
                print(f"Downloading {blob.name} to {local_path}...")
                blob.download_to_filename(local_path)
                if os.path.exists(local_path) and os.path.getsize(local_path) == blob.size:
                    print(f"Successfully downloaded {blob.name}.")
                else:
                    print(f"Failed to download {blob.name}.")
            else:
                print(f"File {local_path} is up-to-date. Skipping.")
        if not found:
            print("No files found under the specified GCS prefix.")
    except Exception as e:
        print(f"Error while downloading from GCS: {e}")

Docker Run with GCS Credentials

Mount the service account key into the container and pass the required environment variables:

docker run \
  -p 8000:8000 \
  -e TOGETHER_API_KEY=$TOGETHER_API_KEY \
  -e GCS_BUCKET_NAME=your-gcs-bucket \
  -e GCS_FOLDER_PATH=chroma/ \
  -e DOWNLOAD_DIRECTORY=chroma_local/ \
  -e GOOGLE_APPLICATION_CREDENTIALS=/secrets/gcp-key.json \
  -v $(pwd)/gcp-key.json:/secrets/gcp-key.json:ro \
  human-rights-qa

Notes and Alternatives

  • You can store the Chroma DB locally and skip cloud downloads entirely.
  • For production on GCP, prefer Workload Identity over JSON keys.
  • You may use gcsfuse to mount the bucket as a local filesystem, which simplifies file access.

🛠️ Troubleshooting & Tips

  • If you see FileNotFoundError for Chroma DB, check your S3 credentials and bucket.
  • For CORS errors, ensure your frontend origin is listed in the origins array in main.py.
  • If you run out of GPU/CPU memory, try reducing document chunk size or LLM parameters.
  • Logs will print S3 download status and error details to the console.

⭐️ Summary

This project provides a robust, scalable template for question-answering systems over document corpora using modern AI technologies. With its containerized setup and configurable storage backends, it can be adapted to other knowledge domains with minimal changes.


Happy hacking! 🚀