This documentation provides a comprehensive overview and deployment guide for your FastAPI-based Human Rights Question Answering system. The application leverages modern AI, document embedding, cloud storage (e.g. AWS S3 and Google Cloud Storage), and advanced orchestration to deliver intelligent responses to human rights-related queries.
This project is a RESTful web service powered by FastAPI. It provides a /synthesize/ endpoint that receives a user question, retrieves relevant documents from a vector database, and synthesizes a detailed answer using a large language model via the Together API. It features:
- Semantic search over a document database (Chroma DB + HuggingFace Embeddings)
- Streaming responses from a powerful LLM backend (Qwen/Qwen2.5-7B-Instruct-Turbo)
- Cloud storage integration for document persistence
- Secure, configurable deployment via Docker
| Filename | Purpose |
|---|---|
| `main.py` | Main FastAPI app; API endpoint logic, embeddings, retrieval, LLM invocation, cloud storage integration. |
| `test_request.py` | Simple script to test the `/synthesize/` API endpoint. |
| `Dockerfile` | Docker setup for containerized deployment. |
| `requirements.txt` | List of Python dependencies needed for the app. |
| `__init__.py` | Marks the directory as a package (empty in this case). |
This is the core of your application. It orchestrates document download, semantic search, LLM-driven answer synthesis, and exposes the REST API endpoint.
- FastAPI REST API: One main endpoint at `/synthesize/`.
- Semantic Document Search: Uses HuggingFace embeddings and Chroma to find documents relevant to the query.
- LLM Integration: Calls the Together API to generate context-rich answers.
- Cloud Storage Support: Downloads Chroma DB files from the configured cloud storage.
- CORS setup: Allows safe cross-origin requests for front-end clients.
- Streaming: Streams model tokens during generation for responsiveness.
```mermaid
flowchart TD
    User[User sends POST /synthesize/] -->|question| API[FastAPI Endpoint]
    API -->|embeddings| Retriever[Chroma Retriever]
    Retriever -->|docs| DocList[Relevant Documents]
    DocList -->|context| LLM[Together LLM API]
    LLM -->|response| API
    API -->|JSON answer| User
```
Processes a user's question, retrieves relevant documents, and synthesizes a detailed answer.
```json
{
  "title": "Synthesize Human Rights Answer",
  "description": "Given a human rights question, retrieves relevant documents and generates a comprehensive answer using LLM.",
  "method": "POST",
  "baseUrl": "http://127.0.0.1:8000",
  "endpoint": "/synthesize/",
  "headers": [
    {
      "key": "Content-Type",
      "value": "application/json",
      "required": true
    }
  ],
  "queryParams": [],
  "pathParams": [],
  "bodyType": "json",
  "requestBody": "{\n \"question\": \"What color is the sky at night if the moon were made of cheese?\"\n}",
  "formData": [],
  "responses": {
    "200": {
      "description": "Success - Returns generated human rights answer.",
      "body": "{\n \"response\": \"<Synthesized answer with references to relevant documents>\"\n}"
    },
    "500": {
      "description": "Server Error",
      "body": "{\n \"detail\": \"Error processing request: <error details>\"\n}"
    }
  }
}
```
- If the question is a standard greeting or FAQ: Returns a canned, witty, or informative response.
- Else:
- Retrieves documents semantically similar to the user's question.
- Cleans and formats those documents.
- Calls the Together LLM API with a system prompt and user question plus documents as context.
- Streams and collates the generated answer.
- Returns the synthesized response, or a relevant error.
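The branching above can be sketched in plain Python. The helper names below (`is_greeting`, `format_context`) are illustrative and not taken from `main.py`:

```python
# Illustrative sketch of the request-routing logic: greetings get a canned
# reply; everything else goes through retrieval + synthesis.
GREETINGS = {"hi", "hello", "hey", "thanks"}

def is_greeting(question: str) -> bool:
    """Return True for standard greetings that receive a canned response."""
    return question.strip().lower().rstrip("!?.") in GREETINGS

def format_context(docs: list) -> str:
    """Clean retrieved snippets and join them into one LLM context block."""
    cleaned = [" ".join(d.split()) for d in docs if d.strip()]
    return "\n\n".join(f"[Doc {i+1}] {text}" for i, text in enumerate(cleaned))
```

In the real endpoint, the formatted context string would be passed to the Together API together with the system prompt and user question.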
The application relies on several environment variables:
| Variable Name | Description | Example Value |
|---|---|---|
| `TOGETHER_API_KEY` | API key for the Together API | `your-together-key` |
| `CLOUD_STORAGE_API_KEY` | Key for cloud storage (vector DB) access | `AKIA...` |
You can set these in a .env file for development, or export them in your shell environment.
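As a minimal sketch, the variables can be validated at startup using only the standard library (the helper below is illustrative; `main.py` may read them differently):

```python
import os

# Variable names taken from the table above.
REQUIRED_VARS = ("TOGETHER_API_KEY", "CLOUD_STORAGE_API_KEY")

def load_settings() -> dict:
    """Fail fast if a required environment variable is missing."""
    missing = [name for name in REQUIRED_VARS if not os.getenv(name)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {missing}")
    return {name: os.environ[name] for name in REQUIRED_VARS}
```

Failing fast at import time gives a clearer error than a mid-request authentication failure.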
- Downloads the Chroma database from cloud storage if not present.
- Uses HuggingFace sentence-transformer embeddings for semantic search.
- Defines a Pydantic model for input validation.
- Automatically handles CORS for listed origins.
- Retrieves top-K relevant documents using semantic similarity.
- Sends both the user question and relevant document snippets to the Together API.
- Uses a detailed system prompt to instruct the LLM to generate high-quality, sourced answers.
- Handles missing files, S3 connectivity, and runtime exceptions gracefully.
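To illustrate "top-K by semantic similarity with a relevance threshold" in isolation, here is a toy cosine-similarity ranking in plain Python. The real app delegates this to Chroma and HuggingFace embeddings; the functions below are purely didactic:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec, doc_vecs, k=5, threshold=0.0):
    """Return indices of the k most similar vectors above the threshold."""
    scored = [(cosine(query_vec, v), i) for i, v in enumerate(doc_vecs)]
    scored = [(s, i) for s, i in scored if s >= threshold]
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]
```

The threshold plays the same role as the retriever's relevance threshold: documents scoring below it are excluded from the LLM context entirely.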
"You are an AI assistant specializing in human rights. ... Generate a comprehensive answer that addresses the question clearly and thoroughly. Include: - A concise summary of the relevant documents. - A clear and structured explanation, highlighting main points and connections. - References or citations from the provided documents to back up your answer..."
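A hypothetical sketch of how the system prompt, question, and retrieved context might be assembled into a chat payload; the exact prompt text and message layout live in `main.py`:

```python
# Abbreviated stand-in for the full system prompt quoted above.
SYSTEM_PROMPT = "You are an AI assistant specializing in human rights. ..."

def build_messages(question: str, context: str) -> list:
    """Assemble the chat-style message list sent to the LLM API."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user",
         "content": f"Question: {question}\n\nRelevant documents:\n{context}"},
    ]
```

This message list is what a chat-completions call (with streaming enabled) would consume.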
- `download_chroma_from_cloud()`: Handles downloading all files from storage.
- `ensure_chroma_exists()`: Ensures the Chroma DB is available before starting.
- `create_retriever()`: Configures the document retriever with specified relevance thresholds.
- `synthesize_response()`: Main FastAPI endpoint handler.
```mermaid
classDiagram
    class Query {
        str question
    }
    class Chroma
    class HuggingFaceEmbeddings
    class Together
    FastAPI "1" -- "1" Chroma : uses
    FastAPI "1" -- "1" HuggingFaceEmbeddings : uses
    FastAPI "1" -- "1" Together : uses
    FastAPI "1" -- "*" Query : accepts
```
A utility script for quickly testing your /synthesize/ API.
- Sends a POST request to the local FastAPI server.
- Prints the JSON response or error details.
```python
import requests

url = "http://127.0.0.1:8000/synthesize/"
payload = {
    "question": "What color is the sky at night if the moon were made of cheese?"
}

try:
    response = requests.post(url, json=payload)
    if response.status_code == 200:
        print("Response:", response.json())
    else:
        print(f"Error: {response.status_code}, Details: {response.text}")
except Exception as e:
    print(f"An error occurred: {e}")
```

How to Use:
- Make sure the FastAPI server is running.
- Run the script:
```shell
python test_request.py
```
Describes how to build and run your application in a containerized environment.
- FROM python:3.9-slim: Uses a lightweight Python image.
- WORKDIR /app: Sets the work directory.
- COPY requirements.txt .: Copies requirements.
- RUN pip install ...: Installs all dependencies.
- COPY . .: Copies all app source code.
- EXPOSE 8000: Exposes port for FastAPI.
- CMD: Starts the FastAPI server with Uvicorn.
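Putting the steps above together, the Dockerfile likely resembles the following sketch; the exact file in the repo may differ:

```dockerfile
FROM python:3.9-slim

WORKDIR /app

# Install dependencies first so this layer is cached between code changes.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```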
```shell
docker build -t human-rights-qa .
docker run --env-file .env -p 8000:8000 human-rights-qa
```

Lists all libraries needed for this project:
- fastapi: Web framework.
- uvicorn: ASGI server for FastAPI.
- langchain, langchain-community, langchain-chroma, langchain-huggingface: Document search and vector DB.
- sentence-transformers, huggingface-hub: Embedding models.
- requests: HTTP client for testing.
- together: LLM API client.
- pyngrok, nest-asyncio: Deployment helpers.
- python-dotenv: Environment variable management.
- boto3: AWS S3 access.
- tf-keras: Neural network support, needed for embedding models.
Install with:
```shell
pip install -r requirements.txt
```

Marks the directory as a package. The file is empty but necessary for Python package recognition.
Follow these steps to deploy the Human Rights QA API:
- Docker installed OR Python 3.9+ and pip.
- AWS S3 credentials with access to your Chroma DB bucket.
- Together API key.
```shell
git clone <your-repo-url>
cd <project-folder>
```

Create a `.env` file in the project root:
```shell
TOGETHER_API_KEY=your-together-api-key
AWS_ACCESS_KEY_ID=your-aws-access-key
AWS_SECRET_ACCESS_KEY=your-aws-secret-key
AWS_REGION=eu-north-1
S3_BUCKET_NAME=alaasbucket
```

Or export the variables in your shell.
With Docker:

```shell
docker build -t human-rights-qa .
docker run --env-file .env -p 8000:8000 human-rights-qa
```

Or without Docker:

```shell
pip install -r requirements.txt
python main.py
```

- Use `test_request.py` or a tool like Postman to send POST requests to `http://127.0.0.1:8000/synthesize/`.
Use Google Cloud Storage instead of AWS S3. This section provides steps and examples.
- A GCP project with billing enabled.
- A service account with Storage Object Viewer role.
- A JSON key for that service account.
Install the Google Cloud Storage client and add it to your environment:

```shell
pip install google-cloud-storage
```

Define credentials and bucket configuration. Use these variables in your setup.
```shell
GOOGLE_APPLICATION_CREDENTIALS=/secrets/gcp-key.json
GCS_BUCKET_NAME=your-gcs-bucket
GCS_FOLDER_PATH=chroma/
DOWNLOAD_DIRECTORY=chroma_local/
```

This snippet mirrors the S3 logic. It downloads only missing or incomplete files.
```python
import os

from google.cloud import storage

gcs_bucket_name = os.getenv("GCS_BUCKET_NAME")
gcs_folder_path = os.getenv("GCS_FOLDER_PATH", "chroma/")
download_directory = os.getenv("DOWNLOAD_DIRECTORY", "chroma_local/")

os.makedirs(download_directory, exist_ok=True)

def download_chroma_from_gcs():
    """Downloads files from a GCS prefix to a local directory."""
    try:
        client = storage.Client()
        blobs = client.list_blobs(gcs_bucket_name, prefix=gcs_folder_path)
        found = False
        for blob in blobs:
            found = True
            if blob.name.endswith("/"):
                continue  # skip "directory" placeholder objects
            local_path = os.path.join(
                download_directory,
                os.path.relpath(blob.name, gcs_folder_path)
            )
            os.makedirs(os.path.dirname(local_path), exist_ok=True)
            if not os.path.exists(local_path) or os.path.getsize(local_path) != blob.size:
                print(f"Downloading {blob.name} to {local_path}...")
                blob.download_to_filename(local_path)
                if os.path.exists(local_path) and os.path.getsize(local_path) == blob.size:
                    print(f"Successfully downloaded {blob.name}.")
                else:
                    print(f"Failed to download {blob.name}.")
            else:
                print(f"File {local_path} is up-to-date. Skipping.")
        if not found:
            print("No files found under the specified GCS prefix.")
    except Exception as e:
        print(f"Error while downloading from GCS: {e}")
```

Mount the service account key into the container and pass the required environment variables.
```shell
docker run \
  -p 8000:8000 \
  -e TOGETHER_API_KEY=$TOGETHER_API_KEY \
  -e GCS_BUCKET_NAME=your-gcs-bucket \
  -e GCS_FOLDER_PATH=chroma/ \
  -e DOWNLOAD_DIRECTORY=chroma_local/ \
  -e GOOGLE_APPLICATION_CREDENTIALS=/secrets/gcp-key.json \
  -v $(pwd)/gcp-key.json:/secrets/gcp-key.json:ro \
  human-rights-qa
```

- You can store the Chroma DB locally and skip any cloud downloads.
- For production on GCP, prefer Workload Identity over JSON keys.
- You may use gcsfuse to mount buckets. It simplifies file access.
- If you see `FileNotFoundError` for the Chroma DB, check your S3 credentials and bucket.
- For CORS errors, ensure your frontend origin is listed in the `origins` array in `main.py`.
- If you run out of GPU/CPU memory, try reducing document chunk size or LLM parameters.
- Logs will print S3 download status and error details to the console.
This project provides a robust, scalable template for question-answering systems over document corpora using modern AI technologies. It is production-ready and can be adapted for various knowledge domains with minimal changes.
Happy hacking! 🚀