SpotDocAI

AI-Powered Document Knowledge Management and Q&A System

Description

SpotDocAI is a full-stack proof-of-concept platform for scalable document ingestion, AI-based semantic search, and natural language question answering. Users can upload collections of documents (ZIPs), which are processed asynchronously, indexed, and queried with context-aware responses, complete with source citations.


Key Features

  • Multi-Document Upload: Upload multiple documents or entire ZIP collections of various file formats.
  • Asynchronous Processing: Background workers unzip, process, and index documents for AI retrieval.
  • AI-Powered Q&A: Ask natural-language questions and receive answers grounded in the retrieved source documents, reducing the risk of hallucinated content.
  • Source Citations: Answers include references to relevant documents.
  • Multi-User & Multi-Collection Support: Each user can maintain separate collections.
  • Scalable Architecture: Containerized services using Docker, Redis queues, and MinIO/S3-compatible storage.
  • Streaming Responses: Real-time, low-latency answer streaming.
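The upload-then-enqueue handoff behind the asynchronous processing feature can be sketched as follows. This is a minimal illustration, not the project's actual code: the job-message fields (`user_id`, `collection`, `object_key`) and the queue name are assumptions, and an in-memory deque stands in for the Redis list (`LPUSH`/`BRPOP`) the real services share.

```python
import json
import uuid
from collections import deque

# Stand-in for a Redis list; the real backend would LPUSH and the
# worker BRPOP on a shared queue (the queue name is an assumption).
ingestion_queue = deque()

def build_ingestion_job(user_id: str, collection: str, object_key: str) -> dict:
    """Describe one uploaded ZIP for the worker: who owns it, which
    collection it belongs to, and where the object lives in MinIO."""
    return {
        "job_id": str(uuid.uuid4()),
        "user_id": user_id,
        "collection": collection,
        "object_key": object_key,  # e.g. "uploads/<user>/<file>.zip"
    }

def enqueue(job: dict) -> None:
    # LPUSH equivalent: push the serialized job onto the left of the list.
    ingestion_queue.appendleft(json.dumps(job))

def dequeue() -> dict:
    # BRPOP equivalent: pop from the right (FIFO overall) and deserialize.
    return json.loads(ingestion_queue.pop())

job = build_ingestion_job("alice", "contracts", "uploads/alice/batch-1.zip")
enqueue(job)
received = dequeue()
```

Because only a small JSON message crosses the queue (not the file itself), the API can return immediately after the upload while the worker fetches the object from MinIO at its own pace.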

Tech Stack

  • Backend: C# (.NET 10.0), ASP.NET Core, Kernel Memory (retrieval), Amazon S3 SDK (uploads to MinIO file storage), Redis queues (enqueuing ingestion jobs)
  • Frontend: Next.js
  • Worker Service: C# (.NET 10.0), ASP.NET Core, Kernel Memory (document upload and embeddings), Amazon S3 SDK (downloads from MinIO file storage), Redis queues (consuming ingestion jobs)
  • Storage: MinIO (S3-compatible object storage), Postgres (vector database)
  • Containerization: Docker, docker-compose
  • AI/RAG: LLMs for embedding generation and Q&A (Ollama), Kernel Memory for RAG
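The containerized services above might be wired together with a docker-compose file along these lines. This is a hypothetical excerpt, not the repository's actual file: image tags, ports, service names, and credentials are all illustrative assumptions.

```yaml
# Illustrative sketch only; values are assumptions, not the project's config.
services:
  minio:
    image: minio/minio
    command: server /data --console-address ":9001"
    ports: ["9000:9000", "9001:9001"]
  redis:
    image: redis:7
    ports: ["6379:6379"]
  postgres:
    image: pgvector/pgvector:pg16   # Postgres with vector support
    environment:
      POSTGRES_PASSWORD: example
    ports: ["5432:5432"]
  ollama:
    image: ollama/ollama
    ports: ["11434:11434"]
```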

Architecture Diagram

Frontend (Next.js)
        |
        v
Backend API (.NET 10.0, ASP.NET Core)
        |--> MinIO (S3-compatible storage) [upload ZIPs & files]
        |--> Redis Queues [enqueue ingestion jobs]
        |
        v
Worker Service (.NET 10.0, ASP.NET Core)
        |--> MinIO [download ZIPs, extract files]
        |--> Kernel Memory [upload files, generate embeddings]
        |--> Postgres (Vector Store) [embeddings & retrieval]
        |
        v
Kernel Memory (RAG & Retrieval)
        |
        v
LLMs (Ollama) [embedding generation & Q&A]
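The worker's unzip-and-index step in the diagram can be sketched like this. It is a minimal, self-contained illustration: the in-memory ZIP stands in for an object downloaded from MinIO, and the fixed-size chunker is only an assumption about how text might be split before Kernel Memory generates embeddings.

```python
import io
import zipfile

def chunk_text(text: str, size: int = 40) -> list[str]:
    """Split a document into fixed-size pieces for embedding.
    (Chunk size and strategy are illustrative assumptions.)"""
    return [text[i:i + size] for i in range(0, len(text), size)]

def process_zip(zip_bytes: bytes) -> dict[str, list[str]]:
    """Unzip an uploaded collection and chunk each file, as the worker
    would after downloading the object from MinIO."""
    chunks_by_file: dict[str, list[str]] = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            text = zf.read(name).decode("utf-8")
            chunks_by_file[name] = chunk_text(text)
    return chunks_by_file

# Build a tiny two-file ZIP in memory to simulate a downloaded upload.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", "alpha " * 20)
    zf.writestr("b.txt", "beta")
result = process_zip(buf.getvalue())
```

Each chunk would then be handed to the embedding model and stored in the Postgres vector store alongside its source filename, which is what makes per-answer citations possible later.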


Key Learnings

  • Distributed system design with background workers and queues
  • Integration of AI embeddings for semantic search and retrieval
  • Scalable cloud storage patterns with MinIO / S3
  • Streaming data to frontend efficiently
  • Full-stack development (Next.js + .NET backend)

Screenshots

Homepage

[screenshots]

File Upload Page

[screenshot]

Chat Page

[screenshot]

Real-Time Streamed Responses with Sources

[screenshot]
