Intelligent document processing platform for transport and logistics. Upload PDF or image documents, automatically classify them, extract structured data with OCR + LLM, validate against business rules, and review results in a human-in-the-loop UI.
- Document classification — automatically identifies 11 transport document types (CMR, Invoice, Bill of Lading, Air Waybill, Customs Declaration, and more) using weighted keyword scoring with fuzzy matching and LLM fallback
- OCR extraction — PaddleOCR (default) or Tesseract for text extraction from PDFs and images
- LLM field extraction — structured field extraction via Ollama (qwen2.5:3b) or OpenAI, with per-document-type schemas
- Validation engine — config-driven business rules with confidence scoring
- Review UI — human-in-the-loop correction interface with side-by-side document viewer
- Authentication — JWT access/refresh tokens with role-based access control (RBAC)
- File security — magic bytes validation, filename sanitization, rate limiting
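The file security checks listed above could look roughly like the following. This is a minimal sketch, not the project's actual implementation: the names `MAGIC_SIGNATURES`, `sniff_mime_type`, and `sanitize_filename` are hypothetical, and the set of accepted formats (PDF, PNG, JPEG) is an assumption based on the "PDF or image" wording.

```python
import re
from pathlib import PurePosixPath
from typing import Optional

# Leading bytes ("magic numbers") of the upload formats assumed here.
MAGIC_SIGNATURES = {
    "application/pdf": b"%PDF-",
    "image/png": b"\x89PNG\r\n\x1a\n",
    "image/jpeg": b"\xff\xd8\xff",
}


def sniff_mime_type(header: bytes) -> Optional[str]:
    """Return the MIME type whose signature matches the file's first bytes."""
    for mime, signature in MAGIC_SIGNATURES.items():
        if header.startswith(signature):
            return mime
    return None  # unknown content: reject regardless of the file extension


def sanitize_filename(raw_name: str, max_length: int = 100) -> str:
    """Drop directory components and replace unsafe characters."""
    # Normalise Windows separators, then keep only the final path component.
    name = PurePosixPath(raw_name.replace("\\", "/")).name
    # Keep letters, digits, dot, dash, underscore; everything else becomes "_".
    name = re.sub(r"[^A-Za-z0-9._-]", "_", name).lstrip(".")
    return name[:max_length] or "upload"
```

Checking the content bytes rather than the extension means a renamed executable is rejected even if it is called `invoice.pdf`.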
| Layer | Technology |
|---|---|
| Frontend | React 18, TypeScript, Tailwind CSS, Lucide icons |
| Backend | Python 3.11, FastAPI, SQLAlchemy, Pydantic v2 |
| OCR | PaddleOCR / Tesseract (pluggable) |
| LLM | Ollama (qwen2.5:3b) / OpenAI (pluggable) |
| Database | PostgreSQL 15 |
| Storage | MinIO (S3-compatible) |
| Queue | Redis 7 |
| Monitoring | OpenTelemetry, Prometheus |
Hexagonal Architecture (Ports & Adapters):
backend/src/
├── domain/ # Entities, value objects, domain services
├── application/ # Use cases, DTOs, extraction schemas
├── infrastructure/ # Repositories, OCR, LLM, storage, auth, messaging
└── api/ # FastAPI routes, middleware
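To illustrate the ports-and-adapters split, here is a minimal sketch of how an OCR port and a use case might relate. The names `OcrPort`, `FakeOcrAdapter`, and `ExtractTextUseCase` are hypothetical and not taken from the codebase; a real adapter would wrap PaddleOCR or Tesseract.

```python
from dataclasses import dataclass
from typing import Protocol


class OcrPort(Protocol):
    """Port: what the application layer needs from any OCR engine."""

    def extract_text(self, content: bytes) -> str: ...


class FakeOcrAdapter:
    """Adapter stand-in that returns canned text instead of running OCR."""

    def __init__(self, canned_text: str) -> None:
        self._canned_text = canned_text

    def extract_text(self, content: bytes) -> str:
        return self._canned_text


@dataclass
class ExtractTextUseCase:
    """Use case that depends only on the port, never on a concrete engine."""

    ocr: OcrPort

    def run(self, content: bytes) -> str:
        if not content:
            raise ValueError("empty document")
        return self.ocr.extract_text(content).strip()


use_case = ExtractTextUseCase(ocr=FakeOcrAdapter("  CMR No. 123 \n"))
result = use_case.run(b"fake-pdf-bytes")
```

Because the use case only sees the port, swapping OCR engines would mean passing a different adapter, with no change to application code.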
git clone https://github.com/HopeyCodeDS/sortex-ai.git
cd sortex-ai
Create a .env file in the project root:
POSTGRES_USER=sortex
POSTGRES_PASSWORD=sortex
POSTGRES_DB=sortex
MINIO_ACCESS_KEY=minioadmin
MINIO_SECRET_KEY=minioadmin
JWT_SECRET_KEY=change-this-in-production
DEFAULT_LLM_PROVIDER=ollama
OLLAMA_BASE_URL=http://host.docker.internal:11434
OLLAMA_MODEL=qwen2.5:3b
CORS_ORIGINS=http://localhost:3000
Pull the Ollama model, then start the stack:
ollama pull qwen2.5:3b
docker-compose up -d
This starts PostgreSQL, MinIO, Redis, the FastAPI backend, and the React frontend.
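As an illustration of how a backend might consume the `.env` values, here is a small sketch using only the standard library. The `Settings` class and `load_settings` function are hypothetical, the defaults simply mirror the example values in the `.env` file, and treating `CORS_ORIGINS` as a comma-separated list is an assumption.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping


@dataclass(frozen=True)
class Settings:
    jwt_secret_key: str
    default_llm_provider: str
    ollama_base_url: str
    ollama_model: str
    cors_origins: List[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, failing fast on a missing secret."""
    secret = env.get("JWT_SECRET_KEY")
    if not secret:
        raise RuntimeError("JWT_SECRET_KEY must be set")
    # Assumption: multiple origins are separated by commas.
    origins = [o.strip() for o in env.get("CORS_ORIGINS", "").split(",") if o.strip()]
    return Settings(
        jwt_secret_key=secret,
        default_llm_provider=env.get("DEFAULT_LLM_PROVIDER", "ollama"),
        ollama_base_url=env.get("OLLAMA_BASE_URL", "http://host.docker.internal:11434"),
        ollama_model=env.get("OLLAMA_MODEL", "qwen2.5:3b"),
        cors_origins=origins,
    )
```

Passing the environment as a parameter keeps the loader easy to test without touching the real process environment.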
| Service | URL |
|---|---|
| Frontend | http://localhost:3000 |
| Backend API | http://localhost:8000 |
| API docs (Swagger) | http://localhost:8000/docs |
| MinIO console | http://localhost:9001 |
Default login: admin@docflow.ai / admin123
sortex-ai/
├── frontend/                 # React + TypeScript + Tailwind
│   ├── src/
│   │   ├── components/
│   │   │   ├── layout/       # AppLayout, Sidebar, TopBar
│   │   │   └── ui/           # Reusable components (Card, Modal, DataTable, etc.)
│   │   ├── pages/            # Dashboard, DocumentUpload, DocumentReview, Login
│   │   ├── contexts/         # Auth context
│   │   └── services/         # API client
│   └── package.json
├── backend/                  # FastAPI + Hexagonal Architecture
│   ├── src/
│   │   ├── domain/
│   │   │   ├── entities/     # Document, Extraction, User
│   │   │   └── services/     # Classification, validation
│   │   ├── application/
│   │   │   ├── use_cases/    # Extract fields, process document
│   │   │   └── extraction_schemas.py
│   │   ├── infrastructure/
│   │   │   ├── persistence/  # PostgreSQL repositories
│   │   │   ├── ocr/          # PaddleOCR, Tesseract adapters
│   │   │   ├── llm/          # Ollama, OpenAI adapters
│   │   │   ├── storage/      # MinIO adapter
│   │   │   └── auth/         # JWT, RBAC
│   │   └── api/
│   │       └── routes/       # REST endpoints
│   ├── migrations/
│   └── requirements.txt
├── docker-compose.yml
└── .env
| Type | Description |
|---|---|
| CMR | Convention relative au contrat de transport international de Marchandises par Route (road consignment note) |
| Invoice | Commercial / freight invoices |
| Delivery Note | Proof of delivery documents |
| Bill of Lading | Ocean shipping contracts |
| Air Waybill | Air cargo consignment notes |
| Sea Waybill | Non-negotiable sea transport |
| Packing List | Shipment contents listing |
| Customs Declaration | Import/export customs forms |
| Certificate of Origin | Country of origin certification |
| Dangerous Goods Declaration | Hazmat shipping declarations |
| Freight Bill | Carrier billing documents |
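Weighted keyword scoring with fuzzy matching, as used to tell these types apart, can be sketched as follows. This is an illustrative approximation only: the keyword weights, the 0.85 similarity threshold, the minimum score, and the function names are all assumptions, and the standard-library `difflib.SequenceMatcher` stands in for whatever fuzzy matcher the project actually uses.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Tuple

# Hypothetical keyword weights for three of the supported types.
KEYWORDS: Dict[str, Dict[str, float]] = {
    "CMR": {"cmr": 3.0, "consignment": 2.0, "carrier": 1.0},
    "Invoice": {"invoice": 3.0, "vat": 1.5, "total": 1.0},
    "Bill of Lading": {"lading": 3.0, "vessel": 2.0, "shipper": 1.0},
}


def fuzzy_hit(keyword: str, tokens: List[str], threshold: float = 0.85) -> bool:
    """True if any token is similar enough to the keyword to absorb OCR noise."""
    return any(
        SequenceMatcher(None, keyword, token).ratio() >= threshold
        for token in tokens
    )


def classify(text: str, min_score: float = 3.0) -> Tuple[Optional[str], Dict[str, float]]:
    """Return the best-scoring type, or None when no type scores high enough."""
    tokens = [t.strip(".,:;()").lower() for t in text.split()]
    scores = {
        doc_type: sum(w for kw, w in kws.items() if fuzzy_hit(kw, tokens))
        for doc_type, kws in KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    if scores[best] < min_score:
        return None, scores  # a caller would fall back to the LLM here
    return best, scores


# "INVOlCE" has a lowercase L where OCR misread a capital I.
doc_type, scores = classify("INVOlCE No. 42 Total due incl. VAT")
```

Returning `None` below the threshold gives the caller a clear signal to try the LLM fallback instead of forcing a low-confidence label.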
Backend:
cd backend
python -m venv venv
venv\Scripts\activate # Linux/Mac: source venv/bin/activate
pip install -r requirements.txt
uvicorn src.api.main:app --reload
Frontend:
cd frontend
npm install
npm start
Requires PostgreSQL, Redis, and MinIO running locally or via Docker.
MIT