# PromptMill

AI-powered prompt generator for video, image, and creative content

Features · Quick Start · Supported Targets · Models · Configuration

PromptMill is a self-contained web UI that runs entirely locally: no API keys, no cloud dependencies. It uses selectable LLMs (sized to your GPU VRAM) to generate optimized prompts for the latest AI video and image generators.
| 102 Preset Roles | 7 LLM Options | 1B-8B Parameters | 100% Local |
Clean dark UI with quick examples and customizable generation settings
Support for Video, Image, Audio, 3D, and Creative AI tools
## Features

- Smart GPU Detection - Automatically selects the best model for your VRAM
- 7 LLM Tiers - From 1B (CPU) to 8B parameters (24GB+ VRAM) using Dolphin models
- 102 Specialized Roles - Video (22), Image (21), Audio (13), 3D (12), and Creative (34)
- Dark Mode UI - Modern interface with streaming generation
- Model Cleanup - Delete downloaded models to free disk space
- Zero Config - Works out of the box with Docker
- Fully Offline - No API keys or internet required after setup
- Thread-Safe - Concurrent request handling with proper locking
- Configurable - Environment variables for server settings
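The thread-safety noted above comes down to serializing access to the single loaded model so concurrent requests cannot interleave inference. A minimal sketch of that pattern (hypothetical names, not PromptMill's actual classes):

```python
import threading

class ModelRunner:
    """Serializes generation calls so only one request uses the model at a time."""

    def __init__(self):
        self._lock = threading.Lock()
        self.calls = 0

    def generate(self, prompt: str) -> str:
        # Only one thread may run inference at once; others block here.
        with self._lock:
            self.calls += 1
            return f"optimized: {prompt}"

runner = ModelRunner()
results = []
threads = [
    threading.Thread(target=lambda: results.append(runner.generate("a cat")))
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(results))  # 4
```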
## Quick Start

### Docker

```bash
# GPU (NVIDIA) - auto-detects VRAM
docker compose --profile gpu up -d

# CPU only
docker compose --profile cpu up -d
```

Models auto-download on first use and persist in `./models/`.
### pip

```bash
# GPU (CUDA)
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python
pip install gradio huggingface_hub
python -m promptmill

# CPU only
pip install llama-cpp-python gradio huggingface_hub
python -m promptmill
```
## Supported Targets

| Category | Tools |
|---|---|
| Video (22) | Wan2.1, Wan2.2, Wan2.5, Hunyuan Video, Hunyuan 1.5, Runway Gen-3, Kling AI, Kling 2.1, Pika Labs, Pika 2.1, Luma Dream Machine, Luma Ray2, Sora, Veo, Veo 3, Hailuo AI, Seedance, SkyReels V1, Mochi 1, CogVideoX, LTX Video, Open-Sora |
| Image (21) | Stable Diffusion, SD 3.5, FLUX, FLUX 2, Midjourney, DALL-E 3, ComfyUI, Ideogram, Leonardo AI, Adobe Firefly, Recraft, Imagen 3, Imagen 4, GPT-4o Images, Reve Image, HiDream-I1, Qwen-Image, Recraft V3, FLUX Kontext, Ideogram 3, Grok Image |
| Audio (13) | Suno AI, Udio, ElevenLabs, Eleven Music, Mureka AI, SOUNDRAW, Beatoven.ai, Stable Audio 2.0, MusicGen, Suno v4.5, ACE Studio, AIVA, Boomy |
| 3D (12) | Meshy, Tripo AI, Rodin, Spline, Sloyd, 3DFY.ai, Luma Genie, Masterpiece X, Hunyuan3D, Trellis, TripoSR, Unique3D |
| Creative (34) | Story Writer, Code Generator, Technical Writer, Marketing Copy, SEO Content, Screenplay Writer, Social Media Manager, Video Script Writer, Song Lyrics, Email Copywriter, Product Description, Podcast Script, Resume Writer, Cover Letter, Speech Writer, Game Narrative, UX Writer, Press Release, Poetry Writer, Data Analysis, Business Plan, Academic Writing, Tutorial Creator, Newsletter Writer, Legal Document, Grant Writer, API Documentation, Course Creator, Pitch Deck, Meeting Notes, Changelog Writer, Recipe Creator, Travel Guide, Workout Plan |
## Models

PromptMill automatically selects the best model based on your GPU. All models are uncensored Dolphin variants:
| VRAM | Model | Size | Quality |
|---|---|---|---|
| CPU | Dolphin 3.0 Llama 3.2 1B Q8 | ~1GB | ★ |
| 4GB | Dolphin 3.0 Llama 3.2 3B Q4_K_M | ~2.5GB | ★★ |
| 6GB | Dolphin 3.0 Llama 3.2 3B Q8 | ~4GB | ★★★ |
| 8GB | Dolphin 3.0 Llama 3.1 8B Q4_K_M | ~6GB | ★★★★ |
| 12GB | Dolphin 3.0 Llama 3.1 8B Q6_K_L | ~10GB | ★★★★ |
| 16GB+ | Dolphin 3.0 Llama 3.1 8B Q8 | ~12GB | ★★★★★ |
| 24GB+ | Dolphin 2.9.4 Llama 3.1 8B Q8 (131K ctx) | ~10GB | ★★★★★ |
The app auto-configures based on your hardware:
- GPU detected → uses all layers on GPU, selects model by VRAM
- No GPU → CPU mode with lightweight 1B model
Manual override available in the UI for GPU layers and model selection.
## Configuration

| Variable | Default | Description |
|---|---|---|
| `SERVER_HOST` | `127.0.0.1` | Server bind address (use `0.0.0.0` for network access) |
| `SERVER_PORT` | `7610` | Server port |
| `MODELS_DIR` | `/app/models` | Directory for model storage |
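Resolving these variables with the defaults above can be sketched as follows (a hypothetical helper, not the project's actual Settings class):

```python
import os

def load_settings(env=os.environ) -> dict:
    """Resolve server settings from the environment, falling back to documented defaults."""
    return {
        "host": env.get("SERVER_HOST", "127.0.0.1"),
        "port": int(env.get("SERVER_PORT", "7610")),
        "models_dir": env.get("MODELS_DIR", "/app/models"),
    }

print(load_settings({}))                             # all documented defaults
print(load_settings({"SERVER_PORT": "8080"})["port"])  # 8080
```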
Security Note: The default `127.0.0.1` only allows local access. For network/Docker access, use `SERVER_HOST=0.0.0.0` behind a reverse proxy (nginx/traefik) in production.
Example:

```bash
SERVER_PORT=8080 python -m promptmill
```

PromptMill exposes a health endpoint for container orchestration:
```bash
# Health check
curl http://localhost:7610/health
```

Response:

```json
{
  "status": "healthy",
  "version": "3.0.0",
  "model_loaded": false,
  "roles_count": 102
}
```

The Gradio API is also available at `/api/` for programmatic access.
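An orchestration probe only needs to fetch and parse the JSON shown above. A minimal client sketch (assumes the default port; `is_ready` and `probe` are hypothetical helpers):

```python
import json
from urllib.request import urlopen

def is_ready(payload: dict) -> bool:
    """A container counts as ready when the service reports healthy."""
    return payload.get("status") == "healthy"

def probe(url: str = "http://localhost:7610/health") -> bool:
    """Fetch the health endpoint; treat network errors as not-ready."""
    try:
        with urlopen(url, timeout=2) as resp:
            return is_ready(json.load(resp))
    except OSError:
        return False

# Works directly on the documented response shape:
sample = {"status": "healthy", "version": "3.0.0", "model_loaded": False, "roles_count": 102}
print(is_ready(sample))  # True
```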
## Project Structure

```
PromptMill/
├── src/promptmill/          # Application source (Hexagonal Architecture)
│   ├── __main__.py          # Entry point
│   ├── container.py         # Dependency injection container
│   ├── domain/              # Domain layer (entities, ports, exceptions)
│   │   ├── entities/        # Model, Role, GPUInfo
│   │   ├── value_objects/   # PromptGenerationRequest/Result
│   │   ├── ports/           # Abstract interfaces (LLM, Repository)
│   │   └── exceptions.py    # Domain exceptions
│   ├── application/         # Application layer (use cases, services)
│   │   ├── use_cases/       # GeneratePrompt, LoadModel, etc.
│   │   └── services/        # PromptService, ModelService, HealthService
│   ├── infrastructure/      # Infrastructure layer (adapters, config)
│   │   ├── adapters/        # LlamaCpp, HuggingFace, NvidiaSmi adapters
│   │   ├── config/          # Settings, ModelConfigs
│   │   └── persistence/     # RolesData (102 role templates)
│   └── presentation/        # Presentation layer (Gradio UI)
│       ├── gradio_app.py    # Main UI
│       └── theme.py         # Dark theme configuration
├── tests/                   # Unit & integration tests
├── pyproject.toml           # Project config & dependencies
├── assets/logo.svg          # Logo
├── Dockerfile.gpu           # CUDA build
├── Dockerfile.cpu           # CPU build
├── docker-compose.yml       # Docker orchestration
└── models/                  # Downloaded LLMs (persisted)
```
## Development

Requires Python 3.12+ and uv (recommended) or pip.

```bash
# Install dependencies
uv sync

# Run application
uv run python -m promptmill

# Lint & format
uv run ruff check --fix
uv run ruff format

# Run tests
PYTHONPATH=src uv run pytest tests/unit -v
```

PromptMill uses Hexagonal Architecture (Ports and Adapters) with Domain-Driven Design:
- Domain Layer: Pure Python entities, value objects, and port interfaces
- Application Layer: Use cases and services orchestrating business logic
- Infrastructure Layer: Adapters implementing ports (LlamaCpp, HuggingFace, etc.)
- Presentation Layer: Gradio UI adapter
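In this layering, the domain defines a port (an abstract interface) and the infrastructure supplies adapters that implement it, so use cases never depend on llama-cpp or Gradio directly. A toy illustration (class names are hypothetical, mirroring but not reproducing the actual code):

```python
from abc import ABC, abstractmethod

# Domain layer: a port -- the interface the application depends on.
class LLMPort(ABC):
    @abstractmethod
    def complete(self, prompt: str) -> str: ...

# Application layer: a use case that only knows the port, not any concrete LLM.
class GeneratePrompt:
    def __init__(self, llm: LLMPort):
        self._llm = llm

    def execute(self, idea: str) -> str:
        return self._llm.complete(f"Rewrite as an optimized prompt: {idea}")

# Infrastructure layer: an adapter implementing the port
# (a real adapter would wrap llama-cpp-python instead of echoing).
class EchoAdapter(LLMPort):
    def complete(self, prompt: str) -> str:
        return prompt.upper()

use_case = GeneratePrompt(EchoAdapter())
print(use_case.execute("a neon city at night"))
```

Swapping the adapter (e.g. a fake in tests, LlamaCpp in production) requires no change to the use case, which is the point of the pattern.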
## Troubleshooting

**GPU not detected**
- Set GPU Layers to `0` in the UI for CPU-only mode
- Ensure NVIDIA drivers are installed: `nvidia-smi`
- For Docker: use `--profile gpu` and ensure nvidia-container-toolkit is installed

**Model download fails**
- Check internet connectivity
- Models are cached in the `./models/` directory
- Delete and re-download via "Model Management" in the UI

**Out of memory**
- Try a smaller model (lower VRAM tier)
- Close other GPU-intensive applications
- The model auto-unloads after 10 seconds of inactivity
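The auto-unload behavior can be pictured as an idle timer: each generation request resets it, and the model is released once nothing has used it for the timeout. A simplified, time-injected sketch (the 10-second value matches the note above; everything else is hypothetical):

```python
class IdleUnloader:
    """Tracks last-use time and reports when a loaded model should be freed."""

    def __init__(self, timeout_s: float = 10.0):
        self.timeout_s = timeout_s
        self.last_used: float | None = None

    def touch(self, now: float) -> None:
        # A generation request just used the model; restart the idle window.
        self.last_used = now

    def should_unload(self, now: float) -> bool:
        return self.last_used is not None and now - self.last_used >= self.timeout_s

u = IdleUnloader()
u.touch(now=0.0)
print(u.should_unload(now=5.0))   # False: still within the idle window
print(u.should_unload(now=12.0))  # True: idle past 10 s, free the VRAM
```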
## Contributing

Contributions welcome! Feel free to:
- Report bugs or request features via Issues
- Submit pull requests
## License

MIT License - see LICENSE for details.

Made with ❤️ for the AI creative community