# Devstral + OpenHands Deployment

A comprehensive repository for deploying the Devstral model with OpenHands for web access on a server, eliminating the need for LM Studio on the client side.
## 🚀 Quick Start

```bash
# Clone the repository
git clone <repository-url>
cd devstral-openhands-deployment

# Build and run with default settings (Ollama)
./build-and-run.sh

# Or with Text Generation WebUI
./build-and-run.sh -t textgen

# Or with llama.cpp and GPU acceleration
./build-and-run.sh -t llamacpp -g
```

```bash
# Run the interactive setup script
./scripts/setup.sh

# Test your deployment
./scripts/test-deployment.sh
```

```bash
# Quick start with Ollama (recommended for beginners)
cp examples/quick-start-ollama.yml docker-compose.yml && docker-compose up -d

# Production setup with Text Generation WebUI
cp examples/production-textgen.yml docker-compose.yml && docker-compose up -d

# High-performance setup with llama.cpp
cp examples/high-performance-llamacpp.yml docker-compose.yml && docker-compose up -d
```

## Table of Contents

- 🚀 Quick Start
- 🎯 Core Concept
- 📦 Repository Structure
- ⚙️ Deployment Options
- 🔧 Configuration
- 📊 Monitoring & Testing
- 🐛 Troubleshooting
- 🏆 Advantages
- 📚 Documentation
- 🤝 Contributing
## 🎯 Core Concept

This deployment replaces LM Studio with a server-side solution that:

1. **Serves the Devstral GGUF model** via an HTTP API using your choice of:
   - Ollama (user-friendly, web UI included)
   - Text Generation WebUI (feature-rich, production-ready)
   - llama.cpp server (high-performance, minimal overhead)
2. **Runs OpenHands** in a Docker container configured to connect to your model API
3. **Provides web access** to the OpenHands interface from any browser
4. **Enables centralized control** with optional monitoring and scaling
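Under the hood, OpenHands talks to the model backend over plain HTTP chat-completion requests. The sketch below shows the shape of such a request, assuming an OpenAI-compatible endpoint such as the llama.cpp server's `/v1/chat/completions`; the base URL, port, and model name are illustrative defaults from this README, not guarantees.

```python
# Sketch: the kind of chat-completion request OpenHands sends to the model
# server. Assumes an OpenAI-compatible endpoint (llama.cpp / textgen);
# URL, port, and model name are illustrative defaults.
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible chat-completions endpoint."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_chat_request("http://localhost:8080", "devstral", "Write a hello-world in Python.")
# When the server is running, urllib.request.urlopen(req) returns the JSON completion.
```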
## 📦 Repository Structure

```
devstral-openhands-deployment/
├── README.md                      # This file
├── LICENSE                        # MIT License
├── .gitignore                     # Git ignore rules
├── Dockerfile                     # Complete deployment container
├── docker-compose.standalone.yml  # Standalone Docker deployment
├── build-and-run.sh               # Build and run script
├── DOCKER_DEPLOYMENT.md           # Docker deployment guide
├── ollama-setup/                  # Ollama deployment files
│   ├── README.md
│   ├── docker-compose.yml
│   ├── Modelfile
│   └── test-api.sh
├── text-generation-webui/         # Text Generation WebUI setup
│   ├── README.md
│   ├── docker-compose.yml
│   ├── settings.yaml
│   └── test-api.sh
├── llamacpp-server/               # llama.cpp server setup
│   ├── README.md
│   ├── docker-compose.yml
│   ├── docker-compose.gpu.yml
│   ├── test-api.sh
│   └── start-server.sh
├── examples/                      # Complete deployment examples
│   ├── README.md
│   ├── quick-start-ollama.yml
│   ├── production-textgen.yml
│   └── high-performance-llamacpp.yml
├── docs/                          # Documentation
│   ├── deployment-guide.md
│   └── troubleshooting.md
└── scripts/                       # Utility scripts
    ├── setup.sh                   # Interactive setup script
    └── test-deployment.sh         # Deployment testing script
```
## ⚙️ Deployment Options

### All-in-One Docker Container

**Best for:** All users, production, development, easy setup

**Features:**
- Complete containerized solution
- Automatic service orchestration
- Multiple deployment types in one container
- Easy configuration and management

**Quick Start:**

```bash
./build-and-run.sh
```

### Ollama

**Best for:** First-time users, development, quick prototyping

**Features:**
- User-friendly web interface
- Easy model management
- Built-in model library
- Simple configuration

**Quick Start:**

```bash
cd ollama-setup/
docker-compose up -d
```

### Text Generation WebUI

**Best for:** Production environments, advanced features, monitoring

**Features:**
- Comprehensive web interface
- Advanced model parameters
- Chat templates and personas
- API compatibility
- Monitoring and logging

**Quick Start:**

```bash
cd text-generation-webui/
docker-compose up -d
```

### llama.cpp Server

**Best for:** Maximum performance, minimal overhead, GPU acceleration

**Features:**
- Optimized inference engine
- GPU acceleration support
- Minimal resource usage
- OpenAI-compatible API

**Quick Start:**

```bash
cd llamacpp-server/
docker-compose up -d
```

### System Requirements
- Minimum: 8GB RAM, 4 CPU cores, 20GB disk space
- Recommended: 16GB+ RAM, 8+ CPU cores, 50GB+ disk space
- GPU: NVIDIA GPU with 8GB+ VRAM (optional, for acceleration)
**Software:**
- Docker 20.10+
- Docker Compose 2.0+
- Git (for cloning)
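A host can be checked against the minimums above with a short sketch using only Python's standard library. The RAM probe relies on `os.sysconf`, which is typically available on Linux and is skipped elsewhere; the thresholds mirror the minimums listed.

```python
# Sketch: check a host against the minimum requirements above
# (8 GB RAM, 4 CPU cores, 20 GB free disk). Linux-oriented; the RAM
# probe is skipped on platforms without os.sysconf.
import os
import shutil

GIB = 1024 ** 3


def check_minimums(path: str = ".") -> dict:
    """Report CPU cores, free disk, and (where available) RAM, plus a pass/fail flag."""
    report = {
        "cpu_cores": os.cpu_count() or 0,
        "free_disk_gib": shutil.disk_usage(path).free / GIB,
        "ram_gib": None,
    }
    try:
        report["ram_gib"] = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / GIB
    except (ValueError, OSError, AttributeError):
        pass  # RAM size not exposed on this platform
    report["ok"] = (
        report["cpu_cores"] >= 4
        and report["free_disk_gib"] >= 20
        and (report["ram_gib"] is None or report["ram_gib"] >= 8)
    )
    return report


print(check_minimums())
```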
## 🔧 Configuration

Create a `.env` file to customize your deployment:

```bash
# Model Configuration
MODEL_FILE=devstral-model.gguf
MODEL_NAME=devstral
CONTEXT_SIZE=4096

# Performance Settings
THREADS=8
BATCH_SIZE=512
GPU_LAYERS=0  # Set to 35+ for GPU acceleration

# Port Configuration
OPENHANDS_PORT=3000
MODEL_SERVER_PORT=8080

# Paths
WORKSPACE_PATH=./workspace
MODELS_PATH=./models
```

```bash
# Use environment variables
MODEL_FILE=my-model.gguf GPU_LAYERS=35 docker-compose up -d

# Use a specific example
cp examples/high-performance-llamacpp.yml docker-compose.yml
docker-compose up -d
```

## 📊 Monitoring & Testing

All deployments include built-in health checks:
```bash
# Check service status
docker-compose ps

# View service logs
docker-compose logs -f

# Test API endpoints
./scripts/test-deployment.sh
```

For production deployments with monitoring:

```bash
# Start with monitoring stack
docker-compose --profile monitoring up -d

# Access monitoring interfaces
# Grafana: http://localhost:3001
# Prometheus: http://localhost:9090
```

```bash
# Comprehensive deployment test
./scripts/test-deployment.sh

# Quick API test (varies by deployment)
curl http://localhost:8080/v1/models   # llama.cpp
curl http://localhost:11434/api/tags   # Ollama
curl http://localhost:5000/api/v1/model  # Text Generation WebUI
```
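The same endpoint checks can be scripted. The sketch below polls each backend and reports its health; the URLs assume this README's default ports and only the standard library is used.

```python
# Sketch: minimal health poller for the model-server endpoints above.
# The URLs assume this README's default ports; adjust them to your setup.
import urllib.error
import urllib.request

ENDPOINTS = {
    "llama.cpp": "http://localhost:8080/v1/models",
    "Ollama": "http://localhost:11434/api/tags",
    "Text Generation WebUI": "http://localhost:5000/api/v1/model",
}


def probe(url: str, timeout: float = 3.0):
    """Return the HTTP status code, or None if the service is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return None  # connection refused, DNS failure, timeout, ...


def classify(status) -> str:
    """Map a status code (or None) to a human-readable verdict."""
    if status is None:
        return "unreachable"
    return "healthy" if 200 <= status < 300 else f"unhealthy ({status})"


if __name__ == "__main__":
    for name, url in ENDPOINTS.items():
        print(f"{name}: {classify(probe(url))}")
```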
## 🐛 Troubleshooting

1. **Docker Image Pull Failures:**

   ```bash
   # Use the correct OpenHands image
   docker pull docker.all-hands.dev/all-hands-ai/openhands:0.40
   ```

2. **Model Loading Issues:**

   ```bash
   # Check that the model file exists
   ls -la models/

   # Verify the model inside the container
   docker exec -it ollama ollama list
   ```

3. **API Connection Issues:**

   ```bash
   # Test network connectivity
   docker exec -it openhands curl http://ollama:11434/api/tags
   ```

4. **Port Conflicts:**

   ```yaml
   # Change ports in docker-compose.yml
   ports:
     - "3001:3000"  # Changed from 3000:3000
   ```
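Before remapping ports, you can check which of the defaults are already taken on the host. A small sketch, standard library only; the port list mirrors this README's defaults:

```python
# Sketch: check whether this README's default ports are already in use
# before editing docker-compose.yml.
import socket

DEFAULT_PORTS = {
    3000: "OpenHands web UI",
    8080: "model server (llama.cpp)",
    11434: "Ollama API",
    5000: "Text Generation WebUI API",
}


def port_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing is currently listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1.0)
        # connect_ex returns 0 when something accepts the connection
        return s.connect_ex((host, port)) != 0


if __name__ == "__main__":
    for port, service in DEFAULT_PORTS.items():
        state = "free" if port_free(port) else "IN USE"
        print(f"{port:>6}  {service}: {state}")
```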
For detailed troubleshooting, see `docs/troubleshooting.md`.
## 🏆 Advantages

- 🌐 **Web Access**: Access from any browser, on any device
- 🔄 **Centralized Control**: Manage from a single server
- 📈 **Scalability**: Easy to scale resources or add instances
- 🔒 **Security**: Centralized security management
- 👥 **Multi-User**: Support for multiple concurrent users
- 📊 **Monitoring**: Built-in monitoring and logging
- 🚀 **Performance**: Dedicated server resources
- 🐳 **Containerized**: Consistent deployment across environments
- ⚡ **Quick Setup**: Automated setup scripts and examples
- 🔧 **Configurable**: Multiple deployment options and configurations
- 🧪 **Testable**: Comprehensive testing scripts
- 📖 **Documented**: Extensive documentation and examples
## 📚 Documentation

- [Docker Deployment Guide](DOCKER_DEPLOYMENT.md): Complete Docker deployment instructions
- [Deployment Guide](docs/deployment-guide.md): Step-by-step deployment instructions
- [Troubleshooting Guide](docs/troubleshooting.md): Common issues and solutions
- [Examples](examples/README.md): Ready-to-use deployment examples
- [Ollama Setup](ollama-setup/README.md): Ollama-specific instructions
- [Text Generation WebUI](text-generation-webui/README.md): WebUI setup guide
- [llama.cpp Server](llamacpp-server/README.md): High-performance setup
## 🤝 Contributing

Contributions are welcome! Please feel free to submit a pull request. For major changes, please open an issue first to discuss what you would like to change.
```bash
# Clone the repository
git clone <repository-url>
cd devstral-openhands-deployment

# Test your changes
./scripts/test-deployment.sh

# Submit a pull request
```

## License

This project is licensed under the MIT License; see the [LICENSE](LICENSE) file for details.
## Acknowledgments

- **OpenHands** - AI agent framework
- **Ollama** - Local LLM server
- **Text Generation WebUI** - Web interface for LLMs
- **llama.cpp** - High-performance LLM inference
## Support

- **Issues**: GitHub Issues
- **Discussions**: GitHub Discussions
- **Documentation**: `docs/`

**Ready to deploy?** Start with `./scripts/setup.sh` for an interactive setup experience!