This monorepo contains the following components:
- Data Parser - Parses CML data and metadata CSV files from SFTP uploads into the database
- Database - TimescaleDB for storing time series data and metadata
- Data Processor - (Stub implementation) Placeholder for future data analysis and processing logic
- Webserver - Main user-facing web application with interactive visualizations
- Grafana - Real-time dashboards for CML data visualization
- MNO Data Source Simulator - Simulates real-time CML data from MNO sources via SFTP
- SFTP Receiver - Receives uploaded CML data files
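These components run as Docker Compose services. To list the service names defined in docker-compose.yml (only parser, database, processor, and minio are referenced by name later in this README; the others may differ), you can run:

```bash
# Print the service names defined in docker-compose.yml
docker compose config --services
```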
The webserver provides an intuitive interface with four main pages:
- Landing Page (/) - System overview with data statistics and processing status
- Real-Time Data (/realtime) - Interactive CML network map with Grafana-embedded time series plots
- Archive (/archive) - Long-term archive statistics and data distribution analysis
- Data Uploads (/data-uploads) - File upload interface for CML data files
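Once the stack is running (see the setup steps below), a quick way to confirm these pages respond is to request them with curl, assuming the webserver's default port 5000 listed under the service URLs:

```bash
# Check that each page responds (webserver default: http://localhost:5000)
curl -sS -o /dev/null -w "%{http_code} /\n"             http://localhost:5000/
curl -sS -o /dev/null -w "%{http_code} /realtime\n"     http://localhost:5000/realtime
curl -sS -o /dev/null -w "%{http_code} /archive\n"      http://localhost:5000/archive
curl -sS -o /dev/null -w "%{http_code} /data-uploads\n" http://localhost:5000/data-uploads
```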
- Docker and Docker Compose
- Git
- Clone the repository:

  ```bash
  git clone https://github.com/OpenSenseAction/GMDI_prototype.git
  cd GMDI_prototype
  ```

- Generate SSH keys for the SFTP server:

  ```bash
  cd ssh_keys
  ./generate_ssh_keys.sh
  cd ..
  ```

- Build and run the containers:

  ```bash
  docker compose up -d
  ```

- Access the services:
- Webserver (Main UI): http://localhost:5000
- Grafana Dashboards: http://localhost:3000
- Database: localhost:5432
- SFTP Server: localhost:2222
Note: The processor service (port 5002) is currently a minimal stub implementation.
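A quick way to verify that the stack came up is to list the running services and probe the two HTTP endpoints, assuming the default ports above:

```bash
# List container status for all services
docker compose ps

# Probe the user-facing endpoints (default ports from the list above)
curl -I http://localhost:5000   # webserver
curl -I http://localhost:3000   # Grafana
```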
- MNO Simulator → generates CML data from NetCDF files and uploads via SFTP to SFTP Receiver
- Parser → watches SFTP upload directory and processes CSV files (both metadata and data)
- Parser → validates and writes parsed data to Database (TimescaleDB)
- Webserver → serves UI and provides API access to database
- Grafana → visualizes real-time data from database with embedded dashboards
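To watch data move through this pipeline, you can tail the logs of the individual services. The parser and database service names appear in the commands later in this README; the webserver service name here is an assumption and may differ in docker-compose.yml:

```bash
# Follow the parser as it picks up files from the SFTP upload directory
docker compose logs -f parser

# Follow several stages at once ("webserver" is an assumed service name;
# check docker-compose.yml for the exact names)
docker compose logs -f parser database webserver
```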
The database can be initialized with archive CML data using two methods:
Pre-generated CSV files included in the repository:
- 728 CML sublinks (364 unique CML IDs) covering the Berlin area
- ~1.5M data rows at 5-minute intervals over 7 days
- Gzip-compressed (~7.6 MB total, included in repo)
- Loads in ~3 seconds via PostgreSQL COPY
Files are located in /database/archive_data/ and loaded automatically on first database startup.
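To inspect the bundled CSV archive, or to watch it being loaded when the database volume is created for the first time:

```bash
# Inspect the gzip-compressed archive CSVs bundled with the repository
ls -lh database/archive_data/

# Watch the database logs during first startup to see the COPY-based load
docker compose logs -f database
```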
Load data directly from the full 3-month NetCDF archive with configurable time range:
```bash
# Rebuild parser if needed
docker compose build parser

# Start database
docker compose up -d database

# Load last 7 days from NetCDF
docker compose run --rm -e DB_HOST=database parser python /app/parser/parse_netcdf_archive.py
```

Use ARCHIVE_MAX_DAYS to control how much data to load:

```bash
# Load last 14 days (~88M rows, ~10 minutes)
docker compose run --rm -e DB_HOST=database -e ARCHIVE_MAX_DAYS=14 parser python /app/parser/parse_netcdf_archive.py

# Load full 3 months (~579M rows, ~1 hour)
docker compose run --rm -e DB_HOST=database -e ARCHIVE_MAX_DAYS=0 parser python /app/parser/parse_netcdf_archive.py
```

Note: Set ARCHIVE_MAX_DAYS=0 to disable the time limit and load the entire dataset. Larger datasets require more database memory; at least 4 GB of RAM is recommended for the full 3-month archive.
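After a load finishes, you can sanity-check the row count directly in the database. The database name, user, and table name below are placeholders rather than the project's actual schema, so adjust them before running:

```bash
# Hypothetical row-count check; "postgres", "gmdi", and "cml_data" are assumed names
docker compose exec database psql -U postgres -d gmdi -c "SELECT count(*) FROM cml_data;"
```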
Features:
- Auto-downloads 3-month NetCDF file (~209 MB) on first run
- 10-second resolution (vs 5-minute for CSV method)
- Automatic timestamp shifting - data ends at current time
- Progress reporting with batch-by-batch status (~155K rows/sec)
- PostgreSQL COPY for maximum performance
- Configurable time window to balance demo realism vs load time
The NetCDF file is downloaded to parser/example_data/openMRG_cmls_20150827_3months.nc and gitignored.
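To check whether the file has already been downloaded:

```bash
# Check that the NetCDF archive is present (~209 MB)
ls -lh parser/example_data/openMRG_cmls_20150827_3months.nc
```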
To regenerate the CSV archive data:

```bash
python mno_data_source_simulator/generate_archive.py
```

To reload archive data (either method):

```bash
docker compose down -v  # Remove volumes
docker compose up -d    # Restart with a fresh database
```

The webserver supports multiple storage backends for received files:
- Local filesystem (default) - For development and testing
- MinIO - S3-compatible object storage (optional)
- AWS S3 - Production object storage (configure via environment variables)
To use MinIO, uncomment the minio service in docker-compose.yml and set:
```yaml
environment:
  - STORAGE_BACKEND=minio
  - STORAGE_S3_BUCKET=cml-data
  - STORAGE_S3_ENDPOINT=http://minio:9000
```
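After uncommenting the minio service and adding these variables, recreate the affected containers so the new environment is picked up; the webserver service name below is an assumption and may differ in docker-compose.yml:

```bash
# Start MinIO and recreate the webserver with the new storage settings
docker compose up -d minio
docker compose up -d --force-recreate webserver   # "webserver" service name is an assumption
```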