SCIoT (Split Computing on IoT) makes TinyML applications on IoT devices faster and more efficient. Instead of running the entire AI model on a constrained device (which can be slow) or sending everything to the cloud (which wastes bandwidth), SCIoT automatically decides which parts of the model run locally on the device and which run on a nearby edge server.
Traditional IoT AI applications face a dilemma:
- All on device: Limited processing power, slow inference, but no network dependency
- All on cloud: Fast processing but high latency, bandwidth usage, and privacy concerns
- Static split: Fixed division that doesn't adapt to changing conditions
SCIoT provides dynamic split computing that automatically:
- Monitors Performance: Continuously tracks how long each part of the AI model takes to run
- Detects Changes: Notices when performance changes (network gets slower, device gets busy)
- Adapts Automatically: Adjusts the split point to maintain optimal performance
- Handles Failures: Falls back to local processing when the server is unavailable
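Conceptually, the adaptation step reduces to picking the split layer that minimizes estimated end-to-end latency. A minimal sketch of that idea in Python (the cost model and names are illustrative assumptions, not SCIoT's actual code):

```python
def choose_split(device_ms, edge_ms, transfer_ms):
    """Pick the split layer k that minimizes estimated end-to-end latency.

    device_ms[i]   - measured time for layer i on the device
    edge_ms[i]     - measured time for layer i on the edge server
    transfer_ms[i] - time to ship layer i's output over the network
    """
    n = len(device_ms)
    best_layer, best_cost = None, float("inf")
    for k in range(n):
        local = sum(device_ms[: k + 1])                  # layers 0..k on device
        remote = sum(edge_ms[k + 1:])                    # layers k+1..n-1 on edge
        transfer = transfer_ms[k] if k < n - 1 else 0.0  # fully local: no upload
        cost = local + transfer + remote
        if cost < best_cost:
            best_layer, best_cost = k, cost
    return best_layer
```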
Key benefits:
- Up to 3x faster inference compared to device-only execution
- Automatic optimization as conditions change
- Real-time adaptation to network and device conditions
- Continues working even when the edge server is down
- Graceful degradation to local-only mode
- No crashes or data loss on network failures
- Learns from past performance to make better decisions
- Detects when performance patterns change significantly
- Self-optimizing system that gets better over time
- Reduces bandwidth usage by sending only necessary data
- Minimizes device battery consumption
- Optimizes both latency and throughput
Example applications span manufacturing, agriculture, smart buildings, and healthcare:
- Quality Control: Inspect products on production lines
- Predictive Maintenance: Detect equipment failures before they happen
- Safety Monitoring: Identify hazardous conditions in real-time
- Crop Monitoring: Detect diseases and pest infestations
- Precision Irrigation: Optimize water usage based on plant health
- Livestock Monitoring: Track animal health and behavior
- Security: Intelligent surveillance and access control
- Energy Management: Optimize heating, cooling, and lighting
- Occupancy Detection: Manage spaces based on usage patterns
- Patient Monitoring: Continuous health tracking
- Medical Imaging: Edge-assisted diagnostic imaging
- Emergency Response: Rapid detection of critical conditions
Technical specifications:
- Supported Models: TensorFlow Lite models (tested with FOMO 96x96)
- Platforms: ESP32, Python clients, Raspberry Pi
- Communication: HTTP, WebSocket, MQTT protocols
- Languages: Python, C++ (Arduino)
- Requirements: Python 3.11+, TensorFlow 2.15+
SCIoT is based on peer-reviewed research:
- F. Bove, S. Colli and L. Bedogni, "Performance Evaluation of Split Computing with TinyML on IoT Devices," IEEE CCNC 2024
- F. Bove and L. Bedogni, "Smart Split: Leveraging TinyML and Split Computing for Efficient Edge AI," IEEE/ACM SEC 2024
- For End Users: See USER_GUIDE.md for setup and configuration
- For Developers: See DEVELOPER_GUIDE.md for technical implementation details
SCIoT is open source software licensed under MIT. We welcome contributions, bug reports, and feature requests.
Repository: https://github.com/UBICO/SCIoT_python_client
Issues: https://github.com/UBICO/SCIoT_python_client/issues
Powered by UBICO - University of Bologna
Prerequisites:
- Python 3.11+
- TensorFlow 2.15.0
- Docker (for MQTT broker)
Clone the repository:

```bash
git clone https://github.com/UBICO/SCIoT_python_client.git
cd SCIoT_python_client
```

Create the virtual environment and install dependencies:

```bash
uv sync
```

Activate the virtual environment:

```bash
source .venv/bin/activate   # On macOS/Linux
# or
.venv\Scripts\activate      # On Windows
```

Prepare the model:

- Save your Keras model as `test_model.h5` in `src/server/models/test/test_model/`
- Save your test image as `test_image.png` in `src/server/models/test/test_model/pred_data/`
- Split the model: `python3 src/server/models/model_split.py`
- Configure paths in `src/server/commons.py`
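For intuition, splitting a Keras model at layer `k` boils down to building a head and a tail sub-model, roughly as below (a sketch assuming a strictly sequential topology and a hypothetical split index; `model_split.py` is the authoritative implementation):

```python
import tensorflow as tf

model = tf.keras.models.load_model(
    "src/server/models/test/test_model/test_model.h5")
k = 30  # hypothetical split index

# Head: image -> intermediate tensor (runs on the device)
head = tf.keras.Model(model.input, model.layers[k].output)

# Tail: intermediate tensor -> prediction (runs on the edge server).
# Rebuilding by iterating layers only works for sequential models.
tail_in = tf.keras.Input(shape=model.layers[k].output.shape[1:])
x = tail_in
for layer in model.layers[k + 1:]:
    x = layer(x)
tail = tf.keras.Model(tail_in, x)
```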
Server configuration:

```yaml
communication:
  http:
    host: 0.0.0.0
    port: 8000
    endpoints:
      registration: /api/registration
      device_input: /api/device_input
      offloading_layer: /api/offloading_layer
      device_inference_result: /api/device_inference_result

delay_simulation:
  computation:
    enabled: false
    type: gaussian
    mean: 0.001
    std_dev: 0.0002
  network:
    enabled: false
    type: gaussian
    mean: 0.020
    std_dev: 0.005

local_inference_mode:
  enabled: true
  probability: 0.1  # 10% of requests force local inference
```

Client configuration:

```yaml
client:
  device_id: "device_01"
  http:
    server_host: "0.0.0.0"
    server_port: 8000
  model:
    last_offloading_layer: 58
  local_inference_mode:
    enabled: true
    probability: 0.1
```
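Reading such a config from Python is a one-liner with PyYAML; a sketch (the file path is hypothetical):

```python
import yaml

# Hypothetical path; adjust to where your client config actually lives.
with open("config/client.yaml") as f:
    cfg = yaml.safe_load(f)["client"]

server_url = f"http://{cfg['http']['server_host']}:{cfg['http']['server_port']}"
split_layer = cfg["model"]["last_offloading_layer"]  # 58 = run all layers locally
```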
Activate the virtual environment:

```bash
source .venv/bin/activate
```

Start the MQTT broker (optional):

```bash
docker compose up
```

Run the edge server:

```bash
python src/server/edge/run_edge.py
```

In a separate terminal, run the client:

```bash
source .venv/bin/activate
python server_client_light/client/http_client.py
```

Client Behavior:
- Connects to the server and registers the device
- Sends image data
- Receives the offloading decision (or `-1` for local-only)
- Runs inference (split or local)
- Sends results back to the server
- Continues operating if the server becomes unavailable (graceful degradation to local-only mode)
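In essence, that behavior is a timeout on every server call plus a local fallback. A condensed sketch (field and function names are illustrative, not the actual client API):

```python
import requests

SERVER = "http://0.0.0.0:8000"
LAST_LAYER = 58  # run all 59 layers when the server is unreachable

def infer(image_bytes, run_layers, send_result):
    """One inference round with graceful degradation to local-only mode."""
    try:
        resp = requests.post(f"{SERVER}/api/device_input",
                             data=image_bytes, timeout=5)
        split = resp.json()["offloading_layer"]  # hypothetical field name
    except requests.RequestException:
        split = -1                               # server unreachable

    if split == -1:
        split = LAST_LAYER                       # local-only fallback

    result, layer_times = run_layers(image_bytes, 0, split)
    try:
        send_result(result, layer_times)         # best effort
    except requests.RequestException:
        pass                                     # never crash on network errors
    return result
```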
View real-time statistics:

```bash
streamlit run src/server/web/webpage.py
```

Run the test suite:

```bash
pytest tests/test_variance_and_local_inference.py tests/test_client_resilience.py tests/test_mqtt_client/ -v
```

Or run individual suites:

```bash
# Core features (variance, local inference, -1 handling)
pytest tests/test_variance_and_local_inference.py -v

# Connection resilience
pytest tests/test_client_resilience.py -v

# MQTT client
pytest tests/test_mqtt_client/ -v
```

Demonstration scripts:

```bash
# Variance detection demonstration
python test_variance_detection.py

# Cascade propagation demonstration
python test_variance_cascading.py
```

The system monitors inference time stability using the Coefficient of Variation (CV):
```
CV = StdDev / Mean
If CV > 15% → Unstable → Trigger re-test
```
Cascading: When layer i shows variance, layer i+1 is automatically flagged for re-testing (since layer i's output is layer i+1's input).
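A compact sketch of the check (the 15% threshold is from above; the flagging structure is illustrative):

```python
import statistics

CV_THRESHOLD = 0.15  # 15%

def unstable_layers(layer_times):
    """layer_times[i] is a list of time samples for layer i.
    Returns layer indices to re-test, cascading each flag to the next layer."""
    flagged = set()
    for i, samples in enumerate(layer_times):
        mean = statistics.mean(samples)
        cv = (statistics.stdev(samples) / mean
              if len(samples) > 1 and mean > 0 else 0.0)
        if cv > CV_THRESHOLD:
            flagged.add(i)
            if i + 1 < len(layer_times):
                flagged.add(i + 1)  # layer i's output is layer i+1's input
    return sorted(flagged)
```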
See VARIANCE_DETECTION.md for details.
Probabilistically forces device to run all layers locally:
- Purpose: Refresh device inference times periodically
- Configuration: `enabled` (true/false) + `probability` (0.0-1.0)
- Mechanism: Server returns `-1` instead of the calculated offloading layer
- Client Handling: `-1` → converts to layer 58 (run all 59 layers locally)
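The client-side mapping is tiny; a sketch:

```python
LAST_LAYER = 58  # FOMO 96x96 has 59 layers, indexed 0-58

def resolve_split(server_answer: int) -> int:
    # -1 means "run everything locally": map it to the last layer index
    return LAST_LAYER if server_answer == -1 else server_answer
```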
See LOCAL_INFERENCE_MODE.md for details.
Simulate network and computation delays for testing:
```yaml
delay_simulation:
  computation:
    enabled: true
    type: gaussian   # Options: static, gaussian, uniform, exponential
    mean: 0.001      # 1ms average
    std_dev: 0.0002  # 0.2ms variation
  network:
    enabled: true
    type: gaussian
    mean: 0.020      # 20ms average
    std_dev: 0.005   # 5ms variation
```

See DELAY_SIMULATION.md for details.
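For intuition, the four delay types could be sampled as below (a conceptual sketch, not the project's implementation; the uniform range is an assumption):

```python
import random
import time

def sample_delay(cfg):
    """cfg mirrors a delay_simulation entry,
    e.g. {"type": "gaussian", "mean": 0.020, "std_dev": 0.005}."""
    t = cfg["type"]
    if t == "static":
        d = cfg["mean"]
    elif t == "gaussian":
        d = random.gauss(cfg["mean"], cfg["std_dev"])
    elif t == "uniform":
        d = random.uniform(0, 2 * cfg["mean"])   # assumed mean-centered range
    elif t == "exponential":
        d = random.expovariate(1 / cfg["mean"])
    else:
        raise ValueError(f"unknown delay type: {t}")
    time.sleep(max(d, 0.0))  # never sleep a negative duration
```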
Run comprehensive multi-scenario simulations with automated analysis:
```bash
# Run all 9 predefined scenarios (duration: ~15 minutes)
python simulation_runner.py

# Results saved to: simulated_results/simulation_YYYYMMDD_HHMMSS/
# - baseline_inference_results.csv
# - network_delay_20ms_inference_results.csv
# - computation_delay_5ms_inference_results.csv
# - ... (one per scenario)
```

See SIMULATION_RUNNER_README.md for scenarios and configuration.
Generate comprehensive graphs and statistics from simulation results:
```bash
# Analyze a simulation folder
python analyze_simulation.py simulated_results/simulation_YYYYMMDD_HHMMSS

# Generates in analysis/ subfolder:
# - Device vs Edge time comparison plots
# - Total inference time bar charts
# - Throughput analysis
# - Timing distribution boxplots
# - Layer statistics
# - Comprehensive comparison dashboard
# - Summary statistics CSV
```

See ANALYSIS_README.md for detailed output descriptions and interpretation.
Clients handle server unavailability gracefully:
- Connection timeout: 5 seconds on all requests
- Fallback behavior: Run all layers locally when server unreachable
- No crashes: All network errors caught and handled
- Auto-retry: Attempts reconnection on each request
- Continues operation: System never stops, even when isolated
Example output when the server is down:

```
⚠ Registration failed (server unreachable): Connection refused
→ Continuing with local-only inference
⚠ Cannot reach server: Connection refused
→ Running all layers locally
✓ Inference complete (layers 0-58)
```
Comprehensive documentation available:
- VARIANCE_DETECTION.md - Technical documentation of variance detection
- VARIANCE_DETECTION_IMPLEMENTATION.md - Implementation overview
- LOCAL_INFERENCE_MODE.md - Local inference mode reference
- LOCAL_INFERENCE_IMPLEMENTATION.md - Implementation details
- CLIENT_SERVER_-1_SEMANTICS.md - How -1 works end-to-end
- DELAY_SIMULATION.md - Delay simulation guide
- TEST_SUITE_SUMMARY.md - Complete test documentation
```
┌─────────────┐          ┌──────────────┐          ┌─────────────┐
│   Device    │ ◄─────►  │ Edge Server  │ ◄─────►  │  Analytics  │
│   Client    │   HTTP   │  (FastAPI)   │          │  Dashboard  │
└─────────────┘          └──────────────┘          └─────────────┘
       │                        │
       │                        │
  Inference                Offloading
  (0 to N)                 Algorithm +
                           Variance +
                           Local Mode
       │                        │
       ▼                        ▼
    Device                    Edge
    Results                  Results
    (times)                (prediction)
```
Request Flow:
- Client sends image → Server
- Server returns offloading layer (or `-1`)
- Client runs inference up to that layer
- Client sends results + times → Server
- Server tracks variance + updates times
- Server runs remaining layers (if needed)
- Server returns final prediction
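Mapped onto the configured endpoints, the flow looks roughly like this (HTTP methods and payload fields are assumptions for illustration, not the actual schema):

```python
import requests

BASE = "http://0.0.0.0:8000"
DEV = {"device_id": "device_01"}

# Register the device once at startup
requests.post(f"{BASE}/api/registration", json=DEV, timeout=5)

# Send the input and get the offloading layer (-1 = run everything locally)
requests.post(f"{BASE}/api/device_input",
              json={**DEV, "image": "<base64>"}, timeout=5)
layer = requests.get(f"{BASE}/api/offloading_layer",
                     params=DEV, timeout=5).json()

# After running layers 0..layer on-device, report results and per-layer times;
# the server tracks variance, runs any remaining layers, returns the prediction
requests.post(f"{BASE}/api/device_inference_result",
              json={**DEV, "layer_times": [], "output": "<tensor>"}, timeout=5)
```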
System characteristics:
- Inference: 59 layers (FOMO 96x96)
- Device time: ~19µs per layer average
- Edge time: ~450-540µs per layer average
- Network: Configurable latency simulation
- Variance threshold: 15% CV
- Refresh rate: Configurable (default 10% via local inference mode)
If the server won't start:

- Check port 8000 is not in use
- Verify TensorFlow is installed correctly
- Check model files exist in correct paths

If the client can't connect:

- Verify the server is running
- Check `server_host` and `server_port` in the config
- Note: the client will continue in local-only mode if the server is unavailable

If dependencies are missing:

- Ensure the virtual environment is activated
- Run `uv sync` to update dependencies
- Check the Python version is 3.11+

If inference fails:

- Verify the model is split correctly
- Check layer dimensions match
- Review logs in the `logs/` directory
This is a research project. For questions or collaboration:
- Open an issue on GitHub
- Contact the UBICO research group
- See publications for research context
See LICENSE file for details.