This project demonstrates provenance tracking for a machine learning model trained on the MNIST dataset. It records the provenance of the data, the model, and the training process, and includes tools to verify that recorded information.

Features:

- Data provenance tracking
- Model architecture and weights tracking
- Training process monitoring
- Verification of the recorded provenance
- Detailed reports in Markdown format
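At its core, tracking and verifying provenance means fingerprinting each artifact when it is produced and re-checking those fingerprints later. The sketch below illustrates that idea under a few assumptions: artifacts are hashed with SHA-256 over their raw bytes, and the record is a JSON-serializable dict. The function names (`sha256_bytes`, `make_record`, `verify_record`), the record fields, and the example hyperparameters are hypothetical and are not the project's actual tracker or verifier API.

```python
import hashlib
import json
from datetime import datetime, timezone


def sha256_bytes(data: bytes) -> str:
    """Return the hex SHA-256 digest of raw bytes (e.g. serialized arrays or a weights file)."""
    return hashlib.sha256(data).hexdigest()


def _config_digest(hyperparams: dict) -> str:
    # sort_keys=True makes the digest independent of dict insertion order
    return sha256_bytes(json.dumps(hyperparams, sort_keys=True).encode("utf-8"))


def make_record(dataset: bytes, weights: bytes, hyperparams: dict) -> dict:
    """Build a hypothetical provenance record linking data, model weights, and training config."""
    return {
        "created_at": datetime.now(timezone.utc).isoformat(),
        "data_sha256": sha256_bytes(dataset),
        "weights_sha256": sha256_bytes(weights),
        "config_sha256": _config_digest(hyperparams),
        "hyperparams": hyperparams,
    }


def verify_record(record: dict, dataset: bytes, weights: bytes) -> dict:
    """Recompute every fingerprint and report which ones still match the record."""
    return {
        "data": record["data_sha256"] == sha256_bytes(dataset),
        "weights": record["weights_sha256"] == sha256_bytes(weights),
        "config": record["config_sha256"] == _config_digest(record["hyperparams"]),
    }


if __name__ == "__main__":
    # Placeholder bytes stand in for the real MNIST arrays and saved model weights
    data = b"placeholder for serialized MNIST arrays"
    weights = b"placeholder for saved model weights"
    rec = make_record(data, weights, {"epochs": 5, "batch_size": 128})
    print(json.dumps(rec, indent=2))
    print(verify_record(rec, data, weights))
    print(verify_record(rec, data, weights + b"x"))
```

In a real pipeline the bytes would come from the dataset files and the saved weights file; any change to either, or to the stored hyperparameters, makes the corresponding check return `False`.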
Project layout:

    mnist_provenance/
    ├── src/
    │   ├── provenance/
    │   │   ├── tracker.py
    │   │   ├── verifier.py
    │   │   └── generate_final_report.py
    │   └── training/
    │       └── train.py
    ├── scripts/
    │   └── run_training.sh
    ├── artifacts/
    │   ├── models/
    │   └── provenance/
    └── tests/
        └── test_provenance.py
Setup:

- Create a virtual environment: `python -m venv venv`
- Activate the virtual environment:
  - On Unix/macOS: `source venv/bin/activate`
  - On Windows: `.\venv\Scripts\activate`
- Install the dependencies: `pip install -r requirements.txt`

Usage:

Run the training script with `./scripts/run_training.sh`. This will:
- Train a model on the MNIST dataset
- Track all provenance information
- Generate a detailed report in the artifacts directory
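To illustrate the reporting step, the following sketch renders a provenance record and its verification results as a Markdown report. It assumes the record is a flat dict and the checks map names to booleans; `render_markdown_report`, `write_report`, and the `report.md` filename are hypothetical and do not describe the interface of `generate_final_report.py`.

```python
from pathlib import Path


def _cell(value) -> str:
    # Escape pipe characters so values cannot break the Markdown table layout
    return str(value).replace("|", "\\|")


def render_markdown_report(record: dict, checks: dict) -> str:
    """Render a flat provenance record plus verification results as Markdown (hypothetical format)."""
    lines = ["# Provenance Report", "", "| Field | Value |", "| --- | --- |"]
    for key, value in record.items():
        lines.append(f"| {_cell(key)} | {_cell(value)} |")
    lines += ["", "## Verification", ""]
    for name, ok in checks.items():
        lines.append(f"- {name}: {'PASS' if ok else 'FAIL'}")
    return "\n".join(lines) + "\n"


def write_report(record: dict, checks: dict, out_dir: str = "artifacts/provenance") -> Path:
    """Write the report to <out_dir>/report.md, creating the directory if it does not exist."""
    path = Path(out_dir) / "report.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(render_markdown_report(record, checks), encoding="utf-8")
    return path


if __name__ == "__main__":
    # Placeholder values; a real record would hold actual digests and settings
    record = {"data_sha256": "<digest>", "weights_sha256": "<digest>", "epochs": 5}
    checks = {"data": True, "weights": True}
    print(render_markdown_report(record, checks))
```

Keeping rendering separate from file I/O makes the report format easy to test without touching the `artifacts` directory.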
Requirements:

- Python 3.8+
- TensorFlow 2.x
- NumPy
- pytest (for testing)
License: MIT