End-to-End Credit Default Prediction with Explainable ML API & Dashboard
A production-ready machine learning solution for credit risk assessment with comprehensive explainability features, designed for fintech companies and financial institutions.
- Advanced ML Pipeline: Multiple algorithms with hyperparameter tuning
- Explainable AI: SHAP-based explanations for model decisions
- Production API: FastAPI backend with comprehensive endpoints
- Interactive Dashboard: Streamlit interface for risk analysis
- Containerized Deployment: Docker and Docker Compose ready
- Feature Engineering: 23+ engineered features from financial data
- Real-time Predictions: Single and batch prediction capabilities
- Model Monitoring: Performance tracking and validation
```
Credit Default Prediction System
├── Data Pipeline
│   ├── Data Ingestion (UCI Dataset)
│   ├── Data Validation (Schema + Drift Detection)
│   └── Data Transformation (Feature Engineering)
├── ML Pipeline
│   ├── Model Training (Multiple Algorithms)
│   ├── Hyperparameter Tuning (GridSearchCV)
│   └── Model Evaluation (Cross-validation)
├── Explainability Layer
│   ├── SHAP Global Explanations
│   ├── SHAP Local Explanations
│   └── Interactive Visualizations
├── API Layer
│   ├── FastAPI Backend
│   ├── Prediction Endpoints
│   └── Explanation Endpoints
├── Frontend Layer
│   ├── Streamlit Dashboard
│   ├── Interactive Risk Assessment
│   └── Batch Processing Interface
└── Deployment Layer
    ├── Docker Containerization
    ├── CI/CD Pipeline Support
    └── Cloud Deployment Ready
```
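The layers above chain together as sequential pipeline stages. The sketch below illustrates that flow with invented class names; it is not the project's actual `components/` API, just a minimal picture of one stage handing its artifact to the next.

```python
# Minimal sketch of chained pipeline stages; class and method names are
# illustrative stand-ins, not the project's real components.
from dataclasses import dataclass


@dataclass
class IngestionResult:
    train_path: str
    test_path: str


class DataIngestion:
    def run(self) -> IngestionResult:
        # Download the UCI dataset and split train/test (stubbed here)
        return IngestionResult("data/raw/train.csv", "data/raw/test.csv")


class DataValidation:
    def run(self, result: IngestionResult) -> bool:
        # Check schema conformance and drift against config/schema.yaml
        return result.train_path.endswith(".csv")


def run_training_pipeline() -> bool:
    """Each stage consumes the previous stage's artifact."""
    ingested = DataIngestion().run()
    return DataValidation().run(ingested)
```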
- Python 3.10+
- Git
- 4GB+ RAM recommended
- Docker & Docker Compose (optional)
```bash
# Clone the repository
git clone <repository-url>
cd credit_default_prediction

# Run setup script
chmod +x scripts/setup.sh
./scripts/setup.sh
```

```bash
# Activate virtual environment
source venv/bin/activate  # Linux/Mac
# or
venv\Scripts\activate     # Windows
```

```bash
# Run training pipeline
python src/credit_default/pipeline/training_pipeline.py
# or
./scripts/train.sh
```

```bash
# Start services with Docker Compose
cd deployment
docker-compose up -d
```

```bash
# Start API server (Terminal 1)
./scripts/start_api.sh

# Start Dashboard (Terminal 2)
./scripts/start_dashboard.sh
```

- API Documentation: http://localhost:8000/docs
- Interactive Dashboard: http://localhost:8501
- MLflow Tracking: http://localhost:5000 (if using Docker Compose)
| Algorithm | Accuracy | Precision | Recall | F1-Score | ROC-AUC |
|---|---|---|---|---|---|
| XGBoost | 0.823 | 0.756 | 0.689 | 0.721 | 0.891 |
| Random Forest | 0.816 | 0.741 | 0.678 | 0.708 | 0.885 |
| Gradient Boosting | 0.819 | 0.748 | 0.672 | 0.708 | 0.887 |
| Logistic Regression | 0.801 | 0.695 | 0.634 | 0.663 | 0.856 |
Best Model: XGBoost with hyperparameter tuning
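The tuning step behind the table could look roughly like the sketch below. It is an assumption about the setup, using scikit-learn's `GradientBoostingClassifier` and synthetic data as stand-ins for the project's XGBoost model and the UCI dataset; the parameter grid is purely illustrative.

```python
# Hedged sketch of GridSearchCV hyperparameter tuning; synthetic data and
# a scikit-learn model stand in for the real dataset and XGBoost.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, n_features=10, random_state=42)

param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [2, 3],
}

search = GridSearchCV(
    GradientBoostingClassifier(random_state=42),
    param_grid,
    scoring="roc_auc",  # matches the ROC-AUC column reported above
    cv=3,               # cross-validation as in the evaluation stage
)
search.fit(X, y)
best_model = search.best_estimator_
```

`search.best_params_` and `search.best_score_` then feed the model-evaluation report.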
- Feature Importance: Ranking of most predictive features
- SHAP Summary Plots: Overall impact of features across all predictions
- Partial Dependence Plots: Feature effect visualization
- SHAP Force Plots: Individual prediction breakdown
- SHAP Waterfall Plots: Step-by-step prediction explanation
- Feature Contribution Analysis: Positive/negative impact identification
```
credit_default_prediction/
├── src/
│   └── credit_default/
│       ├── components/            # ML pipeline components
│       ├── configuration/         # Configuration management
│       ├── constants/             # Project constants
│       ├── entity/                # Data classes and entities
│       ├── exception/             # Custom exception handling
│       ├── logger/                # Logging configuration
│       ├── pipeline/              # Training and prediction pipelines
│       └── utils/                 # Utility functions
├── config/
│   ├── schema.yaml                # Data schema configuration
│   └── model.yaml                 # Model configuration
├── data/
│   ├── raw/                       # Raw data files
│   └── processed/                 # Processed data files
├── artifacts/                     # Pipeline artifacts
│   ├── data_ingestion/
│   ├── data_validation/
│   ├── data_transformation/
│   ├── model_trainer/
│   └── explainer/
├── api/
│   └── fastapi_main.py            # FastAPI application
├── dashboard/
│   └── streamlit_dashboard.py     # Streamlit application
├── deployment/
│   ├── Dockerfile                 # Docker configuration
│   └── docker-compose.yml         # Multi-service orchestration
├── scripts/                       # Execution scripts
├── tests/                         # Unit and integration tests
├── logs/                          # Application logs
└── notebooks/                     # Jupyter notebooks for EDA
```
```
# Single prediction
POST /predict
Content-Type: application/json

{
  "LIMIT_BAL": 200000,
  "SEX": 2,
  "EDUCATION": 2,
  "MARRIAGE": 1,
  "AGE": 35,
  "PAY_0": 1,
  "PAY_2": 2,
  ...
}
```

```
# Get prediction with explanation
POST /explain
Content-Type: application/json

# Same payload as /predict
# Returns SHAP explanations + visualizations
```

```
# Batch predictions
POST /batch-predict
Content-Type: multipart/form-data

# Upload CSV file with customer data
# Returns batch predictions + summary statistics
```

```
GET  /health             # Health check
GET  /model-info         # Model metadata
POST /sample-prediction  # Generate sample prediction
GET  /feature-schema     # Get input schema
```

- Risk Assessment Form: Input customer details
- Real-time Prediction: Instant risk scoring
- SHAP Explanations: Feature contribution analysis
- Risk Visualization: Gauge charts and indicators
- CSV Upload: Process multiple customers
- Summary Statistics: Aggregate risk metrics
- Risk Distribution: Visual analytics
- Export Results: Download predictions
- Performance Metrics: Model evaluation scores
- Feature Importance: Global explanations
- Model Information: Architecture details
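The `/predict` endpoint documented earlier could be exercised with a small client like this sketch. Field names follow the UCI credit dataset; the values and the `build_payload` helper are illustrative, and the remaining `PAY_*` fields stay elided just as in the example payload.

```python
# Illustrative client for the /predict endpoint; payload values are made up.
API_URL = "http://localhost:8000/predict"  # host/port from the quick-start section


def build_payload(limit_bal: int, age: int, **overrides) -> dict:
    """Assemble a minimal applicant record for /predict."""
    payload = {"LIMIT_BAL": limit_bal, "SEX": 2, "EDUCATION": 2,
               "MARRIAGE": 1, "AGE": age, "PAY_0": 1, "PAY_2": 2}
    payload.update(overrides)  # e.g. the remaining repayment-history fields
    return payload


# With the API running (quick-start section), one would post it via e.g.:
#   import requests
#   requests.post(API_URL, json=build_payload(200000, 35)).json()
```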
```bash
# Quick start with Docker
docker-compose up -d

# Manual startup
source venv/bin/activate
python api/fastapi_main.py &
streamlit run dashboard/streamlit_dashboard.py
```

```bash
# Build production images
docker build -t credit-default-api .
docker build -t credit-default-dashboard .

# Deploy with orchestration
docker-compose -f docker-compose.prod.yml up -d
```

- Data source settings
- Feature engineering parameters
- Model hyperparameters
- Training configurations
- Input feature definitions
- Data validation rules
- Quality thresholds
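As a purely hypothetical illustration of those items, a `config/schema.yaml` fragment might look like this (field names follow the UCI dataset; all thresholds invented, not the project's actual file):

```yaml
# Hypothetical schema fragment -- not the project's actual configuration
columns:
  LIMIT_BAL: {dtype: int, min: 10000, max: 1000000}
  SEX:       {dtype: int, allowed: [1, 2]}
  AGE:       {dtype: int, min: 18, max: 100}
validation:
  max_missing_fraction: 0.05   # quality threshold
  drift_p_value: 0.01          # drift-detection cutoff
```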
```bash
# Run all tests
python -m pytest tests/ -v

# Run with coverage
python -m pytest tests/ --cov=./ --cov-report=html

# Run specific test categories
python -m pytest tests/test_components.py -v
python -m pytest tests/test_pipeline.py -v
```

- Health checks and service monitoring
- Performance metrics tracking
- Error tracking and alerting
- Prediction drift detection
- Model performance tracking
- Feature importance monitoring
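One common way to realize the prediction-drift bullet is the Population Stability Index (PSI) between a baseline score distribution and recent scores; the sketch below is an assumption about the approach, not the project's actual monitor. The 0.1/0.25 cutoffs are widely used rules of thumb.

```python
# PSI drift check sketch: compare recent prediction scores against a
# baseline distribution, binned by baseline deciles.
import numpy as np


def psi(baseline: np.ndarray, recent: np.ndarray, bins: int = 10) -> float:
    edges = np.quantile(baseline, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf          # catch out-of-range scores
    b_frac = np.histogram(baseline, edges)[0] / len(baseline)
    r_frac = np.histogram(recent, edges)[0] / len(recent)
    b_frac = np.clip(b_frac, 1e-6, None)           # avoid log(0)
    r_frac = np.clip(r_frac, 1e-6, None)
    return float(np.sum((r_frac - b_frac) * np.log(r_frac / b_frac)))


rng = np.random.default_rng(0)
stable = psi(rng.uniform(size=5000), rng.uniform(size=5000))     # < 0.1: stable
shifted = psi(rng.uniform(size=5000), rng.beta(2, 5, size=5000)) # > 0.25: drift
```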
- Input validation and sanitization
- Secure API endpoints
- Data encryption support
- Model versioning and artifact management
- Prediction audit trails
- Explainability for compliance
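In the FastAPI app, the input-validation bullet would normally be handled by a Pydantic request model; the dependency-free sketch below shows the same idea with illustrative ranges mirroring the UCI dataset's coding (not the project's actual rules).

```python
# Dependency-free input-validation sketch; ranges are illustrative.
def validate_applicant(record: dict) -> list[str]:
    """Return a list of validation errors (empty list means valid)."""
    errors = []
    if not (10_000 <= record.get("LIMIT_BAL", -1) <= 1_000_000):
        errors.append("LIMIT_BAL out of range")
    if record.get("SEX") not in (1, 2):
        errors.append("SEX must be 1 or 2")
    if not (18 <= record.get("AGE", -1) <= 100):
        errors.append("AGE out of range")
    return errors
```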
- Fork the repository
- Create feature branch: `git checkout -b feature/amazing-feature`
- Commit changes: `git commit -m 'Add amazing feature'`
- Push to branch: `git push origin feature/amazing-feature`
- Open Pull Request
- Follow PEP 8 style guidelines
- Add unit tests for new features
- Update documentation for API changes
- Ensure Docker build passes
This project is licensed under the MIT License - see the LICENSE file for details.
- UCI Machine Learning Repository for the Credit Default Dataset
- Open Source Community for the excellent libraries and frameworks
- SHAP for explainable AI capabilities
For questions, issues, or contributions:
- Check existing issues: GitHub Issues
- Create new issue: Detailed bug reports or feature requests
- Documentation: Check the `/docs` directory for detailed guides
- Real-time model retraining pipeline
- A/B testing framework for model variants
- Advanced ensemble methods
- GPU acceleration support
- Multi-model comparison interface
- Custom risk threshold settings
- Regulatory compliance reporting
- Integration with external credit bureaus
Built for Production-Ready ML Engineering
This project demonstrates enterprise-grade ML engineering practices suitable for fintech environments.