
An end-to-end ML dashboard for credit risk assessment with a SHAP explainer, providing real-time credit default risk predictions and detailed explanations to help financial institutions make informed lending decisions.


Credit Default Prediction - End-to-End ML Pipeline

Python FastAPI Streamlit Docker

End-to-End Credit Default Prediction with Explainable ML API & Dashboard

A production-ready machine learning solution for credit risk assessment with comprehensive explainability features, designed for fintech companies and financial institutions.

🌟 Features

  • Advanced ML Pipeline: Multiple algorithms with hyperparameter tuning
  • Explainable AI: SHAP-based explanations for model decisions
  • Production API: FastAPI backend with comprehensive endpoints
  • Interactive Dashboard: Streamlit interface for risk analysis
  • Containerized Deployment: Docker and Docker Compose ready
  • Feature Engineering: 23+ engineered features from financial data
  • Real-time Predictions: Single and batch prediction capabilities
  • Model Monitoring: Performance tracking and validation

πŸ—οΈ Architecture

Credit Default Prediction System
├── Data Pipeline
│   ├── Data Ingestion (UCI Dataset)
│   ├── Data Validation (Schema + Drift Detection)
│   └── Data Transformation (Feature Engineering)
├── ML Pipeline
│   ├── Model Training (Multiple Algorithms)
│   ├── Hyperparameter Tuning (GridSearchCV)
│   └── Model Evaluation (Cross-validation)
├── Explainability Layer
│   ├── SHAP Global Explanations
│   ├── SHAP Local Explanations
│   └── Interactive Visualizations
├── API Layer
│   ├── FastAPI Backend
│   ├── Prediction Endpoints
│   └── Explanation Endpoints
├── Frontend Layer
│   ├── Streamlit Dashboard
│   ├── Interactive Risk Assessment
│   └── Batch Processing Interface
└── Deployment Layer
    ├── Docker Containerization
    ├── CI/CD Pipeline Support
    └── Cloud Deployment Ready

🚀 Quick Start

Prerequisites

  • Python 3.10+
  • Git
  • 4GB+ RAM recommended
  • Docker & Docker Compose (optional)

1. Clone & Setup

# Clone the repository
git clone <repository-url>
cd credit_default_prediction

# Run setup script
chmod +x scripts/setup.sh
./scripts/setup.sh

2. Train the Model

# Activate virtual environment
source venv/bin/activate  # Linux/Mac
# or
venv\Scripts\activate    # Windows

# Run training pipeline
python src/credit_default/pipeline/training_pipeline.py
# or
./scripts/train.sh

3. Start Services

Option 1: Using Docker Compose (Recommended)

cd deployment
docker-compose up -d

Option 2: Manual Startup

# Start API server (Terminal 1)
./scripts/start_api.sh

# Start Dashboard (Terminal 2)
./scripts/start_dashboard.sh

4. Access Applications

📊 Model Performance

| Algorithm           | Accuracy | Precision | Recall | F1-Score | ROC-AUC |
|---------------------|----------|-----------|--------|----------|---------|
| XGBoost             | 0.823    | 0.756     | 0.689  | 0.721    | 0.891   |
| Random Forest       | 0.816    | 0.741     | 0.678  | 0.708    | 0.885   |
| Gradient Boosting   | 0.819    | 0.748     | 0.672  | 0.708    | 0.887   |
| Logistic Regression | 0.801    | 0.695     | 0.634  | 0.663    | 0.856   |

Best Model: XGBoost with hyperparameter tuning

🧠 Explainable AI Features

Global Explanations

  • Feature Importance: Ranking of most predictive features
  • SHAP Summary Plots: Overall impact of features across all predictions
  • Partial Dependence Plots: Feature effect visualization

Local Explanations

  • SHAP Force Plots: Individual prediction breakdown
  • SHAP Waterfall Plots: Step-by-step prediction explanation
  • Feature Contribution Analysis: Positive/negative impact identification
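
The key property behind these local explanations is SHAP's additivity: the model's base value plus the per-feature contributions reconstructs the predicted default probability. A toy illustration with made-up numbers (these are not the project's actual model outputs):

```python
# Toy illustration of SHAP additivity (hypothetical values):
# base value + sum of feature contributions == model output for one applicant.
base_value = 0.22  # e.g. the average predicted default rate on training data

# Per-feature SHAP contributions for one applicant (illustrative numbers)
contributions = {
    "PAY_0": 0.11,       # recent late payment pushes risk up
    "LIMIT_BAL": -0.04,  # high credit limit pushes risk down
    "AGE": -0.01,
    "EDUCATION": 0.02,
}

predicted_probability = base_value + sum(contributions.values())
print(f"Predicted default probability: {predicted_probability:.2f}")  # 0.30
```

This is exactly what a waterfall plot draws: start at the base value and add each contribution in turn.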

πŸ“ Project Structure

credit_default_prediction/
├── src/
│   └── credit_default/
│       ├── components/           # ML pipeline components
│       ├── configuration/        # Configuration management
│       ├── constants/            # Project constants
│       ├── entity/               # Data classes and entities
│       ├── exception/            # Custom exception handling
│       ├── logger/               # Logging configuration
│       ├── pipeline/             # Training and prediction pipelines
│       └── utils/                # Utility functions
├── config/
│   ├── schema.yaml              # Data schema configuration
│   └── model.yaml               # Model configuration
├── data/
│   ├── raw/                     # Raw data files
│   └── processed/               # Processed data files
├── artifacts/                   # Pipeline artifacts
│   ├── data_ingestion/
│   ├── data_validation/
│   ├── data_transformation/
│   ├── model_trainer/
│   └── explainer/
├── api/
│   └── fastapi_main.py          # FastAPI application
├── dashboard/
│   └── streamlit_dashboard.py   # Streamlit application
├── deployment/
│   ├── Dockerfile               # Docker configuration
│   └── docker-compose.yml       # Multi-service orchestration
├── scripts/                     # Execution scripts
├── tests/                       # Unit and integration tests
├── logs/                        # Application logs
└── notebooks/                   # Jupyter notebooks for EDA

🔌 API Endpoints

Prediction Endpoints

# Single prediction
POST /predict
Content-Type: application/json

{
  "LIMIT_BAL": 200000,
  "SEX": 2,
  "EDUCATION": 2,
  "MARRIAGE": 1,
  "AGE": 35,
  "PAY_0": 1,
  "PAY_2": 2,
  ...
}
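
A payload like the one above can be sent with Python's standard library alone; the sketch below assumes the API is running locally on port 8000 (host and port are assumptions, and the field list is truncated just as in the example):

```python
import json
import urllib.request

# Example /predict payload (field names follow the UCI credit default dataset)
payload = {
    "LIMIT_BAL": 200000,
    "SEX": 2,
    "EDUCATION": 2,
    "MARRIAGE": 1,
    "AGE": 35,
    "PAY_0": 1,
    "PAY_2": 2,
    # ... remaining PAY_*, BILL_AMT*, and PAY_AMT* fields
}

body = json.dumps(payload).encode("utf-8")
req = urllib.request.Request(
    "http://localhost:8000/predict",       # assumed local API address
    data=body,
    headers={"Content-Type": "application/json"},
    method="POST",
)
# response = urllib.request.urlopen(req)   # uncomment with the API running
```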

Explanation Endpoints

# Get prediction with explanation
POST /explain
Content-Type: application/json

# Same payload as /predict
# Returns SHAP explanations + visualizations

Batch Processing

# Batch predictions
POST /batch-predict
Content-Type: multipart/form-data

# Upload CSV file with customer data
# Returns batch predictions + summary statistics
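
Once the batch results come back as CSV, summary statistics like those the endpoint reports can be computed with the stdlib `csv` module. A sketch, assuming a result file with a `default_probability` column (the column name and the 0.5 cut-off are illustrative assumptions):

```python
import csv
import io

# Stand-in for a downloaded batch-prediction result (toy data)
predictions_csv = io.StringIO(
    "customer_id,default_probability\n"
    "1,0.12\n"
    "2,0.58\n"
    "3,0.31\n"
)

probs = [float(row["default_probability"]) for row in csv.DictReader(predictions_csv)]
high_risk = sum(p >= 0.5 for p in probs)  # count above an assumed 0.5 threshold
mean_risk = sum(probs) / len(probs)

print(f"{high_risk} of {len(probs)} customers high risk; mean p = {mean_risk:.3f}")
```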

Utility Endpoints

GET /health              # Health check
GET /model-info          # Model metadata
POST /sample-prediction  # Generate sample prediction
GET /feature-schema      # Get input schema

πŸŽ›οΈ Dashboard Features

Single Customer Analysis

  • Risk Assessment Form: Input customer details
  • Real-time Prediction: Instant risk scoring
  • SHAP Explanations: Feature contribution analysis
  • Risk Visualization: Gauge charts and indicators
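
A gauge or indicator like the dashboard's typically maps the predicted probability onto discrete risk bands. A minimal sketch with illustrative cut-offs (the project's actual thresholds are not specified here):

```python
def risk_band(probability: float) -> str:
    """Map a default probability to a dashboard risk band (illustrative cut-offs)."""
    if probability < 0.2:
        return "Low"
    if probability < 0.5:
        return "Medium"
    return "High"

print(risk_band(0.12), risk_band(0.35), risk_band(0.72))  # Low Medium High
```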

Batch Processing

  • CSV Upload: Process multiple customers
  • Summary Statistics: Aggregate risk metrics
  • Risk Distribution: Visual analytics
  • Export Results: Download predictions

Model Analytics

  • Performance Metrics: Model evaluation scores
  • Feature Importance: Global explanations
  • Model Information: Architecture details

🐳 Deployment Options

Local Development

# Quick start with Docker
docker-compose up -d

# Manual startup
source venv/bin/activate
python api/fastapi_main.py &
streamlit run dashboard/streamlit_dashboard.py

Production Deployment

# Build production images
docker build -t credit-default-api .
docker build -t credit-default-dashboard .

# Deploy with orchestration
docker-compose -f docker-compose.prod.yml up -d

🔧 Configuration

Model Configuration (config/model.yaml)

  • Data source settings
  • Feature engineering parameters
  • Model hyperparameters
  • Training configurations

Schema Configuration (config/schema.yaml)

  • Input feature definitions
  • Data validation rules
  • Quality thresholds
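
As a rough sketch, a schema file covering those three concerns might look like the following (field names, bounds, and thresholds are illustrative assumptions, not the repository's actual configuration):

```yaml
columns:
  LIMIT_BAL:
    dtype: int
    min: 10000            # illustrative validation bound
  AGE:
    dtype: int
    min: 21
    max: 79
target_column: default_payment_next_month
validation:
  max_missing_fraction: 0.05   # illustrative quality threshold
```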

🧪 Testing

# Run all tests
python -m pytest tests/ -v

# Run with coverage
python -m pytest tests/ --cov=./ --cov-report=html

# Run specific test categories
python -m pytest tests/test_components.py -v
python -m pytest tests/test_pipeline.py -v

📈 Monitoring & Observability

Application Monitoring

  • Health checks and service monitoring
  • Performance metrics tracking
  • Error tracking and alerting

Model Monitoring

  • Prediction drift detection
  • Model performance tracking
  • Feature importance monitoring
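
One common way to quantify prediction drift is the Population Stability Index (PSI) over binned score distributions; whether this project uses PSI specifically is not stated, so the following is a generic sketch:

```python
import math

def psi(expected: list[float], actual: list[float]) -> float:
    """Population Stability Index over pre-binned proportions (each list sums to 1)."""
    return sum(
        (a - e) * math.log(a / e)
        for e, a in zip(expected, actual)
        if e > 0 and a > 0  # skip empty bins to avoid log(0)
    )

# Share of predictions per score bucket: training vs. live traffic (toy data)
train = [0.25, 0.25, 0.25, 0.25]
live = [0.20, 0.25, 0.25, 0.30]
print(f"PSI = {psi(train, live):.4f}")  # PSI < 0.1 is usually treated as stable
```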

🔒 Security & Compliance

Data Security

  • Input validation and sanitization
  • Secure API endpoints
  • Data encryption support

Model Security

  • Model versioning and artifact management
  • Prediction audit trails
  • Explainability for compliance

🤝 Contributing

  1. Fork the repository
  2. Create feature branch: git checkout -b feature/amazing-feature
  3. Commit changes: git commit -m 'Add amazing feature'
  4. Push to branch: git push origin feature/amazing-feature
  5. Open Pull Request

Development Guidelines

  • Follow PEP 8 style guidelines
  • Add unit tests for new features
  • Update documentation for API changes
  • Ensure Docker build passes

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

  • UCI Machine Learning Repository for the Credit Default Dataset
  • Open Source Community for the excellent libraries and frameworks
  • SHAP for explainable AI capabilities

📞 Support

For questions, issues, or contributions:

  1. Check existing issues: GitHub Issues
  2. Create new issue: Detailed bug reports or feature requests
  3. Documentation: Check the /docs directory for detailed guides

🚀 Future Enhancements

Technical Improvements

  • Real-time model retraining pipeline
  • A/B testing framework for model variants
  • Advanced ensemble methods
  • GPU acceleration support

Business Features

  • Multi-model comparison interface
  • Custom risk threshold settings
  • Regulatory compliance reporting
  • Integration with external credit bureaus

Built for Production-Ready ML Engineering

This project demonstrates enterprise-grade ML engineering practices suitable for fintech environments.
