Skip to content

andrecodea/credit-risk-classification-system

Repository files navigation

Credit Risk Classification System

A machine learning application for predicting credit risk using the German Credit Risk dataset. The system includes model training, evaluation, and an interactive web interface for predictions with LLM-powered strategic recommendations.

🗺️ Roadmap

Current Status

  • ✅ Basic Gradio web interface
  • ✅ Random Forest + SMOTE model (F1=0.84, ROC AUC=0.758)
  • ✅ LLM-powered recommendations
  • ⚠️ Purpose feature not included in model

Planned Enhancements

1. Add Purpose Feature

  • Add 'Purpose' to features in notebook
  • Retrain model with 10 features (including Purpose)
  • Update service.py to accept purpose parameter
  • Add purpose dropdown to Gradio app

2. Add Preprocessing to ML Pipeline

  • Modify pipeline to include preprocessing (ColumnTransformer with OneHotEncoder)
  • Remove manual mappings from service.py - let pipeline handle encoding
  • Pass raw string values directly to model

3. Add FastAPI API

  • Create api.py with FastAPI
  • Endpoints:
    • POST /predict - Single prediction
    • POST /predict/batch - Batch predictions
    • GET /model/info - Model metadata
    • GET /health - Health check
  • Share prediction logic with Gradio via service.py

4. Documentation & Demo

  • Add screenshots/GIFs to /docs folder:
    • docs/demo.gif - Quick demo of the app
    • docs/screenshot-prediction.png - Make Prediction tab
    • docs/screenshot-showcase.png - Model Showcase tab
  • Update README with image embeds
  • Add API documentation endpoint (/docs - FastAPI auto-generated)
  • Create POSTMAN_COLLECTION.json for API testing
  • Add deployment instructions (Render, Railway, etc.)

5. Project Structure (After Roadmap)

├── app.py              # Gradio web app
├── api.py              # FastAPI REST API
├── service.py          # Shared prediction logic
├── prompt.py           # LLM prompts
├── requirements.txt    # Dependencies (add: fastapi, uvicorn)
├── docs/               # Documentation & demos
│   ├── demo.gif
│   ├── screenshot-prediction.png
│   ├── screenshot-showcase.png
│   └── POSTMAN_COLLECTION.json
└── artifacts/
    └── rf_smote_pipeline.pkl  # Includes preprocessing

🎯 Features

  • Machine Learning Pipeline: Random Forest classifier with SMOTE for handling imbalanced data
  • Model Evaluation: ROC curves, confusion matrices, and comprehensive metrics
  • Web Interface: Interactive Gradio app for predictions
  • REST API: FastAPI endpoints for programmatic access
  • LLM Integration: Strategic recommendations via OpenAI-compatible API (rendered in Markdown)
  • Feature Engineering: Credit amount per month derived feature + Purpose classification

📊 Model Performance

Metric Score
F1 Score (Bad) 0.84
ROC AUC 0.758
Recall (Bad) 85.7%
Precision (Bad) 82.8%

🗂️ Project Structure

credit-risk-classification/
├── app.py                    # Gradio web application
├── api.py                    # FastAPI REST API (planned)
├── service.py                # Shared prediction service
├── prompt.py                 # LLM prompt templates
├── requirements.txt          # Python dependencies
├── pyproject.toml            # Project config
├── .env                      # Environment variables (API keys)
├── german_credit_data.csv    # Original dataset
├── credit_risk_analysis.ipynb # Model training notebook
├── README.md                 # This file
├── docs/                     # Documentation & demos
│   ├── demo.gif
│   ├── screenshot-prediction.png
│   ├── screenshot-showcase.png
│   ├── screenshot-api.png
│   └── POSTMAN_COLLECTION.json
└── artifacts/                # Model artifacts (generated by notebook)
    ├── rf_smote_pipeline.pkl    # Trained model pipeline
    ├── metrics.pkl              # Model evaluation metrics
    └── feature_names.pkl        # Feature names

🚀 Quick Start

Note: Model artifacts (artifacts/*.pkl) are not included in the repo. Run credit_risk_analysis.ipynb to train the model and generate them.

1. Install Dependencies

pip install -r requirements.txt

2. Configure API Key

Edit the .env file:

INCEPTION_API_KEY=your-api-key-here

3. Run the App

python app.py

The app will open in your browser at http://localhost:7860.

4. Run the API (Optional)

uvicorn api:app --reload

The API will be available at http://localhost:8000. Visit /docs for interactive API documentation.

🖥️ Demo & Screenshots

Gradio Web Interface

Model Showcase Make Prediction
Model Showcase Make Prediction

Quick Demo

Demo

API Documentation

API Docs

See the /docs folder for more screenshots and the Postman collection.

📖 How to Use

Model Showcase Tab

  • View model performance metrics
  • Interactive ROC curve
  • Confusion matrix visualization

Make Prediction Tab

  1. Select applicant features:

    • Age (slider: 18-80)
    • Sex (male/female)
    • Job type (0-3)
    • Housing (own/rent/free)
    • Savings account
    • Checking account
    • Credit amount
    • Duration (months)
    • Purpose (car, furniture, etc.)
  2. Click "Analyze & Get Recommendation"

  3. View:

    • Risk prediction (HIGH RISK / LOW RISK)
    • Confidence percentage
    • Strategic LLM recommendation (Markdown formatted)

🔧 Technical Details

Model Pipeline

Pipeline([
    ('scaler', StandardScaler()),
    ('smote', SMOTE(random_state=42)),
    ('classifier', RandomForestClassifier(
        n_estimators=200,
        max_depth=8,
        class_weight='balanced',
        random_state=42
    ))
])

Feature Mappings

Feature Values
Sex female → 0, male → 1
Housing free → 0, own → 1, rent → 2
Savings little → 0, moderate → 1, quite rich → 2, rich → 3, none → 4
Checking little → 0, moderate → 1, rich → 2, none → 3
Job 0 (unskilled), 1 (skilled), 2-3 (highly skilled)

Derived Features

  • credit_per_month: Credit amount / Duration

🛠️ Dependencies

  • gradio
  • fastapi
  • uvicorn
  • openai
  • python-dotenv
  • joblib
  • pandas
  • numpy
  • scikit-learn
  • imbalanced-learn (SMOTE)
  • matplotlib
  • xgboost

📝 License

This project is for educational purposes.

👤 Author

  • André Costa

🙏 Acknowledgments

  • German Credit Risk Dataset
  • scikit-learn for ML tools
  • Gradio for web interface
  • OpenAI-compatible LLM for recommendations

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors