Credit Risk Classification System

A machine learning application for predicting credit risk using the German Credit Risk dataset. The system includes model training, evaluation, and an interactive web interface for predictions with LLM-powered strategic recommendations.

🗺️ Roadmap

Current Status

✅ Basic Gradio web interface
✅ Random Forest + SMOTE model (F1=0.84, ROC AUC=0.758)
✅ LLM-powered recommendations
⚠️ Purpose feature not included in model

Planned Enhancements

1. Add Purpose Feature

Add 'Purpose' to features in notebook
Retrain model with 10 features (including Purpose)
Update service.py to accept purpose parameter
Add purpose dropdown to Gradio app

2. Add Preprocessing to ML Pipeline

Modify pipeline to include preprocessing (ColumnTransformer with OneHotEncoder)
Remove manual mappings from service.py - let pipeline handle encoding
Pass raw string values directly to model
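
A minimal sketch of what this preprocessing step might look like. It assumes the raw column names of the dataset CSV ("Sex", "Housing", "Saving accounts", "Checking account", "Purpose", "Age", "Job", "Credit amount", "Duration") and that missing account values were already filled with "none"; the helper name build_preprocessor is hypothetical, and the derived credit_per_month feature is left out for brevity.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Assumed raw column names; adjust them to match the notebook.
CATEGORICAL = ["Sex", "Housing", "Saving accounts", "Checking account", "Purpose"]
NUMERIC = ["Age", "Job", "Credit amount", "Duration"]


def build_preprocessor() -> ColumnTransformer:
    """One-hot encode the string columns and scale the numeric ones."""
    return ColumnTransformer(
        transformers=[
            # handle_unknown="ignore" encodes unseen categories as all zeros
            ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
            ("num", StandardScaler(), NUMERIC),
        ],
        sparse_threshold=0,  # always return a dense array
    )


# Tiny illustrative frame (made-up rows, not taken from the dataset)
sample = pd.DataFrame(
    {
        "Age": [35, 22, 48],
        "Sex": ["male", "female", "male"],
        "Job": [2, 1, 3],
        "Housing": ["own", "rent", "free"],
        "Saving accounts": ["little", "moderate", "little"],
        "Checking account": ["moderate", "little", "none"],
        "Credit amount": [2000, 5000, 1200],
        "Duration": [12, 36, 6],
        "Purpose": ["car", "furniture", "car"],
    }
)

encoded = build_preprocessor().fit_transform(sample)
print(encoded.shape)  # 12 one-hot columns + 4 scaled numeric columns
```

In the full pipeline, a transformer like this would take the place of the bare StandardScaler step and sit before SMOTE, so that oversampling only ever sees numeric data.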

3. Add FastAPI API

Create api.py with FastAPI
Endpoints:
- POST /predict - Single prediction
- POST /predict/batch - Batch predictions
- GET /model/info - Model metadata
- GET /health - Health check
Share prediction logic with Gradio via service.py

4. Documentation & Demo

Add screenshots/GIFs to the docs/ folder:
- docs/demo.gif - Quick demo of the app
- docs/screenshot-prediction.png - Make Prediction tab
- docs/screenshot-showcase.png - Model Showcase tab
Update README with image embeds
Expose API documentation at the /docs route (auto-generated by FastAPI; not to be confused with the docs/ folder)
Create POSTMAN_COLLECTION.json for API testing
Add deployment instructions (Render, Railway, etc.)

5. Project Structure (After Roadmap)

├── app.py              # Gradio web app
├── api.py              # FastAPI REST API
├── service.py          # Shared prediction logic
├── prompt.py           # LLM prompts
├── requirements.txt    # Dependencies (add: fastapi, uvicorn)
├── docs/               # Documentation & demos
│   ├── demo.gif
│   ├── screenshot-prediction.png
│   ├── screenshot-showcase.png
│   └── POSTMAN_COLLECTION.json
└── artifacts/
    └── rf_smote_pipeline.pkl  # Includes preprocessing

🎯 Features

Machine Learning Pipeline: Random Forest classifier with SMOTE for handling imbalanced data
Model Evaluation: ROC curves, confusion matrices, and comprehensive metrics
Web Interface: Interactive Gradio app for predictions
REST API (planned): FastAPI endpoints for programmatic access
LLM Integration: Strategic recommendations via OpenAI-compatible API (rendered in Markdown)
Feature Engineering: Derived credit_per_month feature (credit amount / duration); Purpose feature planned

📊 Model Performance

Metric	Score
F1 Score (Bad)	0.84
ROC AUC	0.758
Recall (Bad)	85.7%
Precision (Bad)	82.8%
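
As a quick consistency check, the F1 score is the harmonic mean of precision and recall, so it can be recomputed from the two values in the table:

```python
precision = 0.828  # Precision (Bad) from the table
recall = 0.857     # Recall (Bad) from the table

# F1 is the harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 2))  # 0.84
```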

🗂️ Project Structure

credit-risk-classification/
├── app.py                    # Gradio web application
├── api.py                    # FastAPI REST API (planned)
├── service.py                # Shared prediction service
├── prompt.py                 # LLM prompt templates
├── requirements.txt          # Python dependencies
├── pyproject.toml            # Project config
├── .env                      # Environment variables (API keys)
├── german_credit_data.csv    # Original dataset
├── credit_risk_analysis.ipynb # Model training notebook
├── README.md                 # This file
├── docs/                     # Documentation & demos (planned)
│   ├── demo.gif
│   ├── screenshot-prediction.png
│   ├── screenshot-showcase.png
│   ├── screenshot-api.png
│   └── POSTMAN_COLLECTION.json
└── artifacts/                # Model artifacts (generated by notebook)
    ├── rf_smote_pipeline.pkl    # Trained model pipeline
    ├── metrics.pkl              # Model evaluation metrics
    └── feature_names.pkl        # Feature names

🚀 Quick Start

Note: Model artifacts (artifacts/*.pkl) are not included in the repo. Run credit_risk_analysis.ipynb to train the model and generate them.

1. Install Dependencies

pip install -r requirements.txt

2. Configure API Key

Create or edit the .env file in the project root:

INCEPTION_API_KEY=your-api-key-here

3. Run the App

python app.py

The app will open in your browser at http://localhost:7860.

4. Run the API (Optional)

uvicorn api:app --reload

This step applies once api.py has been added (see Roadmap). The API will then be available at http://localhost:8000. Visit http://localhost:8000/docs for interactive API documentation.

🖥️ Demo & Screenshots

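Gradio Web Interface

Model Showcase (docs/screenshot-showcase.png)	Make Prediction (docs/screenshot-prediction.png)

Quick Demo (docs/demo.gif)

API Documentation (docs/screenshot-api.png)

See the docs/ folder for more screenshots and the Postman collection.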

📖 How to Use

Model Showcase Tab

View model performance metrics
Interactive ROC curve
Confusion matrix visualization

Make Prediction Tab

Select applicant features:
- Age (slider: 18-80)
- Sex (male/female)
- Job type (0-3)
- Housing (own/rent/free)
- Savings account
- Checking account
- Credit amount
- Duration (months)
- Purpose (car, furniture, etc.; planned, see Roadmap)
Click "Analyze & Get Recommendation"
View:
- Risk prediction (HIGH RISK / LOW RISK)
- Confidence percentage
- Strategic LLM recommendation (Markdown formatted)

🔧 Technical Details

Model Pipeline

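# SMOTE is a sampler, so Pipeline must come from imbalanced-learn, not scikit-learn
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('smote', SMOTE(random_state=42)),  # oversampling is applied during fit only
    ('classifier', RandomForestClassifier(
        n_estimators=200,
        max_depth=8,
        class_weight='balanced',
        random_state=42
    ))
])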

Feature Mappings

Feature	Values
Sex	female → 0, male → 1
Housing	free → 0, own → 1, rent → 2
Savings	little → 0, moderate → 1, quite rich → 2, rich → 3, none → 4
Checking	little → 0, moderate → 1, rich → 2, none → 3
Job	0 (unskilled, non-resident), 1 (unskilled, resident), 2 (skilled), 3 (highly skilled)

Derived Features

credit_per_month: Credit amount / Duration
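
The mappings and the derived feature above can be sketched in plain Python. The helper name encode_applicant and the order of the returned features are hypothetical; the actual service.py may differ.

```python
# Mappings copied from the Feature Mappings table
SEX = {"female": 0, "male": 1}
HOUSING = {"free": 0, "own": 1, "rent": 2}
SAVINGS = {"little": 0, "moderate": 1, "quite rich": 2, "rich": 3, "none": 4}
CHECKING = {"little": 0, "moderate": 1, "rich": 2, "none": 3}


def encode_applicant(age, sex, job, housing, savings, checking,
                     credit_amount, duration):
    """Return a numeric feature row (hypothetical helper and feature order)."""
    if duration <= 0:
        raise ValueError("duration must be a positive number of months")
    return [
        age,
        SEX[sex],
        job,
        HOUSING[housing],
        SAVINGS[savings],
        CHECKING[checking],
        credit_amount,
        duration,
        credit_amount / duration,  # credit_per_month
    ]


row = encode_applicant(35, "male", 2, "own", "little", "none", 2400, 12)
print(row)  # [35, 1, 2, 1, 0, 3, 2400, 12, 200.0]
```

An unknown category raises KeyError here, which is one reason the roadmap proposes moving encoding into the pipeline itself.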

🛠️ Dependencies

gradio
fastapi
uvicorn
openai
python-dotenv
joblib
pandas
numpy
scikit-learn
imbalanced-learn (SMOTE)
matplotlib
xgboost

📝 License

This project is for educational purposes.

👤 Author

André Costa

🙏 Acknowledgments

German Credit Risk Dataset
scikit-learn for ML tools
Gradio for web interface
OpenAI-compatible LLM for recommendations
