A machine learning application for predicting credit risk using the German Credit Risk dataset. The system includes model training, evaluation, and an interactive web interface for predictions with LLM-powered strategic recommendations.
- ✅ Basic Gradio web interface
- ✅ Random Forest + SMOTE model (F1=0.84, ROC AUC=0.758)
- ✅ LLM-powered recommendations
⚠️ Purpose feature not included in model
- Add `'Purpose'` to features in notebook
- Retrain model with 10 features (including Purpose)
- Update `service.py` to accept purpose parameter
- Add purpose dropdown to Gradio app
- Modify pipeline to include preprocessing (ColumnTransformer with OneHotEncoder)
- Remove manual mappings from `service.py` and let the pipeline handle encoding
- Pass raw string values directly to model
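The preprocessing step described above could be sketched roughly as follows. This is a minimal sketch, not the project's actual notebook code: the column names in `CATEGORICAL` and `NUMERIC` and the helper `build_pipeline` are assumptions, and the SMOTE step is left out so the example depends only on scikit-learn. To keep SMOTE, the same steps would go into `imblearn.pipeline.Pipeline` with a `SMOTE` step before the classifier.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Assumed column names; adjust them to match the notebook's DataFrame.
CATEGORICAL = ["Sex", "Housing", "Saving accounts", "Checking account", "Purpose"]
NUMERIC = ["Age", "Job", "Credit amount", "Duration", "credit_per_month"]


def build_pipeline() -> Pipeline:
    """Build a pipeline that accepts raw string categories (hypothetical helper)."""
    preprocess = ColumnTransformer(
        transformers=[
            # Unknown categories at prediction time are encoded as all zeros.
            ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
            ("num", StandardScaler(), NUMERIC),
        ],
        sparse_threshold=0,  # always return a dense array
    )
    return Pipeline([
        ("preprocess", preprocess),
        ("classifier", RandomForestClassifier(
            n_estimators=200,
            max_depth=8,
            class_weight="balanced",
            random_state=42,
        )),
    ])
```

With encoding inside the pipeline, `service.py` would no longer need its own lookup tables: a one-row `DataFrame` of raw values can be passed straight to `predict_proba`.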
- Create `api.py` with FastAPI
- Endpoints:
  - `POST /predict` - Single prediction
  - `POST /predict/batch` - Batch predictions
  - `GET /model/info` - Model metadata
  - `GET /health` - Health check
- Share prediction logic with Gradio via `service.py`
- Add screenshots/GIFs to `/docs` folder:
  - `docs/demo.gif` - Quick demo of the app
  - `docs/screenshot-prediction.png` - Make Prediction tab
  - `docs/screenshot-showcase.png` - Model Showcase tab
- Update README with image embeds
- Add API documentation endpoint (`/docs` - FastAPI auto-generated)
- Create `POSTMAN_COLLECTION.json` for API testing
- Add deployment instructions (Render, Railway, etc.)
├── app.py              # Gradio web app
├── api.py              # FastAPI REST API
├── service.py          # Shared prediction logic
├── prompt.py           # LLM prompts
├── requirements.txt    # Dependencies (add: fastapi, uvicorn)
├── docs/               # Documentation & demos
│   ├── demo.gif
│   ├── screenshot-prediction.png
│   ├── screenshot-showcase.png
│   └── POSTMAN_COLLECTION.json
└── artifacts/
    └── rf_smote_pipeline.pkl   # Includes preprocessing
- Machine Learning Pipeline: Random Forest classifier with SMOTE for handling imbalanced data
- Model Evaluation: ROC curves, confusion matrices, and comprehensive metrics
- Web Interface: Interactive Gradio app for predictions
- REST API: FastAPI endpoints for programmatic access
- LLM Integration: Strategic recommendations via OpenAI-compatible API (rendered in Markdown)
- Feature Engineering: Credit amount per month derived feature + Purpose classification
| Metric | Score |
|---|---|
| F1 Score (Bad) | 0.84 |
| ROC AUC | 0.758 |
| Recall (Bad) | 85.7% |
| Precision (Bad) | 82.8% |
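As a quick consistency check on the table above, F1 is the harmonic mean of precision and recall, so the reported F1 can be recomputed from the other two rows. The helper name below is made up for this example.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision (Bad) = 82.8%, Recall (Bad) = 85.7%
print(round(f1_from_precision_recall(0.828, 0.857), 2))  # 0.84
```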
credit-risk-classification/
├── app.py                      # Gradio web application
├── api.py                      # FastAPI REST API (planned)
├── service.py                  # Shared prediction service
├── prompt.py                   # LLM prompt templates
├── requirements.txt            # Python dependencies
├── pyproject.toml              # Project config
├── .env                        # Environment variables (API keys)
├── german_credit_data.csv      # Original dataset
├── credit_risk_analysis.ipynb  # Model training notebook
├── README.md                   # This file
├── docs/                       # Documentation & demos
│   ├── demo.gif
│   ├── screenshot-prediction.png
│   ├── screenshot-showcase.png
│   ├── screenshot-api.png
│   └── POSTMAN_COLLECTION.json
└── artifacts/                  # Model artifacts (generated by notebook)
    ├── rf_smote_pipeline.pkl   # Trained model pipeline
    ├── metrics.pkl             # Model evaluation metrics
    └── feature_names.pkl       # Feature names
Note: Model artifacts (`artifacts/*.pkl`) are not included in the repo. Run `credit_risk_analysis.ipynb` to train the model and generate them.
pip install -r requirements.txt

Edit the `.env` file:
INCEPTION_API_KEY=your-api-key-here
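
For illustration, the app could validate this key at startup along the following lines. This is a sketch under assumptions: `get_api_key` is a hypothetical helper, and it expects the variables from `.env` to already be in the process environment (for example after calling `load_dotenv()` from python-dotenv).

```python
import os
from typing import Mapping, Optional


def get_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return INCEPTION_API_KEY, failing early if it is missing or still the placeholder."""
    env = os.environ if env is None else env
    key = env.get("INCEPTION_API_KEY", "").strip()
    if not key or key == "your-api-key-here":
        raise RuntimeError("Set INCEPTION_API_KEY in the .env file before starting the app.")
    return key
```

Failing early gives a clearer error than a rejected LLM request later on.
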
python app.py

The app will open in your browser at http://localhost:7860.
uvicorn api:app --reload

The API will be available at http://localhost:8000. Visit `/docs` for interactive API documentation.
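
Once the server is running, a prediction request could be built with the standard library as sketched here. The JSON field names are assumptions, since the request schema is not defined yet, and `build_predict_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_predict_request(applicant: dict,
                          base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build a POST /predict request carrying the applicant as a JSON body."""
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=json.dumps(applicant).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical payload; field names must match whatever api.py ends up accepting.
request = build_predict_request({"age": 35, "credit_amount": 2400, "duration": 12, "purpose": "car"})

# Sending it requires the API to be running:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```
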
| Model Showcase | Make Prediction |
|---|---|
| ![]() | ![]() |
See the /docs folder for more screenshots and the Postman collection.
- View model performance metrics
- Interactive ROC curve
- Confusion matrix visualization
- Select applicant features:
  - Age (slider: 18-80)
  - Sex (male/female)
  - Job type (0-3)
  - Housing (own/rent/free)
  - Savings account
  - Checking account
  - Credit amount
  - Duration (months)
  - Purpose (car, furniture, etc.)
- Click "Analyze & Get Recommendation"
- View:
  - Risk prediction (HIGH RISK / LOW RISK)
  - Confidence percentage
  - Strategic LLM recommendation (Markdown formatted)
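
The risk label and confidence shown in the last step could be derived from the model's predicted probability roughly like this. The 0.5 threshold and the helper name `format_prediction` are assumptions, not taken from the app's code.

```python
from typing import Tuple


def format_prediction(proba_bad: float, threshold: float = 0.5) -> Tuple[str, float]:
    """Map the probability of the 'bad' class to a label and a confidence percentage."""
    if not 0.0 <= proba_bad <= 1.0:
        raise ValueError("proba_bad must be between 0 and 1")
    if proba_bad >= threshold:
        return "HIGH RISK", round(proba_bad * 100, 1)
    # Confidence refers to the predicted class, so use the complement here.
    return "LOW RISK", round((1.0 - proba_bad) * 100, 1)
```
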
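from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline  # imblearn's Pipeline supports sampler steps such as SMOTE
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

# Scale features, oversample the minority class during fit, then classify.
Pipeline([
    ('scaler', StandardScaler()),
    ('smote', SMOTE(random_state=42)),
    ('classifier', RandomForestClassifier(
        n_estimators=200,
        max_depth=8,
        class_weight='balanced',
        random_state=42
    ))
])

| Feature | Values |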
|---|---|
| Sex | female → 0, male → 1 |
| Housing | free → 0, own → 1, rent → 2 |
| Savings | little → 0, moderate → 1, quite rich → 2, rich → 3, none → 4 |
| Checking | little → 0, moderate → 1, rich → 2, none → 3 |
| Job | 0 (unskilled), 1 (skilled), 2-3 (highly skilled) |
- `credit_per_month`: Credit amount / Duration (months)
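
The encodings in the table and the derived feature can be combined into one small function, sketched below. The output key names and the helper `encode_applicant` are assumptions; the integer codes are copied from the table above.

```python
SEX = {"female": 0, "male": 1}
HOUSING = {"free": 0, "own": 1, "rent": 2}
SAVINGS = {"little": 0, "moderate": 1, "quite rich": 2, "rich": 3, "none": 4}
CHECKING = {"little": 0, "moderate": 1, "rich": 2, "none": 3}


def encode_applicant(age: int, sex: str, job: int, housing: str, savings: str,
                     checking: str, credit_amount: float, duration: int) -> dict:
    """Encode raw form values as numbers and add credit_per_month (hypothetical helper)."""
    if duration <= 0:
        raise ValueError("duration must be a positive number of months")
    return {
        "Age": age,
        "Sex": SEX[sex],
        "Job": job,
        "Housing": HOUSING[housing],
        "Savings": SAVINGS[savings],
        "Checking": CHECKING[checking],
        "Credit amount": credit_amount,
        "Duration": duration,
        "credit_per_month": credit_amount / duration,
    }


features = encode_applicant(35, "male", 2, "own", "little", "none", 2400, 12)
print(features["credit_per_month"])  # 200.0
```

An unknown category raises `KeyError`, which makes invalid form input visible instead of silently mapping it to a default.
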
- gradio
- fastapi
- uvicorn
- openai
- python-dotenv
- joblib
- pandas
- numpy
- scikit-learn
- imbalanced-learn (SMOTE)
- matplotlib
- xgboost
This project is for educational purposes.
- André Costa
- German Credit Risk Dataset
- scikit-learn for ML tools
- Gradio for web interface
- OpenAI-compatible LLM for recommendations



