Predicting loan default probability on a 2007–2015 Lending Club-style dataset.
This repository shows a full, reproducible ML workflow, from data prep and EDA to model benchmarking and evaluation.
Given borrower and loan attributes, estimate the probability a loan will default.
This supports better risk-based decisions (pricing, approvals, limits) and model explainability for stakeholders.
- Logistic Regression: interpretable baseline
- Decision Tree (no CV): simple, high variance
- Decision Tree (CV-tuned): better bias/variance balance
- Random Forest: robust ensemble
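
As a rough illustration of how the four models above might be benchmarked side by side, here is a minimal scikit-learn sketch. It is not the notebook's code: the synthetic data from `make_classification`, the hyperparameter grid, the split sizes and the `test_auc` dictionary are all assumptions made for the example, so the scores it prints will not match the reported results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the loan data (assumption: NOT the real dataset).
X, y = make_classification(
    n_samples=2000, n_features=12, n_informative=6,
    weights=[0.8, 0.2], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0,
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    # Default tree: grows until leaves are pure, so it tends to overfit.
    "Decision Tree (no CV)": DecisionTreeClassifier(random_state=0),
    # Same tree, but depth and leaf size are chosen by 5-fold cross-validation.
    # The grid values are illustrative, not the ones used in the notebook.
    "Decision Tree (CV-tuned)": GridSearchCV(
        DecisionTreeClassifier(random_state=0),
        param_grid={"max_depth": [3, 5, 8], "min_samples_leaf": [1, 20, 50]},
        scoring="roc_auc",
        cv=5,
    ),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

test_auc = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Column 1 of predict_proba is the estimated probability of the positive class.
    default_proba = model.predict_proba(X_test)[:, 1]
    test_auc[name] = roc_auc_score(y_test, default_proba)

for name, auc in sorted(test_auc.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: ROC-AUC = {auc:.3f}")
```

Scoring on predicted probabilities rather than hard labels keeps the comparison independent of any particular approval threshold.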
| Model | ROC-AUC (test) | Notes |
|---|---|---|
| Decision Tree (CV-tuned) | ~0.84 | Best balance of accuracy & interpretability |
| Random Forest | ~0.83 | Strong generalisation, less transparent |
| Logistic Regression | ~0.76 | Reliable, easy to explain |
| Decision Tree (no CV) | ~0.64 | Overfits without tuning |
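
For readers less familiar with the metric in the table, ROC-AUC can be read as the probability that a randomly chosen defaulted loan receives a higher predicted score than a randomly chosen non-defaulted one. The sketch below computes it directly from that definition; the function name `roc_auc` and the four toy labels and scores are made up for illustration and are unrelated to the results above.

```python
def roc_auc(y_true, y_score):
    """Pairwise (Mann-Whitney) ROC-AUC for binary labels 0/1.

    Counts the fraction of (positive, negative) pairs where the positive
    example is scored higher; ties count as half a win.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 1 = default, 0 = repaid (hypothetical values).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))  # 3 of the 4 positive/negative pairs are ordered correctly -> 0.75
```

This quadratic version is meant only to make the definition concrete; on a dataset of realistic size a library routine such as scikit-learn's `roc_auc_score` is the practical choice.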
Full write-up, figures and confusion matrices:
report/credit_risk_analysis_report.pdf
Reproduce the pipeline in notebooks/credit_risk_models.ipynb.