This project uses a variety of machine learning models—Logistic Regression, Decision Trees, Random Forests, K-Nearest Neighbors, Support Vector Machines, and Neural Networks—to predict credit default based on a provided dataset.
The data is a random sample of loans issued on the platform between 2007 and 2015, including loan status and payment information. It also contains a number of predictors, which are documented in the provided variable description file, “variable description.csv”.
- pandas
- numpy
- scikit-learn
- seaborn
- matplotlib
- tensorflow
- xgboost
The final output prints, for each model, the optimal classification threshold (the one that maximizes net profit under the business-impact assumptions), along with the model's accuracy and best-tuned parameters. An Excel file is also generated. It contains each model's default or non-default predictions for the test sample, plus a sheet listing the net profit and total cost at each model's optimal threshold.
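The threshold search described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names `net_profit` and `optimal_threshold`, the threshold grid, and the per-loan values `gain_per_repaid` and `loss_per_default` are all assumptions chosen for the example. It assumes a loan is approved when its predicted default probability falls below the threshold, and that only approved loans contribute to profit or loss.

```python
import numpy as np


def net_profit(y_true, p_default, threshold,
               gain_per_repaid=1.0, loss_per_default=5.0):
    """Net profit when loans with p_default < threshold are approved.

    gain_per_repaid and loss_per_default are hypothetical business values,
    not figures taken from the project.
    """
    y_true = np.asarray(y_true)
    approved = np.asarray(p_default) < threshold
    repaid = np.sum(approved & (y_true == 0))      # approved, did not default
    defaulted = np.sum(approved & (y_true == 1))   # approved, defaulted
    return gain_per_repaid * repaid - loss_per_default * defaulted


def optimal_threshold(y_true, p_default, thresholds=None, **values):
    """Return (threshold, profit) for the grid point with the highest profit."""
    if thresholds is None:
        # Assumed grid: 0.05, 0.10, ..., 0.95
        thresholds = np.linspace(0.05, 0.95, 19)
    profits = [net_profit(y_true, p_default, t, **values) for t in thresholds]
    best = int(np.argmax(profits))  # first grid point reaching the maximum
    return float(thresholds[best]), float(profits[best])


if __name__ == "__main__":
    # Toy labels (1 = default) and predicted default probabilities.
    y = [0, 0, 0, 1, 1]
    p = [0.10, 0.20, 0.62, 0.72, 0.90]
    threshold, profit = optimal_threshold(y, p)
    print(f"optimal threshold: {threshold:.2f}, net profit: {profit:.2f}")
```

The same search can be repeated for each model's predicted probabilities. Accuracy alone would treat both error types equally, whereas a profit-based threshold lets a missed default cost more than a rejected good loan.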