This project uses supervised machine learning to predict the likelihood of early-stage diabetes in patients from clinical symptoms and demographic data.
The dataset is prepared through feature encoding and examined with exploratory data analysis (EDA). Features are then selected using variance thresholding and chi-squared scoring.
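The encoding and feature selection steps could look roughly like the sketch below. It is only an illustration: the toy DataFrame, its column names (`Gender`, `Polyuria`, `Itching`, `Constant`, `class`), the 0/1 mapping, the zero-variance threshold, and `k=2` are all assumptions, not the project's actual data or settings.

```python
import pandas as pd
from sklearn.feature_selection import SelectKBest, VarianceThreshold, chi2

# Hypothetical toy data; the real dataset and its column names may differ.
df = pd.DataFrame({
    "Gender":   ["Male", "Female", "Male", "Female", "Male", "Female"],
    "Polyuria": ["Yes", "No", "Yes", "No", "Yes", "No"],
    "Itching":  ["Yes", "No", "No", "Yes", "Yes", "No"],
    "Constant": ["No", "No", "No", "No", "No", "No"],
    "class":    ["Positive", "Negative", "Positive", "Negative", "Positive", "Negative"],
})

# Feature encoding: map each binary category to 0/1 (assumed mapping).
mapping = {"Yes": 1, "No": 0, "Male": 1, "Female": 0, "Positive": 1, "Negative": 0}
encoded = df.apply(lambda col: col.map(mapping))

X = encoded.drop(columns="class")
y = encoded["class"]

# Variance thresholding: drop features that never change (variance == 0).
vt = VarianceThreshold(threshold=0.0)
vt.fit(X)
kept = list(X.columns[vt.get_support()])
X_var = X[kept]

# Chi-squared scoring: keep the k features most associated with the label.
selector = SelectKBest(score_func=chi2, k=2)
selector.fit(X_var, y)
selected = list(X_var.columns[selector.get_support()])

print("Kept after variance threshold:", kept)
print("Selected by chi-squared:", selected)
```

Chi-squared scoring requires non-negative feature values, which is why the categorical columns are encoded as 0/1 before selection.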
Multiple classification models are trained and evaluated, including Logistic Regression, Support Vector Machines (linear and RBF kernels), K-Nearest Neighbors, and Gaussian Naive Bayes. Model performance is assessed using accuracy, confusion matrices, cross-validation scores, and ROC curves.
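A minimal sketch of the training and evaluation loop is shown below. It uses synthetic data from `make_classification` as a stand-in for the real dataset, and the split ratio, 5-fold cross-validation, and hyperparameters are assumptions rather than the notebook's actual settings. ROC performance is summarized here with the area under the curve; `sklearn.metrics.roc_curve` returns the points needed to plot the curve itself.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in for the encoded dataset (assumption, not the real data).
X, y = make_classification(n_samples=300, n_features=8, n_informative=4,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM (linear)": SVC(kernel="linear"),
    "SVM (RBF)": SVC(kernel="rbf"),
    "K-Nearest Neighbors": KNeighborsClassifier(n_neighbors=5),
    "Gaussian Naive Bayes": GaussianNB(),
}

results = {}
for name, model in models.items():
    # 5-fold cross-validated accuracy on the training split.
    cv_scores = cross_val_score(model, X_train, y_train, cv=5)

    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    # Continuous scores for ROC analysis: use the decision function when the
    # model has one, otherwise the predicted probability of the positive class.
    if hasattr(model, "decision_function"):
        scores = model.decision_function(X_test)
    else:
        scores = model.predict_proba(X_test)[:, 1]

    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "confusion_matrix": confusion_matrix(y_test, y_pred),
        "cv_mean": cv_scores.mean(),
        "roc_auc": roc_auc_score(y_test, scores),
    }

for name, r in results.items():
    print(f"{name}: acc={r['accuracy']:.3f} "
          f"cv={r['cv_mean']:.3f} auc={r['roc_auc']:.3f}")
```

Keeping the models in a dictionary makes it straightforward to compare them under the same split and metrics.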
The notebook demonstrates the full pipeline from raw data ingestion to model evaluation, showcasing how machine learning can assist in early medical diagnostics.