In this project, we explore the Penguins dataset, focusing on two species: Adelie and Gentoo. The objective is to compare the performance of two classification models:
- Naive Bayes (Generative Model)
- Logistic Regression (Discriminative Model)
We use these models to classify penguins into one of the two species based on various features, such as:
- Bill length
- Bill depth
- Flipper length
- Body mass
- Sex
The models are evaluated and compared using common classification metrics (accuracy, precision, recall, and ROC AUC) together with lift and gain charts.
To compare the two models further, we extend the analysis to a multiclass problem with continuous features and examine the advantages and disadvantages of generative and discriminative models.
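
As a rough illustration of the comparison described above, the sketch below fits both model families and reports accuracy, precision, recall, and ROC AUC. It is not the project's actual code. Several things are assumptions: `GaussianNB` is used as the Naive Bayes variant, the data is a synthetic stand-in generated with `make_classification` (not the penguin measurements), and the scaling step, the 70/30 split, and all variable names are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic two-class data with five features.
# ASSUMPTION: this is only a stand-in for the penguin features.
X, y = make_classification(
    n_samples=300, n_features=5, n_informative=3, n_redundant=0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# ASSUMPTION: Gaussian Naive Bayes as the generative model; logistic
# regression (with feature scaling) as the discriminative model.
models = {
    "naive_bayes": GaussianNB(),
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)               # hard class labels
    proba = model.predict_proba(X_test)[:, 1]  # probability of class 1
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred),
        "recall": recall_score(y_test, pred),
        "roc_auc": roc_auc_score(y_test, proba),
    }

for name, scores in results.items():
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
```

ROC AUC is computed from predicted probabilities rather than hard labels, which is why `predict_proba` is called separately from `predict`.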
The Penguins dataset is publicly available for download and is not included in this repository.
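
One possible way to prepare the data, once it has been obtained, is sketched below. The column names (`species`, `bill_length_mm`, `bill_depth_mm`, `flipper_length_mm`, `body_mass_g`, `sex`) are an assumption that matches the copy distributed with seaborn; other copies of the dataset may use different names. The helper `prepare_penguins` and the 0/1 encodings are hypothetical and chosen only for illustration.

```python
import pandas as pd

# ASSUMPTION: column names follow the seaborn copy of the dataset.
FEATURES = [
    "bill_length_mm",
    "bill_depth_mm",
    "flipper_length_mm",
    "body_mass_g",
    "sex",
]


def prepare_penguins(df: pd.DataFrame):
    """Keep Adelie and Gentoo rows and return (X, y) for scikit-learn."""
    # Keep only the two species compared in this project.
    subset = df[df["species"].isin(["Adelie", "Gentoo"])]
    # Drop rows with a missing measurement or missing sex.
    subset = subset.dropna(subset=FEATURES)
    X = subset[FEATURES].copy()
    # Encode sex as 1 (male) / 0 (female), ignoring letter case.
    X["sex"] = (X["sex"].str.lower() == "male").astype(int)
    # Encode the target as 1 (Gentoo) / 0 (Adelie).
    y = (subset["species"] == "Gentoo").astype(int)
    return X, y
```

With seaborn installed and network access available, `X, y = prepare_penguins(seaborn.load_dataset("penguins"))` would be one way to call it.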
This project requires the following Python libraries:
- pandas
- seaborn
- matplotlib
- scikit-learn
You can install the required libraries using the following:
pip install pandas seaborn matplotlib scikit-learn