Phishdec Project: Evaluating Phishing Email Classification Models

An artefact for the dissertation:

Phishing Email Detection: A Comparison of Traditional Naive Bayes and Ensemble Random Forest Classification Models using Curated Datasets

By Xue Ling Teh

Overview

Datasets*

phishing
validate
enron
ling
spamassassin
ceas_08
trec_05
trec_06
trec_07

*Oversized files have not been uploaded to the GitHub remote because they exceed GitHub's file size limit.

Data preprocessing and cleaning

Missing values are handled with one of the following options:

Drop rows that contain null, NA or NaN values

OR

Impute (fill) the NA or NaN values
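
The two options can be sketched with pandas as shown here. This is a minimal illustration, not the project's code: the column names (`subject`, `body`, `label`) and the empty-string fill value are assumptions.

```python
import pandas as pd

# Toy frame standing in for an email dataset; column names are hypothetical.
emails = pd.DataFrame(
    {
        "subject": ["Verify your account", None, "Meeting notes"],
        "body": ["Click here now", "Quarterly report attached", None],
        "label": [1, 0, 0],
    }
)

# Option 1: drop every row that contains at least one missing value.
dropped = emails.dropna()

# Option 2: keep every row and fill missing text with an empty string
# (the fill value is an assumption; other placeholders are possible).
filled = emails.fillna("")

print(len(dropped), len(filled))
```

Dropping rows shrinks the dataset, whereas filling keeps every email but leaves some fields empty, so the choice can change the results of both models.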

Model Types

Naive Bayes (NB): Multinomial
Random Forest (RF)
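
As a rough sketch of how the two model types might be set up with scikit-learn: the TF-IDF feature extraction, the tiny corpus, and the variable names are assumptions for illustration and may differ from what model.py does.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny made-up corpus: 1 = phishing, 0 = legitimate.
texts = [
    "verify your account password now",
    "urgent: confirm your bank details",
    "team lunch on friday",
    "minutes from the project meeting",
]
labels = [1, 1, 0, 0]

# TF-IDF features are an assumption; MultinomialNB needs non-negative input,
# which TF-IDF weights satisfy.
nb_model = make_pipeline(TfidfVectorizer(), MultinomialNB())
rf_model = make_pipeline(
    TfidfVectorizer(),
    RandomForestClassifier(n_estimators=100, random_state=42),
)

nb_model.fit(texts, labels)
rf_model.fit(texts, labels)

new_email = ["please confirm your password"]
print(nb_model.predict(new_email), rf_model.predict(new_email))
```

Fixing `random_state` makes the Random Forest reproducible; Multinomial Naive Bayes is deterministic and takes no random state.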

Evaluation Metrics

Accuracy
Confusion Matrix
- True Positive (TP)
- True Negative (TN)
- False Positive (FP)
- False Negative (FN)
Classification Report:
- precision
- recall
- f1-score
- support: number of true instances of each class in the evaluated data
5×2 Cross-Validation Paired t-test
- t statistic
- p-value
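
To make the metrics above concrete, here is a dependency-free sketch of how they are computed. The function names and the example numbers are hypothetical; the project itself may rely on library implementations instead. The t statistic follows Dietterich's 5×2 cv formulation: the first fold difference divided by the square root of the mean of the five per-replication variances.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, tn, fp, fn


def summary_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 for the positive class."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return accuracy, precision, recall, f1


def five_by_two_cv_t(score_diffs):
    """t statistic of the 5x2 cv paired t-test.

    score_diffs holds five (fold_1, fold_2) pairs; each value is the score
    of model A minus the score of model B on that fold. Assumes the
    differences are not all identical within every pair (non-zero variance).
    """
    if len(score_diffs) != 5:
        raise ValueError("expected five replications of 2-fold CV")
    variances = []
    for d1, d2 in score_diffs:
        mean = (d1 + d2) / 2
        variances.append((d1 - mean) ** 2 + (d2 - mean) ** 2)
    return score_diffs[0][0] / math.sqrt(sum(variances) / 5)


# Made-up labels: 1 = phishing, 0 = legitimate.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)
metrics = summary_metrics(*counts)

# Made-up accuracy differences (NB minus RF) for 5 replications x 2 folds.
diffs = [(0.02, 0.01), (0.03, 0.01), (0.01, 0.02), (0.02, 0.02), (0.01, 0.03)]
t_stat = five_by_two_cv_t(diffs)
print(counts, metrics, t_stat)
```

The p-value is the two-sided tail probability of `t_stat` under Student's t distribution with 5 degrees of freedom, for example `2 * scipy.stats.t.sf(abs(t_stat), df=5)` if SciPy is available.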

Setup

Prerequisites

Python ~= 3.13.0 (any 3.13.x release)
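
The `~=` specifier denotes a compatible release: at least 3.13.0 and still within the 3.13 series. A small check along these lines could guard against running under another interpreter; the function name is hypothetical and not part of the project.

```python
import sys


def satisfies_python_requirement(version=None):
    """True if the version matches '~= 3.13.0' (>= 3.13.0 and == 3.13.*)."""
    major, minor = (version or sys.version_info)[:2]
    return (major, minor) == (3, 13)


print(satisfies_python_requirement())
```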

Getting Started (Windows)

1. Create a virtual environment named venv: python -m venv venv
2. Activate the virtual environment: venv\Scripts\activate
3. Install the required dependencies listed in requirements.txt: python -m pip install -r requirements.txt
4. Run the program: python main.py

To zip up this project: git archive --format=zip --output ./artefact.zip HEAD

Notes

Filename convention: title*_datasetnumber_randomstate.fileextension

*A title with more than one word uses underscores (_) between the words.

e.g. nb_c_matrix_1_860.png is a PNG image of the Naive Bayes confusion matrix for dataset 1 with random state 860.

Default random state is 42.
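
The convention can be expressed as a pair of helpers. This is an illustrative sketch; `build_filename` and `parse_filename` are hypothetical names, not functions taken from utility.py.

```python
from pathlib import Path

DEFAULT_RANDOM_STATE = 42


def build_filename(title, dataset_number, extension,
                   random_state=DEFAULT_RANDOM_STATE):
    """Compose title_datasetnumber_randomstate.fileextension."""
    return f"{title}_{dataset_number}_{random_state}.{extension}"


def parse_filename(filename):
    """Split a filename back into its parts.

    The title may itself contain underscores, so only the last two
    underscore-separated fields are treated as numbers.
    """
    path = Path(filename)
    title, dataset_number, random_state = path.stem.rsplit("_", 2)
    return title, int(dataset_number), int(random_state), path.suffix.lstrip(".")


print(build_filename("nb_c_matrix", 1, "png", 860))
print(parse_filename("nb_c_matrix_1_860.png"))
```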

References

Downey, A. (2022) Think Bayes 2 — Think Bayes. Available from: https://allendowney.github.io/ThinkBayes2/ [Accessed 19 October 2024].

Downey, A. (2024) Think Python — Think Python. Available from: https://allendowney.github.io/ThinkPython/ [Accessed 16 September 2024].

Chakraborty, S. (2023) Phishing Email Detection. DOI: https://doi.org/10.34740/kaggle/dsv/6090437.

Champa, A.I., Rabbi, M.F. & Zibran, M.F. (2024) Curated Datasets and Feature Analysis for Phishing Email Detection with Machine Learning. In: 2024 IEEE 3rd International Conference on Computing and Machine Intelligence (ICMI). April 2024 pp. 1–7. DOI: https://doi.org/10.1109/ICMI60790.2024.10585821.

Miltchev, R., Dimitar, R. & Evgeni, G. (2024) Phishing validation emails dataset. DOI: https://doi.org/10.5281/ZENODO.13474746.

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M. & Duchesnay, É. (2011) Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research. 12(85): 2825–2830.

Toledo Jr, T. (2021) Statistical Tests for Comparing Classification Algorithms. 23 November 2021. Towards Data Science. Available from: https://towardsdatascience.com/statistical-tests-for-comparing-classification-algorithms-ac1804e79bb7/ [Accessed 12 March 2025].
