- Defined three distance metrics (Cartesian, i.e. Euclidean; Manhattan; and Minkowski of order 3) and built a KNN algorithm with steps for distance calculation, nearest-neighbor retrieval, and prediction.
- Made predictions on test data using K values 1, 3, and 7.
- Implemented Leave One Out Evaluation for KNN with Cartesian distance, assessing the algorithm's performance for different K values.
- Evaluated KNN performance after excluding 'age' data, highlighting the significance of age in predicting labels.
- Developed Gaussian Naive Bayes algorithm with steps for dataset separation, summarization, Gaussian Probability Density Function calculation, class probability calculation, and prediction.
- Made predictions on test data using the Gaussian Naive Bayes algorithm.
- Evaluated algorithm performance using Leave One Out Evaluation.
- Assessed algorithm performance after removing 'age' data.
- Compared the performance of KNN and Gaussian Naive Bayes, concluding that Gaussian Naive Bayes outperforms KNN.
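
The KNN, Gaussian Naive Bayes, and Leave One Out steps listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes the "Cartesian" metric is the ordinary Euclidean distance (Minkowski of order 2), and the function names and the tiny (height, weight, age) dataset are hypothetical, made up only so the sketch runs.

```python
import math
from collections import Counter

def minkowski(a, b, p):
    """Minkowski distance of order p (p=1: Manhattan, p=2: Euclidean)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def knn_predict(train, query, k, p=2):
    """Majority label among the k training rows closest to query."""
    nearest = sorted(train, key=lambda row: minkowski(row[0], query, p))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def gnb_predict(train, query):
    """Gaussian Naive Bayes: pick the class with the highest log posterior."""
    by_class = {}
    for features, label in train:
        by_class.setdefault(label, []).append(features)
    scores = {}
    for label, rows in by_class.items():
        score = math.log(len(rows) / len(train))  # log prior of the class
        for column, x in zip(zip(*rows), query):
            mean = sum(column) / len(column)
            # small constant keeps the variance strictly positive
            var = sum((v - mean) ** 2 for v in column) / len(column) + 1e-9
            # log of the Gaussian density, computed directly to avoid underflow
            score += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        scores[label] = score
    return max(scores, key=scores.get)

def leave_one_out_accuracy(data, predict):
    """Hold out each row once, train on the rest, and count correct predictions."""
    correct = 0
    for i, (features, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        if predict(rest, features) == label:
            correct += 1
    return correct / len(data)

# Hypothetical toy rows: ((height m, weight kg, age), label). Not the project's data.
DATA = [
    ((1.80, 80.0, 30), "M"), ((1.75, 78.0, 41), "M"),
    ((1.85, 90.0, 25), "M"), ((1.78, 84.0, 36), "M"),
    ((1.60, 55.0, 29), "W"), ((1.65, 60.0, 38), "W"),
    ((1.58, 52.0, 24), "W"), ((1.62, 58.0, 33), "W"),
]
# Same rows with the last feature (age) removed, mirroring the "without age" runs.
DATA_NO_AGE = [(features[:2], label) for features, label in DATA]

# K = 7 is skipped here only because the toy set has just eight rows.
for k in (1, 3):
    acc = leave_one_out_accuracy(DATA, lambda tr, q: knn_predict(tr, q, k))
    print(f"KNN k={k}: {acc:.2f}")
print(f"GNB: {leave_one_out_accuracy(DATA, gnb_predict):.2f}")
print(f"GNB without age: {leave_one_out_accuracy(DATA_NO_AGE, gnb_predict):.2f}")
```

Passing the predictor as a function lets the same Leave One Out loop evaluate both classifiers, which keeps the comparison between them like-for-like.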
Title: Exploring Linear Regression and Function Depth Impact
Description: Developed linear regression models with function depths up to 6, applied them to generated data, and assessed each model by its error on test data. Depth 4 gave the best fit, with the lowest mean squared error, though the small dataset may limit how reliable that result is.
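
A rough sketch of this kind of depth comparison is shown below. It assumes that "function depth" means the degree of a polynomial basis, which the description does not confirm; the generating function, noise level, and helper names are all hypothetical.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_polynomial(xs, ys, depth):
    """Least-squares weights w[0..depth] for y ~ sum_j w[j] * x**j (normal equations)."""
    rows = [[x ** j for j in range(depth + 1)] for x in xs]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(depth + 1)]
         for i in range(depth + 1)]
    b = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(depth + 1)]
    return solve(A, b)

def predict(w, x):
    return sum(c * x ** j for j, c in enumerate(w))

def mse(w, xs, ys):
    """Mean squared error of the fitted polynomial on (xs, ys)."""
    return sum((predict(w, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical generated data: a cubic plus Gaussian noise (not the project's data).
random.seed(0)
def noisy(x):
    return 1.0 + 2.0 * x - 0.5 * x ** 3 + random.gauss(0.0, 0.1)

train_x = [-1.0 + 2.0 * i / 11 for i in range(12)]
train_y = [noisy(x) for x in train_x]
test_x = [-0.9 + 1.8 * i / 5 for i in range(6)]
test_y = [noisy(x) for x in test_x]

# Compare depths 0..6 by their error on the held-out test points.
for depth in range(7):
    w = fit_polynomial(train_x, train_y, depth)
    print(f"depth {depth}: test MSE = {mse(w, test_x, test_y):.4f}")
```

Selecting the depth by test error rather than training error matters because training error can only decrease as the depth grows, so it cannot reveal overfitting.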
Title: Investigating Locally Weighted Linear Regression and Dataset Size Effects
Description: Explored locally weighted linear regression for 1-dimensional data, applied the method to generated data, and compared its performance with the linear regression model. Noted that the locally weighted model outperformed the linear regression model on the test data. Additionally, examined the impact of dataset size reduction on model performance and observed increased mean squared error and reduced fit quality. Concluded that the original data might not adhere to the assumed function format.
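
For illustration, one common form of locally weighted linear regression in one dimension is sketched below. It assumes a Gaussian kernel with bandwidth `tau`; the description does not say which kernel or bandwidth the project used, and the sine-shaped data is made up.

```python
import math

def lwr_predict(xs, ys, x0, tau=0.5):
    """Fit a weighted straight line around x0 and evaluate it at x0.

    Each training point gets weight exp(-(x - x0)^2 / (2 * tau^2)), so
    points near the query dominate the fit.
    """
    w = [math.exp(-((x - x0) ** 2) / (2 * tau ** 2)) for x in xs]
    sw = sum(w)
    sx = sum(wi * x for wi, x in zip(w, xs))
    sy = sum(wi * y for wi, y in zip(w, ys))
    sxx = sum(wi * x * x for wi, x in zip(w, xs))
    sxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
    # Closed-form weighted least squares for y = intercept + slope * x
    slope = (sw * sxy - sx * sy) / (sw * sxx - sx ** 2)
    intercept = (sy - slope * sx) / sw
    return intercept + slope * x0

# Hypothetical generated data: samples of sin(x) on [0, 6.2].
XS = [i * 0.1 for i in range(63)]
YS = [math.sin(x) for x in XS]

for x0 in (1.0, 3.0, 5.0):
    print(f"x0={x0}: predicted {lwr_predict(XS, YS, x0, tau=0.2):.3f}, "
          f"true {math.sin(x0):.3f}")
```

Because a new line is fitted for every query point, the method can follow curvature that a single global line misses, at the cost of needing enough nearby training points. That cost fits the observation that shrinking the dataset hurt the fit.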
Title: Classification Analysis using Logistic Regression: Comparing Performance and Feature Impact
Description: Implemented logistic regression to classify data based on height, weight, and age. Created visualization plots for separation boundaries and data points. Evaluated the logistic regression model's performance using leave-one-out validation and compared results with the KNN and Naive Bayes classifiers. Logistic regression reached 70.83% accuracy, outperforming KNN (63.33%) and slightly surpassing Naive Bayes (70%). After the age feature was removed, logistic regression was less accurate than KNN and Naive Bayes, suggesting that those two models perform better in lower-dimensional scenarios.
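
A minimal sketch of a logistic regression classifier on (height, weight, age) features follows. It assumes batch gradient descent on standardized features; the project's actual optimizer, label encoding, and data are not given, so everything named here is hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def standardize(rows):
    """Return z-scored rows plus a function that scales new rows the same way."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    def scale(row):
        return [(v - m) / s for v, m, s in zip(row, means, stds)]
    return [scale(r) for r in rows], scale

def train_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict_proba(w, b, x):
    """Estimated probability that x belongs to class 1."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical toy rows (height m, weight kg, age); label 1 = "M", 0 = "W".
ROWS = [(1.80, 80.0, 30), (1.75, 78.0, 41), (1.85, 90.0, 25), (1.78, 84.0, 36),
        (1.60, 55.0, 29), (1.65, 60.0, 38), (1.58, 52.0, 24), (1.62, 58.0, 33)]
LABELS = [1, 1, 1, 1, 0, 0, 0, 0]

X, scale = standardize(ROWS)
w, b = train_logistic(X, LABELS)
print("weights:", [round(v, 2) for v in w], "bias:", round(b, 2))
print("P(label=1 | 1.82 m, 85 kg, 30 y):",
      round(predict_proba(w, b, scale((1.82, 85.0, 30))), 3))
```

The decision boundary is the set of points where `predict_proba` equals 0.5, that is, where the weighted sum plus the bias is zero. That is the line or plane a separation-boundary plot would draw.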
Description: This project focuses on implementing a decision tree algorithm in Python. The primary objective is to develop a working decision tree model and demonstrate comprehension of concepts such as information gain, entropy calculations, and data splitting. Use the provided run_code.sh script to execute the Python code. Comment the code extensively to clarify the critical sections, including entropy calculation, information gain, split evaluation, and threshold determination. The project includes a reference decision tree output (decision_tree_output.png), which might differ from the output of your implementation. The dummy_sample_output.txt file provides an example of the expected output format. Successful completion of this project will showcase your understanding of decision trees and effective coding practices.
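
The entropy, information gain, and threshold steps named above can be sketched as follows. This is only an illustration for a single numeric feature with a binary split; the function names are hypothetical and the output does not follow the format in dummy_sample_output.txt.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, left, right):
    """Entropy of the parent minus the size-weighted entropy of the two children."""
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted

def best_threshold(values, labels):
    """Try midpoints between consecutive distinct values; return (gain, threshold).

    Returns (0.0, None) if no split improves on the parent node.
    """
    pairs = sorted(zip(values, labels))
    best = (0.0, None)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical values cannot be separated by a threshold
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for v, lab in pairs if v <= t]
        right = [lab for v, lab in pairs if v > t]
        gain = information_gain(labels, left, right)
        if gain > best[0]:
            best = (gain, t)
    return best

# Hypothetical example: one numeric feature and two classes.
gain, threshold = best_threshold([1, 2, 3, 4], ["a", "a", "b", "b"])
print(f"best split: value <= {threshold} (information gain {gain:.3f})")
```

A full tree would apply `best_threshold` to every feature at a node, split on the feature with the highest gain, and recurse on each side until the labels are pure or no split gives a positive gain.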