This repository contains a project to recognize human activities using machine learning techniques. The model is trained and tested on the UCI Human Activity Recognition dataset. The pipeline involves feature selection, classifier training, and testing.
- HAR.ipynb: Notebook containing the training code, including data preprocessing, feature selection, and model training.
- test.ipynb: Notebook used to test the trained model on new or unseen data.
- utils.py: Python script that contains utility functions such as removeDuplicateColumns, which is used to clean the dataset by removing redundant features.
- model.pkl: Trained Random Forest model saved in pickle format using joblib, which can be loaded and used in test.ipynb for making predictions.
The dataset used in this project is the UCI Human Activity Recognition dataset, which can be downloaded from the UCI Machine Learning Repository.
The model training approach is organized as follows:
- The activity labels (output) are encoded into numerical values to be used by the classifier.
- The dataset is checked for redundant columns, which are removed using the removeDuplicateColumns function from the utils.py file.
- SelectKBest is applied to select the top k most important features based on statistical tests.
- Recursive Feature Elimination (RFE) is applied using the Random Forest Classifier to further refine the feature selection and remove the least important features iteratively.
- A Random Forest Classifier is trained using the selected features. Random Forest is chosen due to its robustness and ability to handle complex datasets effectively.
- Steps 2 to 5 (duplicate-column removal, SelectKBest, RFE, and Random Forest training) are combined into a single pipeline, so that data preparation, feature selection, and model training run as one reproducible unit.
- The trained model is exported to a .pkl file using joblib for later use in the test.ipynb notebook for testing and predictions.
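As an illustration, the steps above might be sketched as follows. This is not the code from HAR.ipynb: DuplicateColumnRemover is a hypothetical stand-in for removeDuplicateColumns, and the f_classif scoring function, the values of k, n_final, and step, and the output path are all assumptions.

```python
import joblib
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder


class DuplicateColumnRemover(TransformerMixin, BaseEstimator):
    """Hypothetical stand-in for removeDuplicateColumns in utils.py.

    Learns at fit time which columns are exact copies of an earlier column,
    then drops those same columns whenever transform is called.
    """

    def fit(self, X, y=None):
        X = pd.DataFrame(X)
        # duplicated() on the transpose flags every repeat of an earlier column
        self.keep_ = ~X.T.duplicated().to_numpy()
        return self

    def transform(self, X):
        return pd.DataFrame(X).loc[:, self.keep_]


def build_pipeline(k=100, n_final=50, random_state=0):
    """Steps 2 to 5: dedupe -> SelectKBest -> RFE -> Random Forest."""
    return Pipeline([
        ("dedupe", DuplicateColumnRemover()),
        # Univariate filter: keep the k features with the highest ANOVA F-score
        ("kbest", SelectKBest(score_func=f_classif, k=k)),
        # RFE drops 10% of the remaining features per round (assumed step size)
        ("rfe", RFE(
            estimator=RandomForestClassifier(n_estimators=100, random_state=random_state),
            n_features_to_select=n_final,
            step=0.1,
        )),
        ("clf", RandomForestClassifier(n_estimators=100, random_state=random_state)),
    ])


def train_and_export(X, y, path, k=100, n_final=50):
    """Encode the labels (step 1), fit the pipeline, and save it with joblib."""
    encoder = LabelEncoder()
    y_encoded = encoder.fit_transform(y)  # activity names -> integer codes
    pipeline = build_pipeline(k=k, n_final=n_final)
    pipeline.fit(X, y_encoded)
    joblib.dump(pipeline, path)
    return pipeline, encoder
```

Learning the duplicate columns at fit time, rather than recomputing them on each call, keeps the set of features identical between training and prediction. A custom class stored in a joblib file must also be importable wherever the file is loaded, for example from utils.py.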
- Open and run HAR.ipynb to:
  - Preprocess the data.
  - Train the Random Forest model.
  - Export the trained model to a .pkl file.
- Open and run test.ipynb to:
  - Load the saved model (pipeline.pkl).
  - Test the model on new data or the test set.
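The testing step could look roughly like the sketch below. The function name evaluate_saved_model and its arguments are hypothetical, and it assumes the label encoder fitted during training is still available to map predictions back to activity names.

```python
import joblib
from sklearn.metrics import accuracy_score


def evaluate_saved_model(model_path, X, y_true, label_encoder):
    """Load a joblib-saved model, predict on X, and score against y_true.

    y_true holds the original activity names; label_encoder is the
    LabelEncoder fitted on the training labels (assumed to be available).
    """
    model = joblib.load(model_path)
    predicted_codes = model.predict(X)
    # Map the integer predictions back to activity names
    predicted_labels = label_encoder.inverse_transform(predicted_codes)
    return predicted_labels, accuracy_score(y_true, predicted_labels)
```

Files written by joblib should only be loaded from trusted sources, since unpickling can execute arbitrary code.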
Make sure to install the following dependencies:
pip install numpy pandas seaborn matplotlib scikit-learn joblib