This project classifies electronic music tracks using machine learning. The data is sourced from BeatsDataset, and the model is built with Python libraries including pandas, scikit-learn (`sklearn`), and matplotlib.
You can find a detailed implementation of this project in my Kaggle notebook.
Data Preprocessing:
- The dataset is loaded using `pandas` and processed to clean and prepare the features for model training.
- Categorical data is handled using the `OneHotEncoder` from `sklearn.preprocessing` and combined with numerical data using `ColumnTransformer` from `sklearn.compose`.
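
The preprocessing step can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the column names (`tempo`, `energy`, `key`, `genre`) and the tiny in-memory DataFrame are hypothetical stand-ins for the real BeatsDataset columns, and dropping rows with missing values is only one possible cleaning choice.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for the real data. In the project the frame would
# be loaded from the BeatsDataset file with pandas instead.
df = pd.DataFrame({
    "tempo": [128.0, 174.0, 140.0, 126.0],          # numerical (assumed column)
    "energy": [0.81, 0.93, 0.77, 0.65],             # numerical (assumed column)
    "key": ["A", "C", "A", "F"],                    # categorical (assumed column)
    "genre": ["house", "dnb", "dubstep", "house"],  # target label (assumed column)
})

# Simple cleaning step (an assumption): drop rows with missing values.
df = df.dropna()

X = df.drop(columns="genre")
y = df["genre"]

# One-hot encode the categorical column and pass the numerical columns
# through unchanged, so both end up in one feature matrix.
preprocessor = ColumnTransformer(
    transformers=[("cat", OneHotEncoder(handle_unknown="ignore"), ["key"])],
    remainder="passthrough",
    sparse_threshold=0,  # always return a dense array
)

X_encoded = preprocessor.fit_transform(X)
print(X_encoded.shape)
```

Here `handle_unknown="ignore"` makes the encoder emit all zeros for a category it did not see during fitting instead of raising an error, which is convenient once the data is split into training and testing sets.
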
Data Splitting:
- The dataset is split into training and testing sets using `train_test_split` from `sklearn.model_selection`.
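
A possible shape for the split is shown below. The 80/20 ratio, the random seed, and the use of `stratify` are assumptions made for illustration, and the random feature matrix stands in for the preprocessed data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical features and labels standing in for the preprocessed dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = np.array(["house", "techno"] * 50)

X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.2,     # assumed split ratio
    random_state=42,   # assumed seed, for reproducible splits
    stratify=y,        # keep class proportions the same in both subsets
)

print(X_train.shape, X_test.shape)
```

Stratifying on the labels is worth considering when some genres are much rarer than others, since a purely random split could leave a rare class almost absent from the test set.
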
Feature Scaling:
- The features are scaled using the `StandardScaler` from `sklearn.preprocessing` to ensure that all features contribute equally to the model.
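
Scaling matters for a distance-based model such as k-nearest neighbors, because a feature with a large numeric range would otherwise dominate the distance. The sketch below uses made-up values (a tempo-like column and a 0 to 1 column) and fits the scaler on the training data only, then reuses those statistics for the test data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy features on very different scales (assumed values for illustration).
X_train = np.array([[120.0, 0.2], [128.0, 0.5], [140.0, 0.8], [174.0, 0.9]])
X_test = np.array([[126.0, 0.6], [150.0, 0.4]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean and std from training data
X_test_scaled = scaler.transform(X_test)        # apply the same statistics to test data

# After scaling, each training column has mean ~0 and standard deviation ~1.
print(X_train_scaled.mean(axis=0), X_train_scaled.std(axis=0))
```

Fitting the scaler on the training set alone avoids leaking information about the test set into the model.
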
Model Training:
- The classification model chosen for this task is the `KNeighborsClassifier` from `sklearn.neighbors`.
- Model hyperparameters are tuned using cross-validation to achieve optimal performance.
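
One common way to tune a `KNeighborsClassifier` with cross-validation is a grid search. The sketch below is an assumption about how that could look, not a record of what the notebook does: `GridSearchCV`, the `Pipeline` wrapper, the parameter grid, the 5-fold setting, and the synthetic data from `make_classification` are all illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for the prepared training set.
X, y = make_classification(
    n_samples=200, n_features=6, n_informative=4, random_state=0
)

# Putting the scaler inside the pipeline means it is re-fitted on the
# training part of every cross-validation fold.
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])

# Hypothetical search space for the KNN hyperparameters.
param_grid = {
    "knn__n_neighbors": [3, 5, 7, 9],
    "knn__weights": ["uniform", "distance"],
}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

After fitting, `search.best_estimator_` holds the pipeline refitted on all of the supplied data with the best parameter combination.
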
Model Evaluation:
- The model's performance is evaluated using accuracy metrics and visualized using `matplotlib.pyplot`.
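
As an illustration of the evaluation step, the sketch below computes test accuracy for several values of `n_neighbors` and plots the result. The data is synthetic, and the choice of plot, the range of `k`, and the output filename `knn_accuracy.png` are assumptions rather than details taken from the notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for the prepared dataset.
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Measure test accuracy for a range of odd k values.
k_values = list(range(1, 16, 2))
accuracies = []
for k in k_values:
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    accuracies.append(accuracy_score(y_test, model.predict(X_test)))

# Plot accuracy against k and save the figure to disk.
fig, ax = plt.subplots()
ax.plot(k_values, accuracies, marker="o")
ax.set_xlabel("n_neighbors (k)")
ax.set_ylabel("Test accuracy")
ax.set_title("KNN accuracy vs. k (synthetic data)")
fig.savefig("knn_accuracy.png")
```

A plot like this is meant for inspection only. Choosing `k` from test-set accuracy would make the reported score optimistic, so the hyperparameter itself is better selected by cross-validation on the training data.
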
Libraries and tools used:
- pandas: For data manipulation and analysis.
- KNeighborsClassifier: A simple yet effective machine learning algorithm used for classification tasks.
- OneHotEncoder: For encoding categorical features.
- matplotlib.pyplot: For plotting and visualizing data and results.
- train_test_split: For splitting the dataset into training and testing subsets.
- ColumnTransformer: To apply different preprocessing steps to different columns.
- sklearn.preprocessing: Provides preprocessing utilities like scaling and encoding.
- sklearn.compose: Helps in combining multiple feature transformations into a single pipeline.
The dataset used in this project is the BeatsDataset, which contains various features describing electronic music tracks.