will03216/Intro_to_machine_learning

Decision Tree Analysis - WiFi Localization

Requirements

  • Python 3.7+
  • NumPy
  • Matplotlib

Installation

pip install numpy matplotlib

Usage

1. Complete Analysis Pipeline (Default)

Runs the full decision tree analysis including tree building, visualization, and cross-validation on both clean and noisy datasets.

python3 run.py
python3 run.py --random_seed 42

What it does:

  • Loads clean dataset and splits into train/test (80/20)
  • Builds decision tree on training data
  • Generates tree visualization
  • Performs 10-fold cross-validation on clean dataset
  • Performs 10-fold cross-validation on noisy dataset
  • Saves evaluation plots and metrics

Options:

  • --random_seed INT: Random seed for reproducible results (default: -1; use 42 to replicate the results in the report exactly)

Output Files (saved in ./CW1/docs/figures):

  • tree_clean.png: Decision tree visualization
  • decision_tree_evaluation_clean.png: Clean data evaluation metrics
  • decision_tree_evaluation_noisy.png: Noisy data evaluation metrics

2. Build and Evaluate Single Decision Tree

Builds a single decision tree on specified dataset and evaluates its performance.

python3 run.py build_tree
python3 run.py build_tree --data_path wifi_db/noisy_dataset.txt --visualize
python3 run.py build_tree --train_split 0.7 --random_seed 123

What it does:

  • Loads specified dataset
  • Splits data according to train_split ratio
  • Builds decision tree on training data
  • Evaluates tree on test data
  • Optionally generates tree visualization
  • Prints accuracy and performance metrics
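This README does not state how the tree chooses its splits. As an illustration only, the sketch below assumes a common approach for continuous features such as WiFi signal strengths: try thresholds midway between consecutive values and keep the one with the highest information gain. The names `entropy` and `find_best_split` are hypothetical and the data is made up.

```python
import numpy as np


def entropy(labels):
    """Shannon entropy (in bits) of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))


def find_best_split(features, labels):
    """Return (feature_index, threshold, gain) with the highest information gain."""
    best = (None, None, 0.0)
    parent_entropy = entropy(labels)
    n = len(labels)
    for j in range(features.shape[1]):
        values = np.unique(features[:, j])  # sorted unique values
        # Candidate thresholds: midpoints between consecutive unique values.
        for t in (values[:-1] + values[1:]) / 2.0:
            left = labels[features[:, j] <= t]
            right = labels[features[:, j] > t]
            remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / n
            gain = parent_entropy - remainder
            if gain > best[2]:
                best = (j, float(t), gain)
    return best


# Made-up example: feature 0 separates the two classes perfectly.
features = np.array([[-60.0, -40.0], [-58.0, -70.0], [-45.0, -42.0], [-43.0, -71.0]])
labels = np.array([1, 1, 2, 2])
print(find_best_split(features, labels))
```

A full tree would apply this search recursively to the left and right subsets until each leaf contains a single class.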

Options:

  • --data_path PATH: Dataset file path, given relative to ./CW1/data/ (the data file must be located in that directory)
  • --train_split FLOAT: Training data ratio, 0.1 - 0.9 (default: 0.8)
  • --random_seed INT: Random seed for reproducibility (default: -1)
  • --visualize: Generate and save tree visualization

Output:

  • Console: Accuracy, confusion matrix, precision, recall, F1-scores
  • Optional: Tree visualization PNG file
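The metrics listed under Console can all be derived from the confusion matrix. The following sketch shows one way to compute them with NumPy. The function names are hypothetical, and the convention that rows are true classes and columns are predicted classes is an assumption, not something this README specifies.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, class_labels):
    """Count predictions; rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(class_labels)}
    matrix = np.zeros((len(class_labels), len(class_labels)), dtype=int)
    for t, p in zip(y_true, y_pred):
        matrix[index[t], index[p]] += 1
    return matrix


def metrics_from_confusion(matrix):
    """Return accuracy plus per-class precision, recall and F1-score."""
    tp = np.diag(matrix).astype(float)
    predicted = matrix.sum(axis=0)  # column sums: times each class was predicted
    actual = matrix.sum(axis=1)     # row sums: times each class actually occurred
    # Classes with a zero denominator get a score of 0 instead of NaN.
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    accuracy = tp.sum() / matrix.sum()
    return accuracy, precision, recall, f1


# Made-up labels for three classes.
y_true = [1, 1, 2, 2, 3, 3]
y_pred = [1, 2, 2, 2, 3, 1]
cm = confusion_matrix(y_true, y_pred, class_labels=[1, 2, 3])
accuracy, precision, recall, f1 = metrics_from_confusion(cm)
print(cm)
print(accuracy, precision, recall, f1)
```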

3. K-Fold Cross-Validation

Performs k-fold cross-validation on specified dataset and reports averaged metrics.

python3 run.py cross_validate
python3 run.py cross_validate --k 5 --data_path wifi_db/noisy_dataset.txt
python3 run.py cross_validate --k 10 --random_seed 42

What it does:

  • Loads specified dataset
  • Performs k-fold cross-validation
  • Trains decision tree on each training fold
  • Evaluates on each test fold
  • Computes averaged metrics across all folds
  • Generates evaluation visualization
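The fold handling described above can be sketched as an index generator. This is a minimal illustration under stated assumptions, not the code in run.py: the name `k_fold_indices` is hypothetical, and a negative seed is assumed to mean "do not fix the seed".

```python
import numpy as np


def k_fold_indices(n_samples, k=10, random_seed=-1):
    """Yield (train_indices, test_indices) for each of the k folds."""
    # Assumption: a negative seed means "use a fresh, unseeded generator".
    rng = np.random.default_rng(None if random_seed < 0 else random_seed)
    shuffled = rng.permutation(n_samples)
    # array_split tolerates n_samples not being divisible by k.
    folds = np.array_split(shuffled, k)
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate(folds[:i] + folds[i + 1:])
        yield train_idx, test_idx


# Each sample lands in exactly one test fold.
for train_idx, test_idx in k_fold_indices(n_samples=23, k=5, random_seed=42):
    print(len(train_idx), len(test_idx))
```

Averaged metrics are then obtained by training and evaluating one tree per yielded pair and taking the mean of the per-fold results.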

Options:

  • --k INT: Number of cross-validation folds, minimum 2 (default: 10)
  • --data_path PATH: Dataset file path (default: wifi_db/clean_dataset.txt)
  • --random_seed INT: Random seed for reproducibility (default: -1)

Output:

  • Console: Average accuracy, confusion matrix, precision, recall, F1-scores
  • PNG file: Cross-validation evaluation plots

4. Help and Usage Information

python3 run.py --help
python3 run.py -h

What it shows:

  • Detailed command descriptions
  • All available options and their defaults
  • Usage examples for each command
  • Parameter validation information

Dataset Paths

  • Clean dataset: wifi_db/clean_dataset.txt
  • Noisy dataset: wifi_db/noisy_dataset.txt
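For reference, a dataset file of this kind could be loaded with `np.loadtxt`. The file layout is not described in this README, so the sketch assumes whitespace-separated numeric columns with the class label in the last column. The helper name `load_dataset` and the sample rows are made up for illustration.

```python
import io

import numpy as np


def load_dataset(source):
    """Load a whitespace-separated numeric file into (features, labels)."""
    data = np.loadtxt(source)
    features = data[:, :-1]           # assumed: all columns but the last
    labels = data[:, -1].astype(int)  # assumed: last column is the class label
    return features, labels


# Made-up rows standing in for a real file; a path string works the same way.
sample = io.StringIO("-64 -56 -61 1\n-68 -57 -61 1\n-42 -53 -62 2\n")
features, labels = load_dataset(sample)
print(features.shape, labels)
```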

Output Files Location

All generated files are saved in the docs/figures/ directory (./CW1/docs/figures from the repository root):

  • Tree visualizations: tree_*.png
  • Evaluation plots: decision_tree_evaluation_*.png

Validation and Error Handling

  • Training split must be between 0.1 and 0.9
  • Cross-validation folds must be ≥ 2
  • Warns when k > 20, since a large number of folds is computationally expensive
  • Provides helpful error messages and suggestions
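Checks like these are commonly implemented as custom `argparse` type functions. The sketch below shows that pattern. It is an assumption about how such validation could be written, not the actual code in run.py, and the helper names and messages are hypothetical.

```python
import argparse
import warnings


def train_split_type(value):
    """Parse --train_split and reject ratios outside 0.1 to 0.9."""
    ratio = float(value)
    if not 0.1 <= ratio <= 0.9:
        raise argparse.ArgumentTypeError("train_split must be between 0.1 and 0.9")
    return ratio


def fold_count_type(value):
    """Parse --k, reject values below 2 and warn about large fold counts."""
    k = int(value)
    if k < 2:
        raise argparse.ArgumentTypeError("k must be at least 2")
    if k > 20:
        warnings.warn("k > 20 may be computationally expensive")
    return k


parser = argparse.ArgumentParser(description="Validation sketch")
parser.add_argument("--train_split", type=train_split_type, default=0.8)
parser.add_argument("--k", type=fold_count_type, default=10)

# Parse an explicit argument list so the sketch runs without a command line.
args = parser.parse_args(["--train_split", "0.7", "--k", "5"])
print(args.train_split, args.k)
```

Raising `argparse.ArgumentTypeError` makes argparse print the message together with the usage line and exit, which keeps the error handling in one place.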
