- Python 3.7+
- NumPy
- Matplotlib
pip install numpy matplotlib
Runs the full decision tree analysis, including tree building, visualization, and cross-validation on both the clean and noisy datasets.
python3 run.py
python3 run.py --random_seed 42
What it does:
- Loads clean dataset and splits into train/test (80/20)
- Builds decision tree on training data
- Generates tree visualization
- Performs 10-fold cross-validation on clean dataset
- Performs 10-fold cross-validation on noisy dataset
- Saves evaluation plots and metrics
Options:
--random_seed INT: Set random seed for reproducible results (default: -1; use 42 to exactly replicate the results in the report)
Output Files:
Output files are stored in ./CW1/docs/figures:
- tree_clean.png: Decision tree visualization
- decision_tree_evaluation_clean.png: Clean data evaluation metrics
- decision_tree_evaluation_noisy.png: Noisy data evaluation metrics
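The seeded 80/20 split described above can be sketched with NumPy as follows. This is a minimal illustration, not the code in run.py: the function name `split_dataset` is hypothetical, and treating a negative seed as "do not fix the seed" is an assumption about what the default of -1 means.

```python
import numpy as np

def split_dataset(data, train_split=0.8, random_seed=-1):
    """Shuffle the rows of `data` and split them into train and test subsets."""
    # Assumption: a negative seed (the documented default is -1) means "unseeded".
    rng = np.random.default_rng(random_seed if random_seed >= 0 else None)
    indices = rng.permutation(len(data))
    n_train = int(round(train_split * len(data)))
    return data[indices[:n_train]], data[indices[n_train:]]

# Toy array standing in for a loaded dataset (10 rows).
data = np.arange(80).reshape(10, 8)
train, test = split_dataset(data, train_split=0.8, random_seed=42)
print(train.shape, test.shape)
```

With a fixed non-negative seed, repeated calls return the same partition, which is what makes a run reproducible.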
Builds a single decision tree on specified dataset and evaluates its performance.
python3 run.py build_tree
python3 run.py build_tree --data_path wifi_db/noisy_dataset.txt --visualize
python3 run.py build_tree --train_split 0.7 --random_seed 123
What it does:
- Loads specified dataset
- Splits data according to train_split ratio
- Builds decision tree on training data
- Evaluates tree on test data
- Optionally generates tree visualization
- Prints accuracy and performance metrics
Options:
--data_path PATH: Dataset file path (the data must be located in ./CW1/data/, and the supplied path must be relative to that directory)
--train_split FLOAT: Training data ratio, 0.1 - 0.9 (default: 0.8)
--random_seed INT: Random seed for reproducibility (default: -1)
--visualize: Generate and save tree visualization
Output:
- Console: Accuracy, confusion matrix, precision, recall, F1-scores
- Optional: Tree visualization PNG file
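For readers who want to see how the reported metrics relate to each other, the sketch below derives accuracy and per-class precision, recall, and F1 from a confusion matrix. It is an illustration under stated assumptions: the function names are hypothetical, rows are taken to be true classes and columns predicted classes, and the project's own evaluation code may differ.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, labels):
    """Count (true, predicted) pairs; rows are true classes, columns predicted."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = np.zeros((len(labels), len(labels)), dtype=int)
    for t, p in zip(y_true, y_pred):
        matrix[index[t], index[p]] += 1
    return matrix

def per_class_metrics(matrix):
    """Return accuracy plus per-class precision, recall, and F1 arrays."""
    tp = np.diag(matrix).astype(float)
    predicted = matrix.sum(axis=0)  # column sums: how often each class was predicted
    actual = matrix.sum(axis=1)     # row sums: how often each class truly occurs
    # Classes with a zero denominator get a score of 0 instead of a division error.
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    accuracy = tp.sum() / matrix.sum()
    return accuracy, precision, recall, f1

# Made-up labels for illustration only.
y_true = [1, 1, 2, 2, 3, 3]
y_pred = [1, 2, 2, 2, 3, 1]
cm = confusion_matrix(y_true, y_pred, labels=[1, 2, 3])
accuracy, precision, recall, f1 = per_class_metrics(cm)
```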
Performs k-fold cross-validation on specified dataset and reports averaged metrics.
python3 run.py cross_validate
python3 run.py cross_validate --k 5 --data_path wifi_db/noisy_dataset.txt
python3 run.py cross_validate --k 10 --random_seed 42
What it does:
- Loads specified dataset
- Performs k-fold cross-validation
- Trains decision tree on each training fold
- Evaluates on each test fold
- Computes averaged metrics across all folds
- Generates evaluation visualization
Options:
--k INT: Number of cross-validation folds, minimum 2 (default: 10)
--data_path PATH: Dataset file path (default: wifi_db/clean_dataset.txt)
--random_seed INT: Random seed for reproducibility (default: -1)
Output:
- Console: Average accuracy, confusion matrix, precision, recall, F1-scores
- PNG file: Cross-validation evaluation plots
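The k-fold procedure listed above can be sketched as below. This is a hedged illustration rather than the project's implementation: `k_fold_indices`, `cross_validate`, and `majority_baseline` are hypothetical names, the negative-seed handling is an assumption, and a majority-class baseline stands in for building and scoring a decision tree. It averages a single score per fold; the tool itself reports several averaged metrics.

```python
import numpy as np

def k_fold_indices(n_samples, k=10, random_seed=-1):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    # Assumption: a negative seed means "do not fix the seed".
    rng = np.random.default_rng(random_seed if random_seed >= 0 else None)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        train_idx = np.concatenate(folds[:i] + folds[i + 1:])
        yield train_idx, folds[i]

def cross_validate(data, train_and_score, k=10, random_seed=-1):
    """Average the score returned by `train_and_score` over all k folds."""
    scores = [
        train_and_score(data[train_idx], data[test_idx])
        for train_idx, test_idx in k_fold_indices(len(data), k, random_seed)
    ]
    return float(np.mean(scores))

def majority_baseline(train, test):
    """Stand-in model: predict the most common training label (last column)."""
    labels, counts = np.unique(train[:, -1], return_counts=True)
    return float(np.mean(test[:, -1] == labels[np.argmax(counts)]))

# Toy data: 25 rows whose last column is a label (all the same class here).
toy = np.column_stack([np.arange(25), np.ones(25)])
mean_accuracy = cross_validate(toy, majority_baseline, k=10, random_seed=42)
```

Using `np.array_split` keeps the folds as even as possible when the sample count is not divisible by k.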
python3 run.py --help
python3 run.py -h
What it shows:
- Detailed command descriptions
- All available options and their defaults
- Usage examples for each command
- Parameter validation information
- Clean dataset: wifi_db/clean_dataset.txt
- Noisy dataset: wifi_db/noisy_dataset.txt
All generated files are saved in the docs/figures/ directory:
- Tree visualizations: tree_*.png
- Evaluation plots: decision_tree_evaluation_*.png
- Training split must be between 0.1 and 0.9
- Cross-validation folds must be ≥ 2
- Warns for computationally expensive operations (k > 20)
- Provides helpful error messages and suggestions
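One way such checks could be wired into an argparse-based CLI is sketched below. This is an assumption about structure, not a description of run.py: the helper names and error messages are hypothetical, and only the two validated options are shown.

```python
import argparse
import warnings

def train_split_type(value):
    """Parse --train_split and reject ratios outside 0.1 to 0.9."""
    ratio = float(value)
    if not 0.1 <= ratio <= 0.9:
        raise argparse.ArgumentTypeError("train_split must be between 0.1 and 0.9")
    return ratio

def fold_count_type(value):
    """Parse --k, reject values below 2, and warn when k is large."""
    k = int(value)
    if k < 2:
        raise argparse.ArgumentTypeError("k must be at least 2")
    if k > 20:
        warnings.warn("k > 20 may be computationally expensive")
    return k

def build_parser():
    parser = argparse.ArgumentParser(prog="run.py")
    subparsers = parser.add_subparsers(dest="command")
    build = subparsers.add_parser("build_tree")
    build.add_argument("--train_split", type=train_split_type, default=0.8)
    cross = subparsers.add_parser("cross_validate")
    cross.add_argument("--k", type=fold_count_type, default=10)
    return parser

args = build_parser().parse_args(["build_tree", "--train_split", "0.7"])
print(args.train_split)
```

Raising `argparse.ArgumentTypeError` from a `type` callable makes argparse print the message alongside the usage line and exit, so an invalid value never reaches the analysis code.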