A comprehensive data science portfolio project comparing state-of-the-art deep learning architectures for automated glaucoma detection from fundus images. This repository showcases a complete experimental pipeline including hyperparameter tuning, K-fold cross-validation, ablation studies, and explainability analysis.
Glaucoma is the leading cause of irreversible blindness worldwide, affecting more than 80 million people. Early detection through automated analysis of fundus images can significantly reduce vision loss. This project presents a systematic comparison of CNN, Vision Transformer, and hybrid architectures for glaucoma detection, achieving 99.76% accuracy with the MaxViT-Tiny model.
- State-of-the-Art Performance: 99.76% accuracy, 99.69% F1-score, and a perfect 1.0000 AUC-ROC
- Comprehensive Benchmarking: systematic comparison of 4 model architectures (CNN, ViT, Hybrid, SSL)
- Hyperparameter Optimization: grid search ensuring peak performance for each architecture
- Robust Evaluation: 5-fold cross-validation with overfitting analysis
- Explainability: Grad-CAM visualizations for model interpretability
- Ablation Studies: systematic evaluation of preprocessing techniques
- Publication-Ready: structured notebook following the IMRaD format
This study benchmarks four distinct deep learning architectures:
- EfficientNetV2-S + Disc Crop - CNN baseline with domain-specific preprocessing
- DeiT-Small/16 - Vision Transformer baseline
- MaxViT-Tiny - Hybrid CNN-ViT architecture (Best Performer)
- DINO SSL + Finetune - Self-supervised learning approach
| Model | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) | AUC-ROC |
|---|---|---|---|---|---|
| MaxViT-Tiny | 99.76 | 99.82 | 99.55 | 99.69 | 1.0000 |
| EfficientNetV2-S + Disc Crop | 99.65 | 99.73 | 99.38 | 99.55 | 0.9999 |
| DeiT-Small/16 | 99.44 | 99.20 | 99.38 | 99.29 | 0.9998 |
| DINO SSL + Finetune | 81.80 | 85.75 | 63.93 | 73.25 | 0.8797 |
- MaxViT-Tiny achieves the best performance (99.76% accuracy), showing the benefit of a hybrid architecture that combines CNN and ViT strengths
- Disc-crop preprocessing is critical, providing a +0.11% improvement over the baseline
- Excellent generalization: K-fold CV shows minimal overfitting (3.36% gap)
- Hyperparameter tuning is essential, yielding a 0.38-0.45% improvement over default configurations
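Metrics like those in the table above can be computed with scikit-learn from test-set predictions; the labels and probabilities below are purely illustrative:

```python
# Sketch: computing the reported metrics with scikit-learn.
# y_true/y_prob are illustrative stand-ins for real test-set outputs.
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, roc_auc_score)

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])              # 1 = glaucoma
y_prob = np.array([0.1, 0.2, 0.9, 0.8, 0.7, 0.3, 0.6, 0.4])
y_pred = (y_prob >= 0.5).astype(int)                     # 0.5 decision threshold

metrics = {
    "accuracy":  accuracy_score(y_true, y_pred),
    "precision": precision_score(y_true, y_pred),
    "recall":    recall_score(y_true, y_pred),
    "f1":        f1_score(y_true, y_pred),
    "auc_roc":   roc_auc_score(y_true, y_prob),          # uses raw probabilities
}
print(metrics)
```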
```
.
├── Glaucoma_Detection_model_benchmarking.ipynb  # Main Jupyter notebook (portfolio showcase)
├── requirements.txt                             # Python dependencies
├── README.md                                    # This file
├── .gitignore                                   # Git ignore rules
└── images/                                      # Visualization images
    ├── model_comparison.png
    ├── roc_curves.png
    ├── kfold_cv_results.png
    ├── ablation_study.png
    ├── gradcam_glaucoma.png
    └── gradcam_normal.png
```
- Python 3.8 or higher
- CUDA-capable GPU (optional, but recommended) or Apple Silicon (MPS)
- Jupyter Notebook or JupyterLab
1. Clone the repository:

   ```bash
   git clone https://github.com/ShaonINT/Glaucoma_Detection.git
   cd Glaucoma_Detection
   ```

2. Create a virtual environment (recommended):

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Start Jupyter Notebook:

   ```bash
   jupyter notebook
   ```

5. Open the notebook: navigate to `Glaucoma_Detection_model_benchmarking.ipynb`.

6. Run all cells. The notebook will automatically:
   - Load and preprocess the dataset
   - Perform hyperparameter tuning for each model
   - Train all models with optimal hyperparameters
   - Conduct 5-fold cross-validation
   - Perform ablation studies
   - Generate Grad-CAM visualizations
   - Create comparison visualizations and reports
The notebook follows academic publication standards (IMRaD format) and includes:
- Dataset statistics and visualization
- Preprocessing pipeline (disc crop, augmentation)
- Data loaders setup
- EfficientNetV2-S implementation
- DeiT-Small implementation
- MaxViT-Tiny implementation
- DINO SSL implementation
- Grid search for learning rate and weight decay
- Optimal hyperparameter identification
- Performance comparison across configurations
- Training with early stopping
- Validation monitoring
- Best model checkpointing
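The early-stopping logic described above can be sketched as a small helper that tracks the best validation metric; the patience value here is illustrative:

```python
# Minimal early-stopping helper in the spirit of the training loop
# described above (patience value is illustrative, not the study's).
class EarlyStopper:
    def __init__(self, patience: int = 5):
        self.patience = patience
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, val_metric: float) -> bool:
        """Return True when training should stop."""
        if val_metric > self.best:
            self.best = val_metric      # new best: checkpoint the model here
            self.bad_epochs = 0
            return False
        self.bad_epochs += 1
        return self.bad_epochs >= self.patience

stopper = EarlyStopper(patience=2)
history = [0.90, 0.95, 0.94, 0.93]      # validation accuracy per epoch
stops = [stopper.step(v) for v in history]
print(stops)                            # stops after two epochs without improvement
```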
- 5-fold stratified cross-validation
- Overfitting analysis
- Generalization assessment
- Preprocessing component evaluation
- Disc crop impact analysis
- Augmentation effect assessment
- Test set performance metrics
- Confusion matrices
- ROC curves
- Model comparison
- Grad-CAM visualization
- Attention map generation
- Clinical relevance validation
- Grid Search: learning rate [1×10⁻⁴, 5×10⁻⁵, 2×10⁻⁵] × weight decay [1×10⁻⁴, 1×10⁻⁵]
- Result: all top models converged to LR = 1×10⁻⁴
- Impact: 0.38-0.45% accuracy improvement over default configurations
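The grid search can be sketched as an exhaustive sweep over the two hyperparameters; `train_and_validate` below is a placeholder standing in for a full training run:

```python
# Sketch of the LR x weight-decay grid search. train_and_validate is a
# stand-in scoring function; the real pipeline trains the model and
# returns validation accuracy for each configuration.
from itertools import product

LEARNING_RATES = [1e-4, 5e-5, 2e-5]
WEIGHT_DECAYS = [1e-4, 1e-5]

def train_and_validate(lr: float, wd: float) -> float:
    # Placeholder score favouring (1e-4, 1e-5) purely for illustration.
    return -abs(lr - 1e-4) - 0.1 * abs(wd - 1e-5)

best_cfg = max(product(LEARNING_RATES, WEIGHT_DECAYS),
               key=lambda cfg: train_and_validate(*cfg))
print(best_cfg)
```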
- Method: 5-fold stratified cross-validation
- Purpose: Robust generalization assessment and overfitting detection
- Result: Minimal overfitting gaps (3.36-3.52%) across all models
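The stratified split itself is straightforward with scikit-learn; the class ratio below is illustrative, and each fold preserves it:

```python
# Sketch: 5-fold stratified CV split over image labels with scikit-learn.
import numpy as np
from sklearn.model_selection import StratifiedKFold

labels = np.array([0] * 60 + [1] * 40)       # illustrative class ratio
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

fold_ratios = []
for train_idx, val_idx in skf.split(np.zeros(len(labels)), labels):
    # Each validation fold keeps the overall positive-class share.
    fold_ratios.append(labels[val_idx].mean())
print(fold_ratios)
```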
- Configurations: 7 different preprocessing combinations
- Finding: Disc crop preprocessing provides most significant improvement (+0.11%)
- Insight: Aligns with clinical practice (optic disc is primary diagnostic region)
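As an illustration of the disc-crop idea: the optic disc is typically the brightest region of a fundus image, so a crop centred on the intensity peak is a common heuristic. This is a simplification; the actual pipeline's localisation method may differ (e.g. smoothing or segmentation first):

```python
# Illustrative optic-disc crop: centre a window on the brightest pixel.
# Real pipelines usually smooth or segment first to avoid specular specks.
import numpy as np

def disc_crop(img: np.ndarray, size: int = 64) -> np.ndarray:
    """Crop a size x size window centred on the brightest pixel."""
    gray = img.mean(axis=2)
    cy, cx = np.unravel_index(np.argmax(gray), gray.shape)
    half = size // 2
    y0 = int(np.clip(cy - half, 0, img.shape[0] - size))  # keep crop in bounds
    x0 = int(np.clip(cx - half, 0, img.shape[1] - size))
    return img[y0:y0 + size, x0:x0 + size]

# Synthetic image with a bright "disc" near (100, 150).
img = np.zeros((256, 256, 3))
img[90:110, 140:160] = 1.0
crop = disc_crop(img)
print(crop.shape, crop.max())
```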
- Method: Grad-CAM visualization
- Finding: All models focus on optic disc region (clinically relevant)
- Impact: Validates model learning and enables clinical trust
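Grad-CAM can be hand-rolled with forward/backward hooks on the last convolutional layer; here a tiny CNN stands in for the benchmarked backbones, so this is a sketch of the technique rather than the notebook's exact implementation:

```python
# Hand-rolled Grad-CAM sketch: hook the last conv layer, weight its
# activations by channel-averaged gradients of the target logit.
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 2),
)
target_layer = model[2]                              # last conv layer

acts, grads = {}, {}
target_layer.register_forward_hook(lambda m, i, o: acts.update(v=o))
target_layer.register_full_backward_hook(lambda m, gi, go: grads.update(v=go[0]))

x = torch.randn(1, 3, 224, 224)
logits = model(x)
logits[0, 1].backward()                              # gradient of "glaucoma" logit

weights = grads["v"].mean(dim=(2, 3), keepdim=True)  # channel importances
cam = F.relu((weights * acts["v"]).sum(dim=1)).squeeze(0)
cam = cam / (cam.max() + 1e-8)                       # normalize to [0, 1]
print(cam.shape)
```

Upsampling `cam` to the input resolution and overlaying it on the fundus image gives the heatmaps shown in `images/`.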
Grad-CAM visualizations for a glaucoma case and a normal case: model attention maps showing focus on the optic disc region.
Data Source: Kaggle - Fundus Image Dataset for Glaucoma Detection
- Total Images: 17,242 fundus images
- Training Set: 8,621 images (5,293 normal, 3,328 glaucoma)
- Validation Set: 5,747 images (3,539 normal, 2,208 glaucoma)
- Test Set: 2,874 images (1,754 normal, 1,120 glaucoma)
- Class Balance: ~1.59:1 (Normal:Glaucoma)
- First systematic comparison of CNN, ViT, and hybrid architectures for glaucoma detection
- State-of-the-art performance (99.76% accuracy) exceeding previous studies
- Comprehensive evaluation including hyperparameter tuning, K-fold CV, and ablation studies
- Explainability integration through Grad-CAM visualizations
- Reproducible benchmark with complete code and methodology
- PyTorch: Deep learning framework
- timm: Pre-trained model library
- scikit-learn: Evaluation metrics
- matplotlib/seaborn: Visualization
- pandas: Data analysis
- Jupyter: Interactive development
This project demonstrates:
- Advanced Model Architecture Knowledge: CNN, Vision Transformers, hybrid models
- Rigorous Evaluation Methodology: K-fold CV, hyperparameter tuning, ablation studies
- Medical AI Expertise: domain-specific preprocessing, clinical validation
- Explainability: Grad-CAM visualizations for model interpretability
- Statistical Rigor: overfitting analysis, generalization assessment
- Research Communication: publication-ready notebook and paper
- GitHub: @ShaonINT
- Repository: Glaucoma_Detection
- Dataset: Fundus image dataset from Kaggle
- PyTorch and timm communities for excellent deep learning frameworks
- Vision Transformer research community for architectural innovations
Note: This repository focuses on the research methodology and benchmarking notebook. The dataset and trained models are excluded due to size constraints.
Disclaimer: This tool is designed for research and portfolio purposes. Always consult with qualified healthcare professionals for medical diagnosis and treatment decisions.





