This project develops deep learning models that automatically colorize grayscale bird images. We explore two main approaches: a naive CNN model and an advanced GAN-based model that generates more realistic colorized images.
Dataset:
- Source: UK Garden Birds dataset from Kaggle
- Content: 20 species of UK garden birds in natural settings
- Image Size: All images resized to 128×128 pixels
- Task: Convert grayscale input → RGB color output
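As an illustration of the task setup, here is a minimal preprocessing sketch (Pillow + NumPy; the function name and exact normalization are ours, not necessarily the notebooks'):

```python
import numpy as np
from PIL import Image

def load_pair(path, size=(128, 128)):
    """Illustrative preprocessing: build a (grayscale input, RGB target) pair."""
    rgb = Image.open(path).convert("RGB").resize(size)
    target = np.asarray(rgb, dtype=np.float32) / 255.0            # (128, 128, 3)
    gray = np.asarray(rgb.convert("L"), dtype=np.float32) / 255.0
    return gray[..., np.newaxis], target                          # (128, 128, 1), (128, 128, 3)
```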
The project consists of three Jupyter notebooks:
`Arch-1-ATDL-FinalProject.ipynb` contains three CNN autoencoder experiments with progressive improvements:
- Experiment 1: Basic encoder-decoder with Conv2D and upsampling layers
- Experiment 2: Enhanced with dropout layers to prevent overfitting
- Experiment 3: Advanced model with batch normalization (best-performing naive model)
Each model converts grayscale (1-channel) input to RGB (3-channel) output using MSE loss.
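As a rough illustration, a batch-normalized encoder-decoder in the spirit of Experiment 3 might look like the following Keras sketch (layer counts and filter sizes are illustrative, not the notebooks' exact architecture):

```python
from tensorflow.keras import layers, models

def build_colorizer(input_shape=(128, 128, 1)):
    """Sketch of a batch-normalized encoder-decoder colorizer (Experiment-3 style)."""
    inp = layers.Input(shape=input_shape)
    # Encoder: strided convolutions halve the spatial resolution.
    x = layers.Conv2D(64, 3, strides=2, padding="same", activation="relu")(inp)
    x = layers.BatchNormalization()(x)
    x = layers.Conv2D(128, 3, strides=2, padding="same", activation="relu")(x)
    x = layers.BatchNormalization()(x)
    # Decoder: upsampling restores the 128x128 resolution.
    x = layers.UpSampling2D()(x)
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.UpSampling2D()(x)
    out = layers.Conv2D(3, 3, padding="same", activation="sigmoid")(x)  # RGB in [0, 1]
    return models.Model(inp, out)
```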
`Arc-2-ATDL-FinalProject.ipynb` implements an advanced conditional GAN (cGAN) approach with two different generator architectures:
- Model 1: Pix2Pix U-Net generator with PatchGAN discriminator (47 epochs)
- Model 2: MobileNetV2 transfer learning generator with improved discriminator (115 epochs)
Uses adversarial loss + L1 reconstruction loss for realistic colorization.
Based on: Isola, P., Zhu, J. Y., Zhou, T., & Efros, A. A. (2017). Image-to-image translation with conditional adversarial networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1125-1134. https://arxiv.org/abs/1611.07004
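The combined objective can be sketched as follows (standard Pix2Pix-style losses with λ=100; `from_logits=True` assumes the discriminator outputs raw logits):

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
LAMBDA = 100  # weight on the L1 reconstruction term, as in Pix2Pix

def generator_loss(disc_fake_output, generated, target):
    adv = bce(tf.ones_like(disc_fake_output), disc_fake_output)  # fool the discriminator
    l1 = tf.reduce_mean(tf.abs(target - generated))              # stay close to ground truth
    return adv + LAMBDA * l1

def discriminator_loss(disc_real_output, disc_fake_output):
    real = bce(tf.ones_like(disc_real_output), disc_real_output)
    fake = bce(tf.zeros_like(disc_fake_output), disc_fake_output)
    return real + fake
```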
`Test-ATDL-FinalProject.ipynb` is the evaluation notebook that compares the two best models:
- Architecture 1 Model 3: Best naive CNN model
- Architecture 2 GAN Model: Best GAN-based model
Downloads pre-trained weights and provides side-by-side comparison on test images.
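A sketch of what this notebook automates (the Drive file IDs and model filenames below are placeholders; the real ones are hard-coded in the notebook):

```python
import gdown
import matplotlib.pyplot as plt
import tensorflow as tf

# Placeholder Drive file IDs -- the real IDs are hard-coded in the notebook.
gdown.download(id="NAIVE_MODEL_FILE_ID", output="naive_model.h5", quiet=False)
gdown.download(id="GAN_MODEL_FILE_ID", output="gan_model.h5", quiet=False)
naive = tf.keras.models.load_model("naive_model.h5", compile=False)
gan = tf.keras.models.load_model("gan_model.h5", compile=False)

def compare(gray_batch):
    """Show each grayscale input next to both models' colorizations."""
    fig, axes = plt.subplots(len(gray_batch), 3,
                             figsize=(9, 3 * len(gray_batch)), squeeze=False)
    for i, g in enumerate(gray_batch):
        axes[i][0].imshow(g.squeeze(), cmap="gray"); axes[i][0].set_title("Input")
        axes[i][1].imshow(naive.predict(g[None])[0]); axes[i][1].set_title("Naive CNN")
        axes[i][2].imshow(gan.predict(g[None])[0]);   axes[i][2].set_title("GAN")
    plt.tight_layout(); plt.show()
```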
- Google Colab account (recommended environment)
- Kaggle account for dataset access
- Libraries are installed automatically via `!pip install` commands in the notebooks
1. Upload notebooks to Google Drive or open them directly in Colab.
2. Start with Architecture 1 (Naive Model):
   - Open `Arch-1-ATDL-FinalProject.ipynb` in Google Colab
   - Run the installation cells to install required libraries
   - Authenticate with Kaggle when prompted (see the `kagglehub` sketch after this list)
   - Run cells sequentially to train all 3 experiments
3. Run Architecture 2 (GAN Model):
   - Open `Arc-2-ATDL-FinalProject.ipynb` in Google Colab
   - Run installation and authentication cells
   - Train the Pix2Pix U-Net model (Model 1) for approximately 47 epochs
   - Train the MobileNetV2 model (Model 2) for approximately 115 epochs
4. Test and Compare Models:
   - Open `Test-ATDL-FinalProject.ipynb` in Google Colab
   - Pre-trained models are downloaded automatically via `gdown`
   - Upload test images for colorization comparison
   - View side-by-side results of both architectures
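The Kaggle step above boils down to a one-line `kagglehub` download (the dataset handle is taken from the reference at the bottom of this README):

```python
import kagglehub

# Downloads (and caches) the UK Garden Birds dataset; prompts for Kaggle
# credentials in Colab if you are not already authenticated.
path = kagglehub.dataset_download("davemahony/20-uk-garden-birds")
print("Dataset files are under:", path)
```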
If running locally, ensure you have:
- Python 3.7+
- TensorFlow/Keras
- Required libraries: `numpy`, `matplotlib`, `pillow`, `scipy`, `kagglehub`, `gdown`
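For a local setup, the equivalent of the notebooks' install cells is roughly the following (versions unpinned; adjust as needed):

```python
# Notebook cell -- in a plain terminal, drop the leading "!".
!pip install tensorflow numpy matplotlib pillow scipy kagglehub gdown
```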
Architecture 1 (Naive CNN) key features:
- Simple Architecture: Basic encoder-decoder structure
- Progressive Enhancement: Adding dropout and batch normalization
- Fast Training: 20-50 epochs depending on experiment
- Loss Function: Mean Squared Error (MSE)
Architecture 2 (GAN) key features:
- Advanced Architecture: U-Net generators with PatchGAN discriminators
- Transfer Learning: MobileNetV2 pre-trained encoder (see the sketch after this list)
- Sophisticated Loss: Adversarial + L1 reconstruction loss (λ=100)
- Extended Training: 47-115 epochs for optimal results
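One plausible way to wire MobileNetV2 in as the generator's encoder is sketched below (the notebooks' exact skip connections and output activation may differ; here the grayscale input is tiled to 3 channels for the pretrained backbone, and the `tanh` output assumes images scaled to [-1, 1]):

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_mobilenet_generator(input_shape=(128, 128, 1)):
    """Sketch: MobileNetV2 bottleneck encoder + transposed-conv decoder."""
    inp = layers.Input(shape=input_shape)
    x = layers.Concatenate()([inp, inp, inp])  # 1 channel -> 3 for the pretrained encoder
    backbone = tf.keras.applications.MobileNetV2(
        input_shape=(128, 128, 3), include_top=False, weights="imagenet")
    y = backbone(x)                             # (4, 4, 1280) bottleneck features
    for filters in (512, 256, 128, 64, 32):     # five upsampling stages: 4 -> 128
        y = layers.Conv2DTranspose(filters, 4, strides=2, padding="same")(y)
        y = layers.BatchNormalization()(y)
        y = layers.ReLU()(y)
    out = layers.Conv2D(3, 3, padding="same", activation="tanh")(y)  # RGB in [-1, 1]
    return tf.keras.Model(inp, out)
```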
Architecture 1 training configuration:
- Optimizer: Adam (lr=2e-4)
- Batch Size: 32
- Epochs: 20-50 per experiment
- Data Augmentation: Basic preprocessing
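With this configuration, training a naive model is a standard Keras compile/fit (epoch count and validation split are illustrative; `x_gray`/`y_rgb` are arrays as in the preprocessing sketch above):

```python
import tensorflow as tf

model = build_colorizer()  # from the encoder-decoder sketch above
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=2e-4), loss="mse")
# x_gray: (N, 128, 128, 1) inputs, y_rgb: (N, 128, 128, 3) targets.
model.fit(x_gray, y_rgb, batch_size=32, epochs=50, validation_split=0.1)
```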
Architecture 2 (GAN) training configuration:
- Generator Optimizer: Adam (lr=2e-4, β₁=0.5)
- Discriminator Optimizer: Adam (lr=1e-4, β₁=0.5)
- Batch Size: 10
- Advanced Augmentation: Random jitter, horizontal flip, rotations
- Training Strategy: Progressive phases with manual evaluation
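A sketch of a single training step with these two optimizers (`generator`, `discriminator`, and the loss functions from the Pix2Pix sketch above are assumed to be defined; the conditional discriminator takes (input, image) pairs):

```python
import tensorflow as tf

gen_opt = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)
disc_opt = tf.keras.optimizers.Adam(1e-4, beta_1=0.5)

@tf.function
def train_step(gray, target):
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake = generator(gray, training=True)
        d_real = discriminator([gray, target], training=True)
        d_fake = discriminator([gray, fake], training=True)
        g_loss = generator_loss(d_fake, fake, target)
        d_loss = discriminator_loss(d_real, d_fake)
    gen_opt.apply_gradients(zip(
        g_tape.gradient(g_loss, generator.trainable_variables),
        generator.trainable_variables))
    disc_opt.apply_gradients(zip(
        d_tape.gradient(d_loss, discriminator.trainable_variables),
        discriminator.trainable_variables))
    return g_loss, d_loss
```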
Best-performing models:
- Architecture 1 Experiment 3: Best naive approach with batch normalization
- Architecture 2 Model 2: MobileNetV2-based generator (115 epochs)
Key findings:
- Naive Models: Fast training but limited color saturation and detail
- GAN Models: Superior visual quality with more realistic and vibrant colorization
- Transfer Learning: MobileNetV2 encoder significantly improved feature extraction
- Training Balance: Careful discriminator-generator balance crucial for stability
Project structure:

```
BirdsColoring/
├── Arch-1-ATDL-FinalProject.ipynb   # Naive CNN models (Google Colab)
├── Arc-2-ATDL-FinalProject.ipynb    # GAN-based models (Google Colab)
├── Test-ATDL-FinalProject.ipynb     # Model comparison (Google Colab)
├── README.md                        # This file
├── LICENSE                          # MIT License
├── THIRD_PARTY_LICENSES.md          # Third-party library licenses
└── .gitignore                       # Git ignore rules
```
Notes:
- Environment: Notebooks are optimized for Google Colab with GPU support
- Model Downloads: Pre-trained models are hosted on Google Drive and downloaded via `gdown`
- Dataset Access: Requires Kaggle authentication for the UK Garden Birds dataset
- Training Time: GAN models require significant computational resources (a GPU is recommended)
References:
- Pix2Pix: Isola, P., Zhu, J. Y., Zhou, T., & Efros, A. A. (2017). Image-to-image translation with conditional adversarial networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1125-1134. https://arxiv.org/abs/1611.07004
- MobileNetV2: Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., & Chen, L. C. (2018). MobileNetV2: Inverted residuals and linear bottlenecks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 4510-4519. https://arxiv.org/abs/1801.04381
- Dataset: Mahony, D. (2020). 20 UK Garden Birds. Kaggle. https://www.kaggle.com/davemahony/20-uk-garden-birds
Authors:
- Inon Elgabsi
- Nati Forish
- Iyar Gadolov
- Roy Edri
This project is licensed under the MIT License - see the LICENSE file for details.
For third-party library licenses, see THIRD_PARTY_LICENSES.md.
This project demonstrates the evolution from basic CNN autoencoders to sophisticated GAN architectures for image colorization, showing significant improvements in output quality through advanced deep learning techniques.
