This project develops deep learning models that automatically colorize grayscale bird images. We explore two main approaches: a naive CNN model and an advanced GAN-based model that generates more realistic colorized images.
Dataset:
- Source: UK Garden Birds dataset from Kaggle
- Content: 20 species of UK garden birds in natural settings
- Image Size: All images resized to 128×128 pixels
- Task: Convert grayscale input → RGB color output
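As an illustration of the task setup, here is a minimal preprocessing sketch (Pillow + NumPy; the function name and exact normalization are ours, not necessarily the notebooks'):

```python
import numpy as np
from PIL import Image

def load_pair(path, size=(128, 128)):
    """Illustrative preprocessing: build a (grayscale input, RGB target) pair."""
    rgb = Image.open(path).convert("RGB").resize(size)
    target = np.asarray(rgb, dtype=np.float32) / 255.0            # (128, 128, 3)
    gray = np.asarray(rgb.convert("L"), dtype=np.float32) / 255.0
    return gray[..., np.newaxis], target                          # (128, 128, 1), (128, 128, 3)
```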
The project consists of three Jupyter notebooks:
`Arch-1-ATDL-FinalProject.ipynb` contains three CNN autoencoder experiments with progressive improvements:
- Experiment 1: Basic encoder-decoder with Conv2D and upsampling layers
- Experiment 2: Enhanced with dropout layers to prevent overfitting
- Experiment 3: Advanced model with batch normalization (best-performing naive model)
Each model converts grayscale (1-channel) input to RGB (3-channel) output using MSE loss.
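As a rough illustration, a batch-normalized encoder-decoder in the spirit of Experiment 3 might look like the following Keras sketch (layer counts and filter sizes are illustrative, not the notebooks' exact architecture):

```python
from tensorflow.keras import layers, models

def build_colorizer(input_shape=(128, 128, 1)):
    """Sketch of a batch-normalized encoder-decoder colorizer (Experiment-3 style)."""
    inp = layers.Input(shape=input_shape)
    # Encoder: strided convolutions halve the spatial resolution.
    x = layers.Conv2D(64, 3, strides=2, padding="same", activation="relu")(inp)
    x = layers.BatchNormalization()(x)
    x = layers.Conv2D(128, 3, strides=2, padding="same", activation="relu")(x)
    x = layers.BatchNormalization()(x)
    # Decoder: upsampling restores the 128x128 resolution.
    x = layers.UpSampling2D()(x)
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.UpSampling2D()(x)
    out = layers.Conv2D(3, 3, padding="same", activation="sigmoid")(x)  # RGB in [0, 1]
    return models.Model(inp, out)
```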
`Arc-2-ATDL-FinalProject.ipynb` implements an advanced conditional GAN (cGAN) approach with two different generator architectures:
- Model 1: Pix2Pix U-Net generator with PatchGAN discriminator (47 epochs)
- Model 2: MobileNetV2 transfer learning generator with improved discriminator (115 epochs)
Uses adversarial loss + L1 reconstruction loss for realistic colorization.
Based on: Isola, P., Zhu, J. Y., Zhou, T., & Efros, A. A. (2017). Image-to-image translation with conditional adversarial networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1125-1134. https://arxiv.org/abs/1611.07004
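The combined objective can be sketched as follows (standard Pix2Pix-style losses with λ=100; `from_logits=True` assumes the discriminator outputs raw logits):

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
LAMBDA = 100  # weight on the L1 reconstruction term, as in Pix2Pix

def generator_loss(disc_fake_output, generated, target):
    adv = bce(tf.ones_like(disc_fake_output), disc_fake_output)  # fool the discriminator
    l1 = tf.reduce_mean(tf.abs(target - generated))              # stay close to ground truth
    return adv + LAMBDA * l1

def discriminator_loss(disc_real_output, disc_fake_output):
    real = bce(tf.ones_like(disc_real_output), disc_real_output)
    fake = bce(tf.zeros_like(disc_fake_output), disc_fake_output)
    return real + fake
```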
`Test-ATDL-FinalProject.ipynb` is the evaluation notebook that compares the two best models:
- Architecture 1 Model 3: Best naive CNN model
- Architecture 2 GAN Model: Best GAN-based model
Downloads pre-trained weights and provides side-by-side comparison on test images.
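A sketch of what this notebook automates (the Drive file IDs and model filenames below are placeholders; the real ones are hard-coded in the notebook):

```python
import gdown
import matplotlib.pyplot as plt
import tensorflow as tf

# Placeholder Drive file IDs -- the real IDs are hard-coded in the notebook.
gdown.download(id="NAIVE_MODEL_FILE_ID", output="naive_model.h5", quiet=False)
gdown.download(id="GAN_MODEL_FILE_ID", output="gan_model.h5", quiet=False)
naive = tf.keras.models.load_model("naive_model.h5", compile=False)
gan = tf.keras.models.load_model("gan_model.h5", compile=False)

def compare(gray_batch):
    """Show each grayscale input next to both models' colorizations."""
    fig, axes = plt.subplots(len(gray_batch), 3,
                             figsize=(9, 3 * len(gray_batch)), squeeze=False)
    for i, g in enumerate(gray_batch):
        axes[i][0].imshow(g.squeeze(), cmap="gray"); axes[i][0].set_title("Input")
        axes[i][1].imshow(naive.predict(g[None])[0]); axes[i][1].set_title("Naive CNN")
        axes[i][2].imshow(gan.predict(g[None])[0]);   axes[i][2].set_title("GAN")
    plt.tight_layout(); plt.show()
```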
- Google Colab account (recommended environment)
- Kaggle account for dataset access
- Libraries are installed automatically via `!pip install` commands in the notebooks
1. Upload notebooks to Google Drive or open them directly in Colab.
2. Start with Architecture 1 (Naive Model):
   - Open `Arch-1-ATDL-FinalProject.ipynb` in Google Colab
   - Run the installation cells to install required libraries
   - Authenticate with Kaggle when prompted (see the `kagglehub` sketch after this list)
   - Run cells sequentially to train all 3 experiments
3. Run Architecture 2 (GAN Model):
   - Open `Arc-2-ATDL-FinalProject.ipynb` in Google Colab
   - Run installation and authentication cells
   - Train the Pix2Pix U-Net model (Model 1) for approximately 47 epochs
   - Train the MobileNetV2 model (Model 2) for approximately 115 epochs
4. Test and Compare Models:
   - Open `Test-ATDL-FinalProject.ipynb` in Google Colab
   - Pre-trained models are downloaded automatically via `gdown`
   - Upload test images for colorization comparison
   - View side-by-side results of both architectures
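The Kaggle step above boils down to a one-line `kagglehub` download (the dataset handle is taken from the reference at the bottom of this README):

```python
import kagglehub

# Downloads (and caches) the UK Garden Birds dataset; prompts for Kaggle
# credentials in Colab if you are not already authenticated.
path = kagglehub.dataset_download("davemahony/20-uk-garden-birds")
print("Dataset files are under:", path)
```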
If running locally, ensure you have:
- Python 3.7+
- TensorFlow/Keras
- Required libraries: `numpy`, `matplotlib`, `pillow`, `scipy`, `kagglehub`, `gdown`
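For a local setup, the equivalent of the notebooks' install cells is roughly the following (versions unpinned; adjust as needed):

```python
# Notebook cell -- in a plain terminal, drop the leading "!".
!pip install tensorflow numpy matplotlib pillow scipy kagglehub gdown
```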
Architecture 1 (Naive CNN) key features:
- Simple Architecture: Basic encoder-decoder structure
- Progressive Enhancement: Adding dropout and batch normalization
- Fast Training: 20-50 epochs depending on experiment
- Loss Function: Mean Squared Error (MSE)
Architecture 2 (GAN) key features:
- Advanced Architecture: U-Net generators with PatchGAN discriminators
- Transfer Learning: MobileNetV2 pre-trained encoder (see the sketch after this list)
- Sophisticated Loss: Adversarial + L1 reconstruction loss (λ=100)
- Extended Training: 47-115 epochs for optimal results
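One plausible way to wire MobileNetV2 in as the generator's encoder is sketched below (the notebooks' exact skip connections and output activation may differ; here the grayscale input is tiled to 3 channels for the pretrained backbone, and the `tanh` output assumes images scaled to [-1, 1]):

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_mobilenet_generator(input_shape=(128, 128, 1)):
    """Sketch: MobileNetV2 bottleneck encoder + transposed-conv decoder."""
    inp = layers.Input(shape=input_shape)
    x = layers.Concatenate()([inp, inp, inp])  # 1 channel -> 3 for the pretrained encoder
    backbone = tf.keras.applications.MobileNetV2(
        input_shape=(128, 128, 3), include_top=False, weights="imagenet")
    y = backbone(x)                             # (4, 4, 1280) bottleneck features
    for filters in (512, 256, 128, 64, 32):     # five upsampling stages: 4 -> 128
        y = layers.Conv2DTranspose(filters, 4, strides=2, padding="same")(y)
        y = layers.BatchNormalization()(y)
        y = layers.ReLU()(y)
    out = layers.Conv2D(3, 3, padding="same", activation="tanh")(y)  # RGB in [-1, 1]
    return tf.keras.Model(inp, out)
```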
Architecture 1 training configuration:
- Optimizer: Adam (lr=2e-4)
- Batch Size: 32
- Epochs: 20-50 per experiment
- Data Augmentation: Basic preprocessing
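With this configuration, training a naive model is a standard Keras compile/fit (epoch count and validation split are illustrative; `x_gray`/`y_rgb` are arrays as in the preprocessing sketch above):

```python
import tensorflow as tf

model = build_colorizer()  # from the encoder-decoder sketch above
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=2e-4), loss="mse")
# x_gray: (N, 128, 128, 1) inputs, y_rgb: (N, 128, 128, 3) targets.
model.fit(x_gray, y_rgb, batch_size=32, epochs=50, validation_split=0.1)
```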
Architecture 2 (GAN) training configuration:
- Generator Optimizer: Adam (lr=2e-4, β₁=0.5)
- Discriminator Optimizer: Adam (lr=1e-4, β₁=0.5)
- Batch Size: 10
- Advanced Augmentation: Random jitter, horizontal flip, rotations
- Training Strategy: Progressive phases with manual evaluation
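A sketch of a single training step with these two optimizers (`generator`, `discriminator`, and the loss functions from the Pix2Pix sketch above are assumed to be defined; the conditional discriminator takes (input, image) pairs):

```python
import tensorflow as tf

gen_opt = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)
disc_opt = tf.keras.optimizers.Adam(1e-4, beta_1=0.5)

@tf.function
def train_step(gray, target):
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake = generator(gray, training=True)
        d_real = discriminator([gray, target], training=True)
        d_fake = discriminator([gray, fake], training=True)
        g_loss = generator_loss(d_fake, fake, target)
        d_loss = discriminator_loss(d_real, d_fake)
    gen_opt.apply_gradients(zip(
        g_tape.gradient(g_loss, generator.trainable_variables),
        generator.trainable_variables))
    disc_opt.apply_gradients(zip(
        d_tape.gradient(d_loss, discriminator.trainable_variables),
        discriminator.trainable_variables))
    return g_loss, d_loss
```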
Best-performing models:
- Architecture 1 Experiment 3: Best naive approach with batch normalization
- Architecture 2 Model 2: MobileNetV2-based generator (115 epochs)
Key findings:
- Naive Models: Fast training but limited color saturation and detail
- GAN Models: Superior visual quality with more realistic and vibrant colorization
- Transfer Learning: MobileNetV2 encoder significantly improved feature extraction
- Training Balance: Careful discriminator-generator balance crucial for stability
Project structure:

```
BirdsColoring/
├── Arch-1-ATDL-FinalProject.ipynb   # Naive CNN models (Google Colab)
├── Arc-2-ATDL-FinalProject.ipynb    # GAN-based models (Google Colab)
├── Test-ATDL-FinalProject.ipynb     # Model comparison (Google Colab)
├── README.md                        # This file
├── LICENSE                          # MIT License
├── THIRD_PARTY_LICENSES.md          # Third-party library licenses
└── .gitignore                       # Git ignore rules
```
Notes:
- Environment: Notebooks are optimized for Google Colab with GPU support
- Model Downloads: Pre-trained models are hosted on Google Drive and downloaded via `gdown`
- Dataset Access: Requires Kaggle authentication for the UK Garden Birds dataset
- Training Time: GAN models require significant computational resources (a GPU is recommended)
References:
- Pix2Pix: Isola, P., Zhu, J. Y., Zhou, T., & Efros, A. A. (2017). Image-to-image translation with conditional adversarial networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1125-1134. https://arxiv.org/abs/1611.07004
- MobileNetV2: Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., & Chen, L. C. (2018). MobileNetV2: Inverted residuals and linear bottlenecks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 4510-4519. https://arxiv.org/abs/1801.04381
- Dataset: Mahony, D. (2020). 20 UK Garden Birds. Kaggle. https://www.kaggle.com/davemahony/20-uk-garden-birds
Authors:
- Inon Elgabsi
- Nati Forish
- Iyar Gadolov
- Roy Edri
This project is licensed under the MIT License - see the LICENSE file for details.
For third-party library licenses, see THIRD_PARTY_LICENSES.md.
This project demonstrates the evolution from basic CNN autoencoders to sophisticated GAN architectures for image colorization, showing significant improvements in output quality through advanced deep learning techniques.
